UNICODE_CONVERT(3)          Courier Unicode Library         UNICODE_CONVERT(3)

NAME
       unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init,
       unicode_convert, unicode_convert_deinit, unicode_convert_tocbuf_init,
       unicode_convert_tou_init, unicode_convert_fromu_init,
       unicode_convert_uc, unicode_convert_tocbuf_toutf8_init,
       unicode_convert_tocbuf_fromutf8_init, unicode_convert_toutf8,
       unicode_convert_fromutf8, unicode_convert_tobuf,
       unicode_convert_tou_tobuf, unicode_convert_fromu_tobuf - unicode
       character set conversion

SYNOPSIS
       #include <courier-unicode.h>

                extern const char unicode_u_ucs4_native[];

                extern const char unicode_u_ucs2_native[];

       unicode_convert_handle_t unicode_convert_init(const char *src_chset,
                                                     const char *dst_chset,
                                                     int (*output_func)(const char *, size_t, void *),
                                                     void *cb_arg);

       int unicode_convert(unicode_convert_handle_t handle, const char *text,
                           size_t cnt);

       int unicode_convert_deinit(unicode_convert_handle_t handle,
                                  int *errptr);

       unicode_convert_handle_t
       unicode_convert_tocbuf_init(const char *src_chset,
                                   const char *dst_chset,
                                   char **cbufptr_ret,
                                   size_t *cbufsize_ret,
                                   int nullterminate);

       unicode_convert_handle_t
       unicode_convert_tocbuf_toutf8_init(const char *src_chset,
                                          char **cbufptr_ret,
                                          size_t *cbufsize_ret,
                                          int nullterminate);

       unicode_convert_handle_t
       unicode_convert_tocbuf_fromutf8_init(const char *dst_chset,
                                            char **cbufptr_ret,
                                            size_t *cbufsize_ret,
                                            int nullterminate);

       unicode_convert_handle_t
       unicode_convert_tou_init(const char *src_chset,
                                unicode_char **ucptr_ret,
                                size_t *ucsize_ret,
                                int nullterminate);

       unicode_convert_handle_t
       unicode_convert_fromu_init(const char *dst_chset,
                                  char **cbufptr_ret,
                                  size_t *cbufsize_ret,
                                  int nullterminate);

       int unicode_convert_uc(unicode_convert_handle_t handle,
                              const unicode_char *text, size_t cnt);

       char *unicode_convert_toutf8(const char *text, const char *charset,
                                    int *error);

       char *unicode_convert_fromutf8(const char *text, const char *charset,
                                      int *error);

       char *unicode_convert_tobuf(const char *text, const char *charset,
                                   const char *dstcharset, int *error);

       int unicode_convert_tou_tobuf(const char *text, size_t text_l,
                                     const char *charset, unicode_char **uc,
                                     size_t *ucsize, int *error);

       int unicode_convert_fromu_tobuf(const unicode_char *utext,
                                       size_t utext_l, const char *charset,
                                       char **c, size_t *csize, int *error);

DESCRIPTION
       unicode_u_ucs4_native[] contains the string "UCS-4BE" or "UCS-4LE",
       matching the native unicode_char endianness.

       unicode_u_ucs2_native[] contains the string "UCS-2BE" or "UCS-2LE",
       matching the native unicode_char endianness.

       unicode_convert_init(), unicode_convert(), and unicode_convert_deinit()
       are an adaptation of the iconv(3)[1] API that uses the same calling
       convention as the other algorithms in this unicode library, with some
       value-added features. These functions use iconv(3) to effect the actual
       character set conversion.

       unicode_convert_init() returns a non-NULL handle for the requested
       conversion, or NULL if the requested conversion is not available.
       unicode_convert_init() takes a pointer to the output function that
       receives the converted character text. The output function
       receives a pointer to the converted character text, and the number of
       characters in the converted text. The output function gets repeatedly
       called, until it receives the entire converted text.

       The character text to convert gets passed, repeatedly, to
       unicode_convert(). Each call to unicode_convert() results in the output
       function getting invoked, zero or more times, with each successive part
       of the converted text. Finally, unicode_convert_deinit() stops the
       conversion and deallocates the conversion handle.

       It's possible that a call to unicode_convert_deinit() results in some
       additional calls to the output function, passing the remaining, final
       parts, of the converted text, before unicode_convert_deinit()
       deallocates the handle, and returns.

       The output function should return 0 normally. A non-0 return indicates
       an error condition.  unicode_convert_deinit() returns non-zero if any
       previous invocation of the output function returned non-zero (this
       includes any invocations of the output function resulting from this
       call, or prior unicode_convert() calls), or 0 if all invocations of the
       output function returned 0.

       If errptr is not NULL, *errptr gets set to non-zero if there were any
       conversion errors, that is, if any text could not be converted to the
       destination character set.

       unicode_convert() also returns non-zero if it calls the output function
       and the output function returns non-zero. The conversion handle remains
       allocated in that case, so unicode_convert_deinit() must still be
       called to release it.

   Collecting converted text into a buffer
       Call unicode_convert_tocbuf_init() instead of unicode_convert_init(),
       then call unicode_convert() and unicode_convert_deinit() normally. The
       parameters to unicode_convert_tocbuf_init() specify the source and the
       destination character sets.  unicode_convert_tocbuf_toutf8_init() is
       just an alias that specifies UTF-8 as the destination character set.
       unicode_convert_tocbuf_fromutf8_init() is just an alias that specifies
       UTF-8 as the source character set.

       These functions supply an output function that collects the converted
       text into a malloc()ed buffer. If unicode_convert_deinit() returns 0,
       *cbufptr_ret gets set to the malloc()ed buffer, and *cbufsize_ret gets
       set to the number of converted characters, which is the size of the
       buffer.

           Note

           If the converted string is an empty string, *cbufsize_ret gets set
           to 0, but *cbufptr_ret still gets initialized (to a dummy malloced
           buffer).

       A non-zero nullterminate places a trailing \0 character after the
       converted string (this is included in *cbufsize_ret).

   Converting between character sets and unicode
       unicode_convert_tou_init() converts character text into a unicode_char
       buffer. It works just like unicode_convert_tocbuf_init(), except that
       only the source character set gets specified and the output buffer is a
       unicode_char buffer.  nullterminate terminates the converted unicode
       characters with a U+0000.

       unicode_convert_fromu_init() converts unicode_chars to the output
       character set, and also works like unicode_convert_tocbuf_init().
       Additionally, in this case, unicode_convert_uc() works just like
       unicode_convert() except that the input sequence is a unicode_char
       sequence, and the count parameter is the number of unicode characters.

   One-shot conversions
       unicode_convert_toutf8() converts the specified text in the specified
       character set into a UTF-8 string, returning a malloced buffer. If
       error is not NULL, *error gets set to a non-zero value if a character
       conversion error has occurred and some characters could not be
       converted, even if unicode_convert_toutf8() returns a non-NULL value.

       unicode_convert_fromutf8() does a similar conversion from UTF-8 text to
       the specified character set.

       unicode_convert_tobuf() does a similar conversion between two different
       character sets.

       unicode_convert_tou_tobuf() calls unicode_convert_tou_init(), feeds the
       character string through unicode_convert(), then calls
       unicode_convert_deinit(). If this function returns 0, *uc and *ucsize
       are set to a malloced buffer+size holding the unicode char array.

       unicode_convert_fromu_tobuf() calls unicode_convert_fromu_init(), feeds
       the unicode array through unicode_convert_uc(), then calls
       unicode_convert_deinit(). If this function returns 0, *c and *csize are
       set to a malloced buffer+size holding the char array.

SEE ALSO
       courier-unicode(7), unicode_convert_tocase(3),
       unicode_default_chset(3).

AUTHOR
       Sam Varshavchik
           Author

NOTES
        1. iconv(3)
           http://manpages.courier-mta.org/htmlman3/iconv.3.html

Courier Unicode Library           07/29/2015                UNICODE_CONVERT(3)