DragonFly On-Line Manual Pages
UNICODE_CONVERT(3) Courier Unicode Library UNICODE_CONVERT(3)
NAME
unicode_u_ucs4_native, unicode_u_ucs2_native, unicode_convert_init,
unicode_convert, unicode_convert_deinit, unicode_convert_tocbuf_init,
unicode_convert_tou_init, unicode_convert_fromu_init,
unicode_convert_uc, unicode_convert_tocbuf_toutf8_init,
unicode_convert_tocbuf_fromutf8_init, unicode_convert_toutf8,
unicode_convert_fromutf8, unicode_convert_tobuf,
unicode_convert_tou_tobuf, unicode_convert_fromu_tobuf - unicode
character set conversion
SYNOPSIS
#include <courier-unicode.h>
extern const char unicode_u_ucs4_native[];
extern const char unicode_u_ucs2_native[];
unicode_convert_handle_t unicode_convert_init(const char *src_chset,
const char *dst_chset,
int (*cb_func)(const char *, size_t, void *),
void *cb_arg);
int unicode_convert(unicode_convert_handle_t handle, const char *text,
size_t cnt);
int unicode_convert_deinit(unicode_convert_handle_t handle,
int *errptr);
unicode_convert_handle_t
unicode_convert_tocbuf_init(const char *src_chset,
const char *dst_chset,
char **cbufptr_ret,
size_t *cbufsize_ret,
int nullterminate);
unicode_convert_handle_t
unicode_convert_tocbuf_toutf8_init(const char *src_chset,
char **cbufptr_ret,
size_t *cbufsize_ret,
int nullterminate);
unicode_convert_handle_t
unicode_convert_tocbuf_fromutf8_init(const char *dst_chset,
char **cbufptr_ret,
size_t *cbufsize_ret,
int nullterminate);
unicode_convert_handle_t
unicode_convert_tou_init(const char *src_chset,
unicode_char **ucptr_ret,
size_t *ucsize_ret,
int nullterminate);
unicode_convert_handle_t
unicode_convert_fromu_init(const char *dst_chset,
char **cbufptr_ret,
size_t *cbufsize_ret,
int nullterminate);
int unicode_convert_uc(unicode_convert_handle_t handle,
const unicode_char *text, size_t cnt);
char *unicode_convert_toutf8(const char *text, const char *charset,
int *error);
char *unicode_convert_fromutf8(const char *text, const char *charset,
int *error);
char *unicode_convert_tobuf(const char *text, const char *charset,
const char *dstcharset, int *error);
int unicode_convert_tou_tobuf(const char *text, size_t text_l,
const char *charset, unicode_char **uc,
size_t *ucsize, int *error);
int unicode_convert_fromu_tobuf(const unicode_char *utext,
size_t utext_l, const char *charset,
char **c, size_t *csize, int *error);
DESCRIPTION
unicode_u_ucs4_native[] contains the string "UCS-4BE" or "UCS-4LE",
matching the native unicode_char endianness.
unicode_u_ucs2_native[] contains the string "UCS-2BE" or "UCS-2LE",
matching the native unicode_char endianness.
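Which of the two names applies depends only on the host's byte order. The following is a minimal sketch of one way to make that decision at run time; ucs4_native_name() and ucs2_native_name() are hypothetical helpers, not part of the library, and the library may well fix these strings at build time instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: non-zero if the most significant byte of a
 * multi-byte integer is stored first (big-endian host). */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);      /* lowest-addressed byte of probe */
    return first == 0x01;
}

/* Hypothetical: iconv(3) name of 32-bit code units in host byte order. */
static const char *ucs4_native_name(void)
{
    return host_is_big_endian() ? "UCS-4BE" : "UCS-4LE";
}

/* Hypothetical: iconv(3) name of 16-bit code units in host byte order. */
static const char *ucs2_native_name(void)
{
    return host_is_big_endian() ? "UCS-2BE" : "UCS-2LE";
}
```

In real code the library's own unicode_u_ucs4_native[] and unicode_u_ucs2_native[] should be used instead; the sketch only shows what those strings encode.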
unicode_convert_init(), unicode_convert(), and unicode_convert_deinit()
are an adaptation of the iconv(3)[1] API that uses the same calling
convention as the other algorithms in this unicode library, with some
value-added features. These functions use iconv(3) to effect the actual
character set conversion.
unicode_convert_init() returns a non-NULL handle for the requested
conversion, or NULL if the requested conversion is not available.
unicode_convert_init() takes a pointer to the output function that
receives the converted character text, and an opaque cb_arg pointer
that gets passed through to it. The output function receives a pointer
to a chunk of converted character text and the number of characters in
that chunk. The output function may be called repeatedly, until it has
received the entire converted text.
The character text to convert gets passed, repeatedly, to
unicode_convert(). Each call to unicode_convert() results in the output
function getting invoked, zero or more times, with each successive part
of the converted text. Finally, unicode_convert_deinit() stops the
conversion and deallocates the conversion handle.
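A minimal sketch of an output function, assuming the callback shape int (*)(const char *, size_t, void *), with cb_arg handed back as the last argument. struct text_sink and collect_chunk() are hypothetical names used only for illustration; each call appends one chunk of converted text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object passed as cb_arg: accumulates the converted text. */
struct text_sink {
    char   data[256];
    size_t len;
};

/* Output function: receives one chunk of converted text, its length,
 * and cb_arg.  Returns non-zero (an error) if the sink is full. */
static int collect_chunk(const char *chunk, size_t cnt, void *cb_arg)
{
    struct text_sink *sink = cb_arg;

    assert(sink->len <= sizeof(sink->data));
    if (cnt > sizeof(sink->data) - sink->len)
        return -1;                  /* would overflow: report an error */
    memcpy(sink->data + sink->len, chunk, cnt);
    sink->len += cnt;
    return 0;
}
```

Under the assumed signature, collect_chunk and a pointer to a zero-initialized struct text_sink would be passed to unicode_convert_init() as the output function and cb_arg; after unicode_convert_deinit(), sink.data would hold sink.len characters of converted text (not \0-terminated).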
A call to unicode_convert_deinit() may result in additional calls to
the output function, passing the final parts of the converted text,
before unicode_convert_deinit() deallocates the handle and returns.
The output function should return 0 normally. A non-zero return
indicates an error condition. unicode_convert_deinit() returns non-zero
if any invocation of the output function returned non-zero (this
includes invocations resulting from this call as well as from prior
unicode_convert() calls), or 0 if all invocations of the output
function returned 0.
If errptr is not NULL, *errptr gets set to non-zero if there were any
conversion errors, that is, if any text could not be converted to the
destination character set.
unicode_convert() also returns non-zero if the output function it calls
returns non-zero. However, the conversion handle remains allocated, so
unicode_convert_deinit() must still be called to release it.
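The error rules above can be modelled without iconv(3). In this hypothetical sketch, model_init(), model_convert() and model_deinit() play the roles of unicode_convert_init(), unicode_convert() and unicode_convert_deinit(), but perform no character set conversion (text passes through unchanged); they only track output function failures as described above. count_with_limit() is an illustrative output function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Output function type, assumed to match the cb_func parameter. */
typedef int (*output_func)(const char *, size_t, void *);

/* Hypothetical stand-in for a conversion handle. */
struct model_handle {
    output_func cb;
    void       *cb_arg;
    int         cb_failed;   /* sticky: set once any output call fails */
};

static struct model_handle *model_init(output_func cb, void *cb_arg)
{
    struct model_handle *h = malloc(sizeof *h);

    if (h != NULL) {
        h->cb = cb;
        h->cb_arg = cb_arg;
        h->cb_failed = 0;
    }
    return h;
}

/* Like unicode_convert(): non-zero if the output function failed, but the
 * handle stays allocated.  Whether the real library keeps delivering later
 * chunks after a failure is not stated; this model keeps forwarding them. */
static int model_convert(struct model_handle *h, const char *text, size_t cnt)
{
    assert(h != NULL);
    if (cnt > 0 && h->cb(text, cnt, h->cb_arg) != 0) {
        h->cb_failed = 1;
        return -1;
    }
    return 0;
}

/* Like unicode_convert_deinit(): frees the handle and returns non-zero if
 * any output call failed during the handle's lifetime. */
static int model_deinit(struct model_handle *h, int *errptr)
{
    int rc = h->cb_failed ? -1 : 0;

    if (errptr != NULL)
        *errptr = 0;         /* pass-through never hits a conversion error */
    free(h);
    return rc;
}

/* Example output function: accepts at most *(size_t *)cb_arg bytes total. */
static int count_with_limit(const char *text, size_t cnt, void *cb_arg)
{
    size_t *remaining = cb_arg;

    (void)text;
    if (cnt > *remaining)
        return -1;
    *remaining -= cnt;
    return 0;
}
```

Because the failure flag is sticky, a caller that checks only the final deinit return value still learns about a failure in any earlier chunk; it must nevertheless call deinit after a failed convert call, or the handle leaks.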
Collecting converted text into a buffer
Call unicode_convert_tocbuf_init() instead of unicode_convert_init(),
then call unicode_convert() and unicode_convert_deinit() normally. The
src_chset and dst_chset parameters to unicode_convert_tocbuf_init()
specify the source and the destination character sets.
unicode_convert_tocbuf_toutf8_init() is just an alias that specifies
UTF-8 as the destination character set.
unicode_convert_tocbuf_fromutf8_init() is just an alias that specifies
UTF-8 as the source character set.
These functions supply an output function that collects the converted
text into a malloc()ed buffer. If unicode_convert_deinit() returns 0,
*cbufptr_ret gets initialized to the malloc()ed buffer, and
*cbufsize_ret gets set to the number of converted characters, which is
the size of the buffer. The caller is responsible for free()ing the
buffer.
Note
If the converted string is an empty string, *cbufsize_ret gets set
to 0, but *cbufptr_ret still gets initialized (to a dummy malloced
buffer).
A non-zero nullterminate places a trailing \0 character after the
converted string (this is included in *cbufsize_ret).
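A rough sketch of what such a collecting output function might look like, assuming it simply grows one malloc()ed buffer with realloc(). struct cbuf, cbuf_append() and cbuf_finish() are hypothetical; cbuf_finish() reproduces the two rules stated above: a trailing \0 counted in the size when nullterminate is non-zero, and a dummy malloc()ed buffer for empty output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer used as cb_arg. */
struct cbuf {
    char  *ptr;
    size_t len;
    size_t cap;
};

/* Output function: append one chunk, growing the malloc()ed buffer. */
static int cbuf_append(const char *text, size_t cnt, void *cb_arg)
{
    struct cbuf *b = cb_arg;

    assert(b->len <= b->cap);
    if (cnt == 0)
        return 0;
    if (cnt > b->cap - b->len) {
        size_t newcap = b->cap ? b->cap : 64;
        char *p;

        while (newcap - b->len < cnt)
            newcap *= 2;                 /* overflow ignored in this sketch */
        p = realloc(b->ptr, newcap);
        if (p == NULL)
            return -1;
        b->ptr = p;
        b->cap = newcap;
    }
    memcpy(b->ptr + b->len, text, cnt);
    b->len += cnt;
    return 0;
}

/* Hand the buffer to the caller.  On failure the caller still owns
 * b->ptr and should free() it. */
static int cbuf_finish(struct cbuf *b, int nullterminate,
                       char **cbufptr_ret, size_t *cbufsize_ret)
{
    if (nullterminate && cbuf_append("", 1, b) != 0)   /* appends one \0 */
        return -1;
    if (b->ptr == NULL && (b->ptr = malloc(1)) == NULL) /* dummy buffer */
        return -1;
    *cbufptr_ret = b->ptr;
    *cbufsize_ret = b->len;
    return 0;
}
```

With nullterminate set, converting "abcdef" in two chunks would yield a buffer holding "abcdef" and a size of 7, the \0 included.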
Converting between character sets and unicode
unicode_convert_tou_init() converts character text into a unicode_char
buffer. It works just like unicode_convert_tocbuf_init(), except that
only the source character set gets specified, and the output buffer is
a unicode_char buffer. A non-zero nullterminate terminates the
converted unicode characters with a U+0000.
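For ISO-8859-1 input, conversion to unicode characters is a one-to-one byte-to-code-point mapping, which makes the output shape easy to see. This is a hypothetical sketch that does not use iconv(3); it assumes unicode_char is a 32-bit unsigned type (uc32 here) and, by analogy with the \0 of the tocbuf functions, that the U+0000 terminator is counted in the returned size. latin1_to_uc() is not a library function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t uc32;   /* stand-in for unicode_char, assumed 32-bit */

/* Hypothetical ISO-8859-1 -> code point conversion producing a malloc()ed
 * array, optionally terminated by U+0000 (counted in *ucsize_ret). */
static int latin1_to_uc(const char *text, size_t cnt, int nullterminate,
                        uc32 **ucptr_ret, size_t *ucsize_ret)
{
    size_t n = cnt + (nullterminate ? 1 : 0);
    uc32 *buf = malloc((n > 0 ? n : 1) * sizeof *buf);
    size_t i;

    assert(text != NULL || cnt == 0);
    if (buf == NULL)
        return -1;
    for (i = 0; i < cnt; i++)
        buf[i] = (unsigned char)text[i];  /* byte value == code point */
    if (nullterminate)
        buf[cnt] = 0;                     /* U+0000 */
    *ucptr_ret = buf;
    *ucsize_ret = n;
    return 0;
}
```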
unicode_convert_fromu_init() converts unicode_chars to the output
character set, and also works like unicode_convert_tocbuf_init().
Additionally, in this case, unicode_convert_uc() works just like
unicode_convert(), except that the input is a unicode_char sequence,
and the count parameter is the number of unicode characters rather
than the number of bytes.
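Because the count passed to unicode_convert_uc() is in unicode characters, the number of output bytes can differ from it. A hypothetical sketch of a unicode_char to UTF-8 conversion, hand-written rather than going through iconv(3), that takes its input count in code points and returns a malloc()ed char buffer plus its size in bytes. uc32, utf8_put() and uc_to_utf8() are illustrative names, and unicode_char is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t uc32;   /* stand-in for unicode_char, assumed 32-bit */

/* Encode one code point as UTF-8; returns the byte count, or 0 if c is
 * beyond U+10FFFF.  Surrogates are not rejected in this sketch. */
static size_t utf8_put(uc32 c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}

/* cnt counts code points, not bytes, mirroring unicode_convert_uc(). */
static int uc_to_utf8(const uc32 *text, size_t cnt, char **c, size_t *csize)
{
    unsigned char *buf = malloc(cnt * 4 + 1);   /* worst case: 4 bytes each */
    size_t i, len = 0;

    assert(text != NULL || cnt == 0);
    if (buf == NULL)
        return -1;
    for (i = 0; i < cnt; i++) {
        size_t k = utf8_put(text[i], buf + len);

        if (k == 0) {
            free(buf);
            return -1;                          /* not a valid code point */
        }
        len += k;
    }
    *c = (char *)buf;
    *csize = len;
    return 0;
}
```

For an array argument the count is sizeof array / sizeof array[0], not sizeof array; with 32-bit elements the latter would be four times too large. Five code points such as "café€" come out as 8 UTF-8 bytes.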
One-shot conversions
unicode_convert_toutf8() converts text in the specified character set
into a UTF-8 string, returning a malloc()ed buffer. If error is not
NULL, *error gets set to a non-zero value if a character conversion
error occurred and some characters could not be converted; this can
happen even when unicode_convert_toutf8() returns a non-NULL value.
unicode_convert_fromutf8() does a similar conversion from UTF-8 text to
the specified character set.
unicode_convert_tobuf() does a similar conversion from charset to
dstcharset.
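The rule that a non-NULL result can still come with *error set is easiest to see with a lossy target character set. A minimal, hypothetical sketch: utf8_to_latin1() is not a library function, it decodes UTF-8 by hand instead of using iconv(3), it does not fully validate its input (overlong forms are accepted), and it replaces unrepresentable or malformed input with '?', which is a choice of this sketch rather than documented library behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-shot UTF-8 -> ISO-8859-1 conversion.  Returns a
 * malloc()ed, \0-terminated string, or NULL if memory runs out.  If error
 * is not NULL, *error is set to 1 when anything had to be replaced. */
static char *utf8_to_latin1(const char *text, int *error)
{
    const unsigned char *s = (const unsigned char *)text;
    char *out;
    size_t o = 0;
    int bad = 0;

    assert(text != NULL);
    out = malloc(strlen(text) + 1);   /* each sequence yields one byte */
    if (out == NULL)
        return NULL;
    while (*s) {
        uint32_t c;
        size_t extra, i;

        if (*s < 0x80)                { c = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { c = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { c = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { c = *s & 0x07; extra = 3; }
        else {                         /* stray continuation/invalid byte */
            out[o++] = '?';
            bad = 1;
            s++;
            continue;
        }
        s++;
        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)
                break;                 /* truncated sequence */
            c = (c << 6) | (*s & 0x3F);
        }
        if (i < extra || c > 0xFF) {   /* malformed or not in ISO-8859-1 */
            out[o++] = '?';
            bad = 1;
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    if (error != NULL)
        *error = bad;
    return out;
}
```

Here converting "€5" still returns the string "?5", but with *error set to 1, which is the pattern the text above describes for unicode_convert_toutf8() and its relatives.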
unicode_convert_tou_tobuf() calls unicode_convert_tou_init(), feeds the
text_l characters of text through unicode_convert(), then calls
unicode_convert_deinit(). If this function returns 0, *uc and *ucsize
are set to a malloc()ed unicode_char array and its size.
unicode_convert_fromu_tobuf() calls unicode_convert_fromu_init(), feeds
the utext_l unicode characters of utext through unicode_convert_uc(),
then calls unicode_convert_deinit(). If this function returns 0, *c and
*csize are set to a malloc()ed char buffer and its size.
SEE ALSO
courier-unicode(7), unicode_convert_tocase(3),
unicode_default_chset(3).
AUTHOR
Sam Varshavchik
Author
NOTES
1.
iconv(3)
http://manpages.courier-mta.org/htmlman3/iconv.3.html
Courier Unicode Library 07/29/2015 UNICODE_CONVERT(3)