iconv(3)              DragonFly Library Functions Manual              iconv(3)

NAME
       iconv - charset conversion function

SYNOPSIS
       #include <iconv.h>

       size_t iconv(iconv_t cd, const char **inbuf,
            size_t *inbytesleft, char **outbuf,
            size_t *outbytesleft);

DESCRIPTION
       The iconv() function converts the sequence of characters from one
       charset, in the array specified by inbuf, into a sequence of
       corresponding characters in another charset, in the array specified by
       outbuf.  The charsets are those specified in the iconv_open() call that
       returned the conversion descriptor, cd.  The inbuf argument points to a
       variable that points to the first character in the input buffer and
       inbytesleft indicates the number of bytes to the end of the buffer to
       be converted.  The outbuf argument points to a variable that points to
       the first available byte in the output buffer and outbytesleft
       indicates the number of available bytes to the end of the buffer.

       For state-dependent encodings, the conversion descriptor cd is placed
       into its initial shift state by a call for which inbuf is a null
       pointer, or for which inbuf points to a null pointer.  When iconv() is
       called in this way, and if outbuf is not a null pointer or a pointer to
       a null pointer, and outbytesleft points to a positive value, iconv()
       will place, into the output buffer, the byte sequence to change the
       output buffer to its initial shift state.  If the output buffer is not
       large enough to hold the entire reset sequence, iconv() will fail and
       set errno to E2BIG.  Subsequent calls with inbuf as other than a null
       pointer or a pointer to a null pointer cause the conversion to take
       place from the current state of the conversion descriptor.
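
       The reset call described above can be sketched as follows.  This is a
       minimal illustration, not part of the library: finish_stream() is a
       hypothetical helper name, and the sketch assumes the caller has
       already converted its data and still has room in the output buffer.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: end a converted stream by writing whatever bytes
 * return the output to its initial shift state, and reset cd for reuse.
 * Returns 0 on success, -1 on failure (errno is E2BIG when *outleft is
 * too small to hold the reset sequence).
 */
static int
finish_stream(iconv_t cd, char **out, size_t *outleft)
{
    assert(out != NULL && *out != NULL && outleft != NULL);

    /* A null inbuf requests the reset; no input is supplied. */
    if (iconv(cd, NULL, NULL, out, outleft) == (size_t)-1)
        return (-1);
    return (0);
}
```

       For a target encoding without shift states, such as UTF-8, the call
       is expected to succeed without writing any bytes.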

       If a sequence of input bytes does not form a valid character in the
       specified charset, conversion stops after the previous successfully
       converted character.  If the input buffer ends with an incomplete
       character or shift sequence, conversion stops after the previous
       successfully converted bytes.  If the output buffer is not large enough
       to hold the entire converted input, conversion stops just prior to the
       input bytes that would cause the output buffer to overflow.  The
       variable pointed to by inbuf is updated to point to the byte following
       the last byte successfully used in the conversion.  The value pointed
       to by inbytesleft is decremented to reflect the number of bytes still
       not converted in the input buffer.  The variable pointed to by outbuf
       is updated to point to the byte following the last byte of converted
       output data.  The value pointed to by outbytesleft is decremented to
       reflect the number of bytes still available in the output buffer.  For
       state-dependent encodings, the conversion descriptor is updated to
       reflect the shift state in effect at the end of the last successfully
       converted byte sequence.
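
       The way the four variables move during a call can be sketched as
       follows.  This is an illustrative sketch only: convert_once() is a
       hypothetical helper, the charset names passed to it are whatever
       iconv_open() accepts on the system at hand, and the cast through
       void * is an assumption made so that the code compiles whether inbuf
       is declared const char ** (as in the SYNOPSIS) or char ** (as on some
       other systems).  It does not emit a final reset sequence, so it is
       only suitable for target encodings without shift states.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert srclen bytes from src into dst (capacity
 * dstcap bytes).  Returns the number of bytes written, or (size_t)-1 on
 * failure.  *consumed receives the number of input bytes used, which is
 * meaningful even when the conversion stopped early.
 */
static size_t
convert_once(const char *to, const char *from, const char *src,
    size_t srclen, char *dst, size_t dstcap, size_t *consumed)
{
    iconv_t cd;
    const char *in = src;       /* advanced by iconv() */
    char *out = dst;            /* advanced by iconv() */
    size_t inleft = srclen;     /* decremented by iconv() */
    size_t outleft = dstcap;    /* decremented by iconv() */
    size_t rv;

    assert(src != NULL && dst != NULL && consumed != NULL);

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return ((size_t)-1);

    rv = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);

    /* in now points just past the last input byte that was used. */
    *consumed = (size_t)(in - src);
    if (rv == (size_t)-1)
        return ((size_t)-1);

    /* out now points just past the last output byte that was written. */
    return ((size_t)(out - dst));
}
```

       After the call, in - src equals srclen - inleft and out - dst equals
       dstcap - outleft, which is why either the pointers or the counters
       can be used to measure progress.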

       If iconv() encounters a character in the input buffer that is legal,
       but for which an identical character does not exist in the target
       charset, iconv() performs an implementation-defined conversion on this
       character.

RETURN VALUES
       The iconv() function updates the variables pointed to by the arguments
       to reflect the extent of the conversion and returns the number of non-
       identical conversions performed.  If the entire string in the input
       buffer is converted, the value pointed to by inbytesleft will be 0.  If
       the input conversion is stopped due to any conditions mentioned above,
       the value pointed to by inbytesleft will be non-zero and errno is set
       to indicate the condition.  If an error occurs, iconv() returns
       (size_t)-1 and sets errno to indicate the error.

ERRORS
       The iconv() function will fail if:

       EILSEQ         Input conversion stopped due to an input byte that does
                      not belong to the input charset.

       E2BIG          Input conversion stopped due to lack of space in the
                      output buffer.

       EINVAL         Input conversion stopped due to an incomplete character
                      or shift sequence at the end of the input buffer.

       The iconv() function may fail if:

       EBADF          The cd argument is not a valid open conversion
                      descriptor.
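
       A caller usually needs to tell these conditions apart, because each
       one calls for a different reaction: E2BIG means drain the output
       buffer and call again, EINVAL means supply more input, and EILSEQ
       means the input is invalid.  The following sketch shows one way to do
       that.  The enum and the convert_step() function are hypothetical names
       introduced for illustration; the cast through void * is an assumption
       made so that the code compiles against either declaration of inbuf.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

enum conv_status {
    CONV_OK,               /* all input converted */
    CONV_OUTPUT_FULL,      /* E2BIG: empty the output buffer, call again */
    CONV_BAD_INPUT,        /* EILSEQ: invalid byte in the input */
    CONV_TRUNCATED_INPUT,  /* EINVAL: input ends mid-character */
    CONV_OTHER_ERROR       /* anything else, for example EBADF */
};

/*
 * Hypothetical helper: run one iconv() call on real input and map the
 * outcome onto conv_status.  The caller's pointers and counters are
 * updated by iconv() in every case.
 */
static enum conv_status
convert_step(iconv_t cd, const char **in, size_t *inleft,
    char **out, size_t *outleft)
{
    /* This helper is for actual input, not for the reset call. */
    assert(in != NULL && *in != NULL && inleft != NULL);

    if (iconv(cd, (void *)in, inleft, out, outleft) != (size_t)-1)
        return (CONV_OK);

    switch (errno) {
    case E2BIG:
        return (CONV_OUTPUT_FULL);
    case EILSEQ:
        return (CONV_BAD_INPUT);
    case EINVAL:
        return (CONV_TRUNCATED_INPUT);
    default:
        return (CONV_OTHER_ERROR);
    }
}
```

       On CONV_TRUNCATED_INPUT the unconverted tail (*inleft bytes starting
       at *in) would typically be kept and prepended to the next chunk of
       input.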

APPLICATION USAGE
       The inbuf argument indirectly points to the memory area which contains
       the conversion input data. The outbuf argument indirectly points to the
       memory area which is to contain the result of the conversion. The
       objects indirectly pointed to by inbuf and outbuf are not restricted to
       containing data that is directly representable in the ISO C language
       char data type. The type of inbuf and outbuf, char **, does not imply
       that the objects pointed to are interpreted as null-terminated C
       strings or arrays of characters. Any interpretation of a byte sequence
       that represents a character in a given character set encoding scheme is
       done internally within the codeset converters.  For example, the area
       pointed to indirectly by inbuf and/or outbuf can contain all zero
       octets that are not interpreted as string terminators but as coded
       character data according to the respective codeset encoding scheme. The
       type of the data (char, short int, long int, and so on) read or
       stored in the objects is not specified, but may be inferred for both
       the input and output data by the converters determined by the
       from_charset and to_charset arguments of iconv_open().

       Regardless of the data type inferred by the converter, the size of the
       remaining space in both input and output objects (the inbytesleft and
       outbytesleft arguments) is always measured in bytes.
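
       As an illustration of output that is not char data, the following
       sketch decodes UTF-8 into an array of 32-bit code points.  It is a
       sketch under stated assumptions: utf8_to_codepoints() is a
       hypothetical helper, and the charset names "UTF-8", "UCS-4LE" and
       "UCS-4BE" are assumed to be accepted by iconv_open() on the system in
       use.  Note that the output size is passed in bytes even though the
       caller thinks in array elements.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode srclen bytes of UTF-8 into dst, an array of
 * dstcount 32-bit elements.  Returns the number of code points stored, or
 * (size_t)-1 on failure.
 */
static size_t
utf8_to_codepoints(const char *src, size_t srclen, uint32_t *dst,
    size_t dstcount)
{
    const uint16_t probe = 1;
    /* Assumed names: choose the UCS-4 variant matching host byte order. */
    const char *to =
        (*(const unsigned char *)&probe == 1) ? "UCS-4LE" : "UCS-4BE";
    const char *in = src;
    char *out = (char *)dst;
    size_t inleft = srclen;
    size_t outbytes = dstcount * sizeof(uint32_t);  /* bytes, not elements */
    size_t outleft = outbytes;
    size_t rv;
    iconv_t cd;

    assert(src != NULL && dst != NULL);

    cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1)
        return ((size_t)-1);

    rv = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);
    if (rv == (size_t)-1)
        return ((size_t)-1);

    /* Turn the number of bytes written back into an element count. */
    return ((outbytes - outleft) / sizeof(uint32_t));
}
```

       The input may also contain zero octets; they are converted like any
       other byte because srclen, not a terminator, bounds the input.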

IMPLEMENTATION DETAILS
       Conversions between different charsets are done via the UCS-4 universal
       character set. Conversions between the same charset (e.g., when two
       different aliases of the same charset are used) are done by direct
       copying from the input buffer to the output buffer. The libiconv library
       itself usually contains only a small set of (built-in) charsets.
       Tables for conversion between UCS-4 and particular charsets are mapped
       to memory from binary table files, or C methods are loaded dynamically
       from shared modules:

       Coded character sets (CCS)
              Each CCS file contains tables for conversion between exactly one
              character of a corresponding charset and one UCS-4 character,
              and vice versa, a UCS-4 character to the character of the CCS
              charset. About 200 character sets are supported (only those used
              in the FreeBSD distribution are provided in this package),
              including ASCII and the following standards: ISO-8859, KOI8,
              Windows, IBM-DOS, Macintosh, CJK national charsets and EBCDIC.
              CCS files are accessed via memory mapping.

       Character encoding schemes (CES)
              Each CES module contains functions converting a byte sequence
              of a corresponding encoding scheme to exactly one UCS-4 32-bit
              character, and vice versa, a UCS-4 character to a byte sequence
              of the CES.  The following CES groups are supported in
              iconv-1.0: ISO-10646 (UCS-4 and UCS-2, each in both
              architecture-independent (network) and architecture-dependent
              (internal) byte order versions), Unicode (UTF-16, UTF-8 and
              UTF-7), ISO-2022 and Extended Unix Code (EUC) (both for Chinese
              (CN and TW), Japanese and Korean languages). A special
              table-driven CES module providing conversion for all CCS tables
              is always built into the library.  ISO-2022, EUC and
              table-driven modules use one or more memory-mapped CCS tables.

       Any CCS table or CES module can be built into the library at
       compilation time.

       A CCS or CES charset can have zero or more aliases (alternative names)
       which are listed in the charset.aliases file located in the same
       directory as the CCS tables. The library maps the aliases file to
       memory to find canonical charset names.

       If iconv() encounters a character in the input buffer that is legal,
       but for which an identical character does not exist in the target
       charset, iconv() replaces the source character with the '_'
       (underscore) character and tries to convert it into the target charset.
       If there is no underscore character in the target charset, no bytes are
       written to the target buffer for the source character. In any case,
       iconv() increments the number of non-identical conversions performed
       (the value being returned as the function result).
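
       A caller that cares about lossy output can inspect the return value,
       as in the following sketch.  convert_counting_loss() is a
       hypothetical helper name, and the cast through void * is an
       assumption made so that the code compiles against either declaration
       of inbuf.  The substitution behaviour is specific to this
       implementation; other iconv implementations may choose a different
       replacement or report an error instead, so portable code should not
       rely on the underscore.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert one buffer with an already opened
 * descriptor and report how many characters were converted
 * non-identically.  Returns the number of output bytes written, or
 * (size_t)-1 on failure (errno is set by iconv()).
 */
static size_t
convert_counting_loss(iconv_t cd, const char *src, size_t srclen,
    char *dst, size_t dstcap, size_t *nonidentical)
{
    const char *in = src;
    char *out = dst;
    size_t inleft = srclen, outleft = dstcap, rv;

    assert(src != NULL && dst != NULL && nonidentical != NULL);

    rv = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    if (rv == (size_t)-1)
        return ((size_t)-1);

    /* On success the return value is the non-identical conversion count. */
    *nonidentical = rv;
    return (dstcap - outleft);
}
```

       A non-zero count tells the caller that the output is not a faithful
       copy of the input, even though the call itself succeeded.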

FILES
       /usr/local/share/iconv/charset.aliases
                                Charset aliases file
       /usr/local/share/iconv/*.cct
                                CCS conversion tables
       /usr/local/libexec/iconv/*.so
                                CES conversion modules

SEE ALSO
       iconv(1), iconv_close(3), iconv_open(3)

                                  7 Sep 2000                          iconv(3)