UTF16To8 Method
static int CMarkup::UTF16To8(
char *pszUTF8,
const wchar_t* pwszUTF16,
int nUTF8Count
);
UTF16To8 converts the UTF-16 string in pwszUTF16 to UTF-8 in the pszUTF8 string buffer. It uses the same arguments as the ANSI C wcstombs function, but instead of converting to the locale charset it converts to UTF-8.
The pwszUTF16 source must be a null-terminated UTF-16 string. If pszUTF8 is NULL, the number of bytes required is returned and nUTF8Count is ignored. Otherwise pszUTF8 is filled with the result string. nUTF8Count is the size of the pszUTF8 buffer in bytes and must include room for a null-terminator if one is desired in pszUTF8. The number of bytes (excluding the null-terminator) is returned.
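The size-query convention described above can be illustrated with a minimal sketch. Utf16To8Sketch and ToUtf8 are hypothetical names, not part of CMarkup, and the body is a stand-in written for this example rather than CMarkup's implementation. It uses char16_t instead of wchar_t so that code units are 16 bits on every platform, and it assumes that conversion stops before a character that does not fit in the buffer; the behavior of the real method in that case is not stated here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for illustration only; this is NOT CMarkup's code.
// It mimics the calling convention described above.
int Utf16To8Sketch(char* pszUTF8, const char16_t* pwszUTF16, int nUTF8Count)
{
    assert(pwszUTF16 != nullptr);
    int nLen = 0;
    while (*pwszUTF16) {
        unsigned long cp = *pwszUTF16++;
        // Combine a high surrogate with the low surrogate that follows it.
        // An unpaired surrogate falls through and is encoded as three bytes;
        // a strict encoder would reject it instead.
        if (cp >= 0xD800 && cp <= 0xDBFF && *pwszUTF16 >= 0xDC00 && *pwszUTF16 <= 0xDFFF)
            cp = 0x10000 + ((cp - 0xD800) << 10) + (*pwszUTF16++ - 0xDC00);
        int nBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (pszUTF8) {
            if (nLen + nBytes > nUTF8Count)
                break; // Assumption: stop before a character that does not fit.
            // Lead-byte prefixes for 1, 2, 3 and 4 byte sequences.
            static const unsigned char lead[] = { 0x00, 0xC0, 0xE0, 0xF0 };
            pszUTF8[nLen] = (char)(lead[nBytes - 1] | (cp >> (6 * (nBytes - 1))));
            for (int i = 1; i < nBytes; ++i)
                pszUTF8[nLen + i] = (char)(0x80 | ((cp >> (6 * (nBytes - 1 - i))) & 0x3F));
        }
        nLen += nBytes; // With a NULL buffer this only counts the bytes needed.
    }
    if (pszUTF8 && nLen < nUTF8Count)
        pszUTF8[nLen] = '\0'; // Terminate only when there is room.
    return nLen;
}

// Two-call pattern: query the size with a NULL buffer, then convert.
std::string ToUtf8(const char16_t* pwszUTF16)
{
    int nNeeded = Utf16To8Sketch(nullptr, pwszUTF16, 0);
    std::string str(nNeeded + 1, '\0');       // +1 leaves room for the terminator
    Utf16To8Sketch(&str[0], pwszUTF16, nNeeded + 1);
    str.resize(nNeeded);                      // drop the terminator from the std::string
    return str;
}
```

With the real method, the same two-call pattern would presumably apply: call once with a NULL buffer to get the byte count, allocate that count plus one, then call again with the allocated size.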
The following example converts the code point U+64321 from UTF-16 to UTF-8, and then back to UTF-16. This is an example of a (rare) character outside the Basic Multilingual Plane, which requires a surrogate pair in UTF-16 (see UTF-16 Files and the Byte Order Mark (BOM)) and 4 bytes in UTF-8. Note that the 5 passed into UTF16To8 allows for the null-terminator (which is important for the strcmp check and to generate the null-terminator in the wide char result of UTF8To16).
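char szUTF8[5];       // 4 UTF-8 bytes plus the null-terminator
wchar_t wszUTF16[3];  // surrogate pair plus the null-terminator
int nUTFLen;
nUTFLen = CMarkup::UTF16To8(szUTF8,L"\xD950\xDF21",5); // 0x64321
Check( strcmp(szUTF8,"\xF1\xA4\x8C\xA1") == 0 );
nUTFLen = CMarkup::UTF8To16(wszUTF16,szUTF8,nUTFLen+1);
Check( wcscmp(wszUTF16,L"\xD950\xDF21") == 0 );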
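The byte values in the example can be checked by hand. The following sketch, which is independent of CMarkup (CombineSurrogates and kCodePoint are hypothetical names), works through the arithmetic: the surrogate pair D950 DF21 combines to the code point 0x64321, and that code point splits into the four UTF-8 bytes F1 A4 8C A1.

```cpp
#include <cstdint>

// Combine a UTF-16 surrogate pair into a single code point.
constexpr std::uint32_t CombineSurrogates(std::uint16_t hi, std::uint16_t lo)
{
    return 0x10000u + ((std::uint32_t(hi) - 0xD800u) << 10) + (std::uint32_t(lo) - 0xDC00u);
}

constexpr std::uint32_t kCodePoint = CombineSurrogates(0xD950, 0xDF21);
static_assert(kCodePoint == 0x64321, "surrogate pair decodes to 0x64321");

// A 4-byte UTF-8 sequence carries 3 + 6 + 6 + 6 bits of the code point.
static_assert((0xF0 | (kCodePoint >> 18)) == 0xF1, "lead byte");
static_assert((0x80 | ((kCodePoint >> 12) & 0x3F)) == 0xA4, "second byte");
static_assert((0x80 | ((kCodePoint >> 6) & 0x3F)) == 0x8C, "third byte");
static_assert((0x80 | (kCodePoint & 0x3F)) == 0xA1, "fourth byte");
```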
UTF16To8 and UTF8To16 have no dependencies and can be used in place of the WideCharToMultiByte and MultiByteToWideChar Win32 APIs, which do not support UTF-8 on Windows 9x, NT 3.5 and some versions of CE, and are not available on other platforms.