CMarkup UTF8To16 Method
static int CMarkup::UTF8To16( unsigned short* pwszUTF16, const char* pszUTF8, int nUTF8Count );
UTF8To16 converts the UTF-8 string in
pszUTF8 to UTF-16 in the
pwszUTF16 string buffer. It uses the same arguments as the ANSI C
mbstowcs function, but instead of converting from the locale charset it converts from UTF-8.
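For comparison, here is a minimal sketch of the mbstowcs call shape that UTF8To16 mirrors. The helper name WidenWithMbstowcs is hypothetical and not part of CMarkup. Passing a null destination to get the required length is a POSIX and Microsoft CRT extension rather than a guarantee of ISO C, so treat that first call as an assumption about your platform.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of CMarkup): widens a locale-encoded
// string using the two-call mbstowcs pattern.
std::wstring WidenWithMbstowcs(const char* psz)
{
    // First call: a null destination only asks for the required length.
    // This is a POSIX / Microsoft CRT extension, assumed to be available.
    std::size_t nLen = std::mbstowcs(nullptr, psz, 0);
    if (nLen == static_cast<std::size_t>(-1))
        return std::wstring(); // invalid multibyte sequence

    // Second call: convert into a buffer with room for the terminator.
    std::vector<wchar_t> buf(nLen + 1);
    std::mbstowcs(buf.data(), psz, nLen + 1);
    return std::wstring(buf.data(), nLen);
}
```

The difference with UTF8To16 is the source encoding: mbstowcs depends on the current locale, while UTF8To16 always reads UTF-8.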
Update December 17, 2008: With CMarkup release 10.1 the UTF-16 string type in the
UTF8To16 and UTF16To8 functions changed from
wchar_t* to
unsigned short*, since
wchar_t means UTF-32 on Linux and OS X.
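The platform difference behind this change can be checked directly. The snippet below is only an illustration. The function name WideCharBits is hypothetical, and the static_assert encodes the assumption that unsigned short is 16 bits wide, which holds on mainstream desktop platforms but is not required by the C++ standard.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: width of wchar_t in bits on the current platform.
// Typically 16 on Windows and 32 on Linux and OS X.
constexpr std::size_t WideCharBits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// Assumption: unsigned short can hold exactly one UTF-16 code unit.
static_assert(sizeof(unsigned short) * CHAR_BIT == 16,
              "unsigned short is expected to be 16 bits wide");
```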
The pszUTF8 source must be a UTF-8 string, which is processed up to the null-terminator or nUTF8Count bytes, whichever comes first. If pwszUTF16 is NULL, the number of UTF-16 units required (i.e. the UTF-16 length) is returned.
nUTF8Count is the maximum number of UTF-8 bytes to convert and should include the null-terminator if one is desired in the result. If pwszUTF16 is not NULL, it is filled with the result string, and it must be large enough! The result is null-terminated only if the null-terminator is encountered in pszUTF8 before nUTF8Count bytes have been processed. When pwszUTF16 is not NULL, the number of UTF-8 bytes converted is returned rather than the UTF-16 size.
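To make the two return values concrete, here is a simplified stand-in that follows the contract described above. It is not CMarkup's implementation. The name SketchUTF8To16 is hypothetical, malformed input is replaced with '?' as a simplifying assumption, and overlong or surrogate encodings are not rejected.

```cpp
#include <cstddef>

// Hypothetical stand-in, NOT CMarkup's code. With a null destination it
// returns the UTF-16 length; otherwise it returns the UTF-8 bytes consumed.
int SketchUTF8To16(unsigned short* pwszUTF16, const char* pszUTF8, int nUTF8Count)
{
    const unsigned char* pStart = reinterpret_cast<const unsigned char*>(pszUTF8);
    const unsigned char* p = pStart;
    const unsigned char* pEnd = pStart + nUTF8Count;
    int nUTF16Len = 0;
    while (p != pEnd)
    {
        if (*p == 0)
        {
            // Null-terminator found before nUTF8Count: terminate the result
            if (pwszUTF16)
                pwszUTF16[nUTF16Len] = 0;
            break;
        }

        // The lead byte gives the sequence length and the first payload bits
        unsigned long nChar = *p++;
        int nExtra = 0;
        if (nChar < 0x80)                { /* single-byte ASCII */ }
        else if ((nChar & 0xE0) == 0xC0) { nChar &= 0x1F; nExtra = 1; }
        else if ((nChar & 0xF0) == 0xE0) { nChar &= 0x0F; nExtra = 2; }
        else if ((nChar & 0xF8) == 0xF0) { nChar &= 0x07; nExtra = 3; }
        else                             { nChar = '?'; } // invalid lead byte

        // Each continuation byte (10xxxxxx) contributes six more bits
        for (; nExtra > 0 && p != pEnd && (*p & 0xC0) == 0x80; --nExtra, ++p)
            nChar = (nChar << 6) | (*p & 0x3Fu);
        if (nExtra > 0 || nChar > 0x10FFFF)
            nChar = '?'; // truncated sequence or out-of-range code point

        if (nChar >= 0x10000)
        {
            // Outside the Basic Multilingual Plane: emit a surrogate pair
            nChar -= 0x10000;
            if (pwszUTF16)
            {
                pwszUTF16[nUTF16Len] = static_cast<unsigned short>(0xD800 | (nChar >> 10));
                pwszUTF16[nUTF16Len + 1] = static_cast<unsigned short>(0xDC00 | (nChar & 0x3FF));
            }
            nUTF16Len += 2;
        }
        else
        {
            if (pwszUTF16)
                pwszUTF16[nUTF16Len] = static_cast<unsigned short>(nChar);
            ++nUTF16Len;
        }
    }
    if (!pwszUTF16)
        return nUTF16Len;              // sizing call: UTF-16 units required
    return static_cast<int>(p - pStart); // converting call: UTF-8 bytes consumed
}
```

Note that a character above U+FFFF occupies four UTF-8 bytes but two UTF-16 units, which is why the sizing call is worth making before allocating the buffer.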
The following example illustrates converting the letter z from UTF-16 to UTF-8, and then back to UTF-16. In the
UTF16To8 call, we pass
L"\x007A" which is a way of expressing UTF-16 char z. In the
UTF8To16 call, we pass the
wszUTF16 buffer and receive the result, "z", specifying the length of the UTF-8 source
+ 1 to include the null-terminator.
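char szUTF8[5];              // matches the 5 passed as nUTF8Count below
unsigned short wszUTF16[5];  // any size of at least 2 is enough here
int nUTFLen;
// The casts assume a 16-bit wchar_t, as on Windows
nUTFLen = CMarkup::UTF16To8(szUTF8,(const unsigned short*)L"\x007A",5); // z
Check( strcmp(szUTF8,"z") == 0 );
nUTFLen = CMarkup::UTF8To16(wszUTF16,szUTF8,nUTFLen+1);
Check( wcscmp((const wchar_t*)wszUTF16,L"z") == 0 );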
Here is an example to demonstrate the common technique of passing a
NULL result buffer so that the function returns the necessary result length, before allocating the result buffer and calling the function again.
const char* pszTest = "hello";
unsigned short* pwszBuffer;
int nLen = (int)strlen( pszTest );
// First call: NULL buffer, so the required UTF-16 length is returned
int nUTF16Len = CMarkup::UTF8To16(NULL,pszTest,nLen);
pwszBuffer = new unsigned short[nUTF16Len+1];
// Second call: nLen+1 includes the null-terminator in the result
CMarkup::UTF8To16(pwszBuffer,pszTest,nLen+1);
// Same technique in the other direction
nLen = CMarkup::UTF16To8(NULL,pwszBuffer,0);
CString csTest;
CMarkup::UTF16To8(csTest.GetBuffer(nLen),pwszBuffer,nLen);
csTest.ReleaseBuffer(nLen);
delete [] pwszBuffer; // array form, to match new[]
Check( strcmp(csTest,pszTest) == 0 );
UTF8To16 and UTF16To8 have no dependencies and can be used in place of the
MultiByteToWideChar and WideCharToMultiByte Win32 APIs, which do not support UTF-8 on Windows 9X, NT 3.5 and versions of CE, and are not available on other platforms.