UTF-8 Files and the Preamble
The UTF-8 preamble, also known as the UTF-8 BOM or signature, is a 3-byte sequence at the start of a file indicating that the file is UTF-8. Like the UTF-16 BOM, it is not particular to XML; it applies to any text file. Unlike the UTF-16 BOM, though, "Byte Order Mark" is not a correct term in this case because UTF-8 has no byte order. In hex, the UTF-8 preamble is EF BB BF.

While the UTF-16 BOM is standard, the UTF-8 preamble is not widely accepted and is discouraged on UNIX operating systems. Microsoft Notepad writes the UTF-8 preamble when it saves UTF-8 documents, but does not need it to recognize UTF-8 encoding when it loads files.

The 3-byte UTF-8 preamble is not recommended in XML files because a file that begins with an ASCII less-than sign is already assumed to be UTF-8 unless the XML Declaration specifies another encoding.

If circumstances leave your file with a UTF-8 preamble, CMarkup 7.2 (developer version) will support it much the way it supports the UTF-16 BOM. If the UTF-8 preamble is discovered on Load, the MDF_UTF8PREAMBLE flag is set. The flag can be turned on or off as follows:

    xml.SetDocFlags( xml.GetDocFlags() | xml.MDF_UTF8PREAMBLE ); // on
    xml.SetDocFlags( xml.GetDocFlags() & ~xml.MDF_UTF8PREAMBLE ); // off

This flag is also supported in the ReadTextFile and WriteTextFile functions.
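
To make the 3-byte preamble concrete, here is a minimal C++ sketch that checks for the bytes EF BB BF at the start of a buffer, and strips or adds them. It does not use CMarkup and is not how CMarkup implements its flag. The names hasUtf8Preamble, stripUtf8Preamble and addUtf8Preamble are hypothetical helpers made up for this illustration, and the sketch assumes the whole file has already been read into a std::string of raw bytes.

```cpp
#include <cstddef>
#include <string>

// The three bytes of the UTF-8 preamble (signature): EF BB BF.
const unsigned char kUtf8Preamble[3] = { 0xEF, 0xBB, 0xBF };

// Hypothetical helper: true if the raw bytes start with the UTF-8 preamble.
bool hasUtf8Preamble(const std::string& bytes)
{
    if (bytes.size() < 3)
        return false;
    for (std::size_t i = 0; i < 3; ++i)
    {
        // Compare as unsigned so bytes above 0x7F are not sign-extended.
        if (static_cast<unsigned char>(bytes[i]) != kUtf8Preamble[i])
            return false;
    }
    return true;
}

// Hypothetical helper: copy of the text without a leading preamble.
// Text that has no preamble is returned unchanged.
std::string stripUtf8Preamble(const std::string& bytes)
{
    return hasUtf8Preamble(bytes) ? bytes.substr(3) : bytes;
}

// Hypothetical helper: copy of the text with the preamble in front.
// Text that already starts with the preamble is returned unchanged.
std::string addUtf8Preamble(const std::string& bytes)
{
    if (hasUtf8Preamble(bytes))
        return bytes;
    std::string out(reinterpret_cast<const char*>(kUtf8Preamble), 3);
    out += bytes;
    return out;
}
```

A caller could run stripUtf8Preamble on a file's contents before handing the text to a parser that does not expect the preamble, and use addUtf8Preamble before writing only when the receiving program is known to want it.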
Posted September 27, 2004. ©Copyright 2008 First Objective Software, Inc. All rights reserved.