CMarkup UnescapeText Method

static MCD_STR CMarkup::UnescapeText( MCD_CSTR szText, int nTextLength = -1 );

The UnescapeText static utility function is used internally to unescape the Standard Special Characters when extracting a text value from the document. You can use it to unescape special characters when dealing with markup text directly.

The UnescapeText function works with null-terminated strings, and also with strings that are not null-terminated if nTextLength is specified. It decodes the 5 standard special characters, any Numeric Character References, and (since release 10.0) the standard HTML entities. Any unrecognized entity references are left unchanged.
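To make these decoding rules concrete, here is a minimal C++ sketch of how an unescape routine might handle the 5 standard special characters and Numeric Character References. It is not CMarkup's implementation. The names UnescapeSketch, ParseCharRef and AppendUtf8, the use of std::string, the 32-character scan limit, and UTF-8 output for character references are all assumptions. CMarkup's MCD_STR type and its MBCS and wide-character builds are not modeled. Named HTML entities are omitted, so &uuml; passes through unchanged, as it did before release 10.0. As with UnescapeText, an nTextLength of -1 means the input is null-terminated.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of code point cp
// (UTF-8 output is an assumption of this sketch).
static void AppendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: parse "#252" or "#xFC" into a code point.
// Returns false if the reference is malformed.
static bool ParseCharRef(const std::string& name, unsigned long& cp)
{
    if (name.size() < 2 || name[0] != '#') return false;
    bool hex = (name[1] == 'x' || name[1] == 'X');
    std::size_t start = hex ? 2 : 1;
    if (start >= name.size()) return false;
    cp = 0;
    for (std::size_t k = start; k < name.size(); ++k) {
        char d = name[k];
        unsigned v;
        if (d >= '0' && d <= '9') v = static_cast<unsigned>(d - '0');
        else if (hex && d >= 'a' && d <= 'f') v = static_cast<unsigned>(d - 'a' + 10);
        else if (hex && d >= 'A' && d <= 'F') v = static_cast<unsigned>(d - 'A' + 10);
        else return false;
        cp = cp * (hex ? 16 : 10) + v;
        if (cp > 0x10FFFF) return false;  // beyond the Unicode range
    }
    // Reject NUL and UTF-16 surrogate code points.
    return cp != 0 && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical UnescapeSketch: nTextLength == -1 means szText is null-terminated.
std::string UnescapeSketch(const char* szText, int nTextLength = -1)
{
    static const struct { const char* name; char ch; } kStandard[] = {
        {"amp", '&'}, {"quot", '"'}, {"apos", '\''}, {"lt", '<'}, {"gt", '>'}};
    const std::size_t len = nTextLength < 0 ? std::strlen(szText)
                                            : static_cast<std::size_t>(nTextLength);
    std::string out;
    out.reserve(len);
    std::size_t i = 0;
    while (i < len) {
        if (szText[i] != '&') { out += szText[i++]; continue; }
        // Look for the closing ';' within the length limit (scan capped at 32 chars).
        std::size_t end = i + 1;
        while (end < len && szText[end] != ';' && end - i < 32) ++end;
        bool decoded = false;
        if (end < len && szText[end] == ';') {
            std::string name(szText + i + 1, end - i - 1);
            unsigned long cp = 0;
            if (ParseCharRef(name, cp)) {
                AppendUtf8(out, cp);
                decoded = true;
            } else {
                for (const auto& e : kStandard) {
                    if (name == e.name) { out += e.ch; decoded = true; break; }
                }
            }
        }
        if (decoded) {
            i = end + 1;
        } else {
            out += '&';  // unrecognized or unterminated reference: copy unchanged
            ++i;
        }
    }
    return out;
}
```

Because an unrecognized reference only emits its '&' and then continues, the rest of the name is copied character by character. This leaves the whole reference intact rather than dropping it.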

See also EscapeText.

comment posted Technical question about CMarkup::UnescapeText(...)

Tom 06-Dec-2007

When using CMarkup::UnescapeText(...) I encountered some errors, only in an MBCS release build with VC2003 SP1. The ampersand "&" in some entities (not all, e.g. &uuml; --> ü) is replaced with an unprintable character. In my opinion this could be a compiler bug, as I couldn't find any differences in the source code. By the way, it would be very nice if this function replaced all available entities (about 100).

We were not able to determine the cause of the problem in Tom's case. However, before release 10.0 CMarkup would not convert &uuml; because it is not one of the 5 standard XML entities amp, quot, apos, lt and gt (see Standard Special Characters). To get ü you would need to use the numeric character reference &#252;.

Update September 27, 2008: With CMarkup release 10.0, over 200 standard HTML entities are now also unescaped by this function. So, for example, &uuml; is decoded to the actual ü character in the returned string. The lookup mechanism is a small-footprint, predefined static hash table (see PredefEntityTable in Markup.cpp).
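The layout of PredefEntityTable is not reproduced here. The sketch below only illustrates the general idea of mapping entity names to Unicode code points through a hash lookup. The function name LookupEntity and the eleven-entry subset are hypothetical. Also, unlike a predefined static table, this version builds a std::unordered_map on its first call, trading some memory for simplicity.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical LookupEntity: maps an HTML entity name (without '&' and ';')
// to its Unicode code point, or returns 0 if the name is not in the table.
unsigned long LookupEntity(const std::string& name)
{
    // Small illustrative subset; CMarkup 10.0 handles over 200 names.
    // This function-local static is built once, on the first call.
    static const std::unordered_map<std::string, unsigned long> kEntities = {
        {"amp", 0x26},  {"quot", 0x22}, {"apos", 0x27}, {"lt", 0x3C},
        {"gt", 0x3E},   {"nbsp", 0xA0}, {"copy", 0xA9}, {"eacute", 0xE9},
        {"Uuml", 0xDC}, {"uuml", 0xFC}, {"euro", 0x20AC}};
    auto it = kEntities.find(name);
    return it == kEntities.end() ? 0 : it->second;
}
```

HTML entity names are case-sensitive, which is why Uuml (Ü, U+00DC) and uuml (ü, U+00FC) are separate entries. An unescape routine would call a lookup like this for names that are not numeric character references. It would then encode a nonzero result in the output character set and leave the reference unchanged when the result is 0.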