Whitespace and CMarkup

 

Comment posted: trimming white space

Marc Dyksterhouse 17-Mar-2010

Is there a way to have GetData or some other call return just the text of an element and not the whitespace around it? For example, can GetData return "text" in the following XML instead of "  text\n"?

<item>
 text
</item>

I know I can just trim the returned string, but since whitespace isn't supposed to be pertinent in XML, I just thought the library should work this way. In the few cases where I need to preserve whitespace, I can use a CDATA encoding.

With release 11.3 you can set flags to trim whitespace or collapse whitespace when reading values from the document. CMarkup is unusual among XML tools because it simply preserves all whitespace, but now it can also support standard ways that XML and HTML processors alter whitespace.

Whitespace includes spaces, tabs, carriage returns, and newline characters. CMarkup has always preserved whitespace as it appears in the document, and it still does. The new flags give you the option of reading trimmed or collapsed text values, but the document itself is not altered, so you can turn the flags off and go back to reading the preserved whitespace.

Document Flag | Purpose
MDF_TRIMWHITESPACE | removes leading and trailing whitespace
MDF_COLLAPSEWHITESPACE | removes leading and trailing whitespace, but also replaces all segments of whitespace inside the text with a single space; so for example a newline and tab within the text will become a single space
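To make the difference between the two flags concrete, here is a minimal standalone sketch of trimming and collapsing as described above. It does not use CMarkup and is not CMarkup's implementation; the names TrimWhitespace, CollapseWhitespace, kWhitespace and CheckWhitespaceSketch are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <string>

// Whitespace as listed above: space, tab, carriage return, newline.
const std::string kWhitespace = " \t\r\n";

// Remove leading and trailing whitespace (the MDF_TRIMWHITESPACE description).
std::string TrimWhitespace(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(kWhitespace);
    if (first == std::string::npos) return std::string();  // all whitespace
    const std::string::size_type last = s.find_last_not_of(kWhitespace);
    return s.substr(first, last - first + 1);
}

// Trim, and also replace each interior run of whitespace with one space
// (the MDF_COLLAPSEWHITESPACE description).
std::string CollapseWhitespace(const std::string& s) {
    std::string out;
    bool pendingSpace = false;
    for (char c : s) {
        if (kWhitespace.find(c) != std::string::npos) {
            pendingSpace = !out.empty();  // ignore whitespace before any text
        } else {
            if (pendingSpace) out += ' ';  // one space per interior run
            out += c;
            pendingSpace = false;
        }
    }
    return out;  // a trailing run is never flushed, so it is dropped
}

// Self-check using values like the one in the question above.
void CheckWhitespaceSketch() {
    assert(TrimWhitespace("\n text\n") == "text");
    assert(CollapseWhitespace(" a\n\tb ") == "a b");
}
```

Note that trimming leaves interior whitespace untouched, so a value such as "a\n\tb" only changes when collapsing is applied.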

These flags affect CMarkup methods like GetData and GetAttrib that retrieve element data, text nodes, and attributes. They do not affect methods like GetSubDoc and GetElemContent, which return XML (that is, markup text).

These flags have no effect on text retrieved from CDATA Sections. With CMarkup you can create an element whose text is held in a CDATA Section, which protects the whitespace from ever being altered by CMarkup or any other XML tool:

xml.AddElem( "Prose", strProseText, CMarkup::MNF_WITHCDATA );

You can turn the whitespace flags on and off at any time without a performance penalty, for example if you want to trim some values and not others. Use SetDocFlags to set these flags.

CMarkup m;
m.SetDocFlags( CMarkup::MDF_TRIMWHITESPACE );

You can OR a flag with the value returned by GetDocFlags if you don't want to affect other flags:

m.SetDocFlags( m.GetDocFlags() | CMarkup::MDF_COLLAPSEWHITESPACE );

Turn off a flag without affecting others as follows:

m.SetDocFlags( m.GetDocFlags() & ~CMarkup::MDF_COLLAPSEWHITESPACE );
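The OR and AND-NOT idioms above are ordinary bit-mask operations. The following standalone sketch shows why they leave other flags untouched. The enum values and the names SetFlag, ClearFlag, HasFlag and CheckFlagSketch are hypothetical and chosen only for illustration; the real constants are defined by CMarkup and may have different values.

```cpp
#include <cassert>

// Hypothetical bit values for illustration only; the real constants are
// defined by CMarkup and may differ.
enum SketchFlags {
    SKETCH_OTHERFLAG          = 0x01,  // stands in for any unrelated flag
    SKETCH_TRIMWHITESPACE     = 0x02,
    SKETCH_COLLAPSEWHITESPACE = 0x04
};

// Turn a flag on without disturbing the others (bitwise OR).
int SetFlag(int flags, int flag) { return flags | flag; }

// Turn a flag off without disturbing the others (AND with the complement).
int ClearFlag(int flags, int flag) { return flags & ~flag; }

// True when the given flag is currently set.
bool HasFlag(int flags, int flag) { return (flags & flag) != 0; }

// Self-check: setting and then clearing one flag restores the original value.
void CheckFlagSketch() {
    int flags = SKETCH_OTHERFLAG;
    flags = SetFlag(flags, SKETCH_COLLAPSEWHITESPACE);
    assert(HasFlag(flags, SKETCH_OTHERFLAG));
    flags = ClearFlag(flags, SKETCH_COLLAPSEWHITESPACE);
    assert(flags == SKETCH_OTHERFLAG);
}
```

By contrast, passing a single flag to a setter on its own, as in the first SetDocFlags example, replaces the whole flag value rather than adding to it.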

These whitespace flags can affect values returned by GetData, GetAttrib and related methods. They also affect methods like FindElem that search for a path specifying a value in a path attribute predicate (see Paths In CMarkup) because values from the document will be trimmed or collapsed before being compared to the specified value.
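The effect on searching can be pictured with a small standalone sketch. It is not CMarkup code: TrimDocValue, PredicateMatches and CheckPredicateSketch are hypothetical names, and the sketch assumes that only the value read from the document is normalized while the value you specify is compared as given.

```cpp
#include <cassert>
#include <string>

// Trim leading and trailing spaces, tabs, carriage returns and newlines.
std::string TrimDocValue(const std::string& s) {
    const std::string ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();  // all whitespace
    return s.substr(first, s.find_last_not_of(ws) - first + 1);
}

// Hypothetical stand-in for the comparison made while searching: the value
// read from the document is optionally trimmed, then compared with the value
// given in the path predicate. Assumption: the specified value is used as-is.
bool PredicateMatches(const std::string& docValue,
                      const std::string& wanted,
                      bool trimEnabled) {
    return (trimEnabled ? TrimDocValue(docValue) : docValue) == wanted;
}

// Self-check: the same document value matches only when trimming is enabled.
void CheckPredicateSketch() {
    assert(!PredicateMatches("\n text\n", "text", false));
    assert(PredicateMatches("\n text\n", "text", true));
}
```

In other words, a search that fails against the preserved whitespace can succeed once a whitespace flag is set, so it is worth checking the current flags when a search result is unexpected.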

See also:
Node Methods in CMarkup