Subdocuments and Fragments of XML Documents

This article discusses the terminology of document fragments and the details of CMarkup's support for them. A subdocument is an element together with its attributes and all its content, treated as a unit, even if the element contains a whole tree of elements. A subdocument is also a well-formed document on its own. Each of the following lines is an example:

<NAME form="f">John</NAME>
<MSG flag="1">Pay <B>Attention!</B></MSG>

A document fragment can be anything that you would find in the content of an element: text, elements, or both in the case of mixed content. So a subdocument is a type of document fragment, but it is limited to those fragments that have exactly one element at the top level. The following document fragment, taken from the content of an R element, consists of two sibling elements with no parent element of their own, so it would not be considered a subdocument:

<NAME form="f">John</NAME>
<MSG flag="1">Pay <B>Attention!</B></MSG>
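The "exactly one element at the top level" rule can be sketched as a small check. This is not CMarkup code: inspectFragment, FragmentShape and isSubdocument are hypothetical names, and the scanner is deliberately simplified. It assumes roughly well-formed input and does not handle comments, CDATA sections, processing instructions, or a '>' inside an attribute value. It also treats non-whitespace text at the top level as disqualifying, which is an assumption that follows from a subdocument being a well-formed document.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper types, not part of CMarkup.
struct FragmentShape {
    std::size_t topLevelElements = 0; // elements that have no parent in the fragment
    bool topLevelText = false;        // non-whitespace text outside any element
};

// Simplified scan: counts start tags seen at depth 0 and notes top-level text.
FragmentShape inspectFragment(const std::string& fragment) {
    FragmentShape shape;
    int depth = 0;
    std::size_t pos = 0;
    while (pos < fragment.size()) {
        if (fragment[pos] != '<') {
            if (depth == 0 &&
                !std::isspace(static_cast<unsigned char>(fragment[pos]))) {
                shape.topLevelText = true;
            }
            ++pos;
            continue;
        }
        std::size_t close = fragment.find('>', pos);
        if (close == std::string::npos) break; // unterminated tag: stop scanning
        bool isEndTag = pos + 1 < fragment.size() && fragment[pos + 1] == '/';
        bool isEmptyTag = fragment[close - 1] == '/';
        if (isEndTag) {
            if (depth > 0) --depth;            // ignore a lone end tag
        } else {
            if (depth == 0) ++shape.topLevelElements;
            if (!isEmptyTag) ++depth;
        }
        pos = close + 1;
    }
    return shape;
}

// A fragment qualifies as a subdocument when it is exactly one element.
bool isSubdocument(const std::string& fragment) {
    FragmentShape shape = inspectFragment(fragment);
    return shape.topLevelElements == 1 && !shape.topLevelText;
}
```

Under these assumptions, each of the NAME and MSG lines passes isSubdocument when checked alone, while the two lines together report two top-level elements and do not.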

CMarkup has these methods for handling subdocuments: AddSubDoc, AddChildSubDoc, InsertSubDoc, InsertChildSubDoc, GetSubDoc, GetChildSubDoc. There are two methods for content: GetElemContent, SetElemContent.

GetElemContent returns the inner XML while GetSubDoc returns the outer XML. The corresponding Microsoft DOM extension methods would be InnerXml and Xml or ToString. GetElemContent returns the text and markup from the content of the element, while GetSubDoc returns the same thing plus the element's start and end tags around it. The special markup characters are not escaped; they are maintained as markup. So if the current element is MSG in the above example, GetSubDoc returns:

<MSG flag="1">Pay <B>Attention!</B></MSG>

GetElemContent returns:

Pay <B>Attention!</B>
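The relationship between the two results can be illustrated with plain string handling. The sketch below is a stand-in and not how CMarkup implements GetElemContent: innerOf is a hypothetical helper that takes a subdocument string (the outer XML) and strips its start and end tags to leave the content (the inner XML). It assumes well-formed input with no '>' inside attribute values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in, not CMarkup code: return what lies between the
// start tag and the end tag of a single-element subdocument string.
std::string innerOf(const std::string& subdoc) {
    std::size_t startTagEnd = subdoc.find('>');   // end of the start tag
    std::size_t endTagStart = subdoc.rfind("</"); // start of the final end tag
    if (startTagEnd == std::string::npos ||
        endTagStart == std::string::npos ||
        endTagStart <= startTagEnd) {
        return std::string(); // empty-element tag, or not a subdocument
    }
    return subdoc.substr(startTagEnd + 1, endTagStart - startTagEnd - 1);
}
```

Passing the MSG subdocument shown above to innerOf gives back the same string shown for GetElemContent, with the B element still present as markup rather than escaped text.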

As of release 8.0, CMarkup takes a lenient approach to document fragments that are not well-formed. If the subdocument is not well-formed, AddSubDoc does not abort, though it still returns false. Note that with really bad markup, a lone end tag can cause the next parse of the entire document to result in a different hierarchy (see Containment Hierarchy).


Comment posted: HTML in an element

Geert van Horrik 03-Dec-2005

I want to add some html code to an element like this:

<ACTION type="shownotifier">
    <DESCRIPTION>text with <a href="example.html">hyperlink</a></DESCRIPTION>

But when I am parsing the XML file, it seems it can't read the data in the description. I use this code:

... other code ...

// Jump into element
sTitle = xml.GetData();
// Reset pointer
sDescription = xml.GetData();
... other code ...

It works perfectly when I am not using html tags, but with the tags, the sDescription will be empty.

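Because the DESCRIPTION element contains a child element (the a element), GetData returns an empty string for it. Use GetElemContent instead of GetData to obtain the following string: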

text with <a href="example.html">hyperlink</a>