Subdocuments and Fragments of XML Documents
This article discusses the terminology of document fragments and the details of CMarkup's support for them. A subdocument is an element taken as a unit together with its attributes and all of its content, even if that content is a whole tree of elements. A subdocument is also a well-formed document on its own, for example:
<R> <NAME form="f">John</NAME> <MSG flag="1">Pay <B>Attention!</B></MSG> <ID>10</ID> </R>
A document fragment can be anything that you would find in the content of an element: text, elements, or both in the case of mixed content. So while a subdocument is a type of document fragment, it is limited to fragments that have exactly one element at the top level. The following fragment is taken from the content of the R element above. It consists of three sibling elements with no parent element, so it is not a subdocument:
<NAME form="f">John</NAME> <MSG flag="1">Pay <B>Attention!</B></MSG> <ID>10</ID>
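The difference between a subdocument and a general fragment can be checked mechanically by counting the elements that start at the top level. The following is a minimal C++ sketch under stated assumptions: IsSubDoc is a hypothetical helper, not a CMarkup method, and it assumes a well-formed fragment with no comments, CDATA sections, processing instructions, or '>' characters inside attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CMarkup): returns true when a fragment has
// exactly one element at the top level and no non-whitespace text there.
// Assumes a well-formed fragment with no comments, CDATA, processing
// instructions, or '>' inside attribute values.
bool IsSubDoc(const std::string& frag) {
    int depth = 0;          // current element nesting depth
    int topLevelElems = 0;  // number of elements starting at depth 0
    std::size_t i = 0;
    while (i < frag.size()) {
        if (frag[i] == '<') {
            std::size_t close = frag.find('>', i);
            if (close == std::string::npos) return false;  // unterminated tag
            bool isEnd = (i + 1 < frag.size() && frag[i + 1] == '/');
            bool isEmpty = (frag[close - 1] == '/');        // e.g. <BR/>
            if (isEnd) {
                --depth;
            } else {
                if (depth == 0) ++topLevelElems;
                if (!isEmpty) ++depth;
            }
            i = close + 1;
        } else {
            // Character data outside any element disqualifies a subdocument.
            if (depth == 0 && !std::isspace(static_cast<unsigned char>(frag[i])))
                return false;
            ++i;
        }
    }
    return topLevelElems == 1;
}
```

Applied to the R subdocument above it returns true; applied to the three-sibling fragment it returns false, because three elements start at the top level.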
CMarkup has these methods for handling subdocuments: AddSubDoc, AddChildSubDoc, InsertSubDoc, InsertChildSubDoc, GetSubDoc, GetChildSubDoc. There are two methods for content: GetElemContent, SetElemContent.
GetElemContent is the inner XML while GetSubDoc is the outer XML. The corresponding Microsoft DOM extension methods would be
GetElemContent returns the text and markup from the content of the element, while GetSubDoc returns the same thing plus the element's start and end tags around it. Special markup characters are not escaped; they are kept as markup. So if the current element is MSG in the above example, GetElemContent returns Pay <B>Attention!</B> and GetSubDoc returns:
<MSG flag="1">Pay <B>Attention!</B></MSG>
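To make the inner/outer distinction concrete, here is a string-slicing sketch in C++. SliceElem and ElemXml are hypothetical names, not CMarkup's API or implementation; the sketch assumes the element has a separate end tag, contains no nested element of the same name, and has no '>' inside attribute values.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type (not part of CMarkup).
struct ElemXml {
    std::string outer;  // start tag + content + end tag (outer XML)
    std::string inner;  // content only, markup kept unescaped (inner XML)
};

// Hypothetical helper: finds the first element named `name` in `doc` and
// slices out its outer and inner XML. Returns false if it cannot be found.
bool SliceElem(const std::string& doc, const std::string& name, ElemXml& out) {
    const std::string endTag = "</" + name + ">";
    std::size_t open = doc.find("<" + name);
    // Skip matches that are only a prefix of a longer name, e.g. <B in <BODY>.
    while (open != std::string::npos) {
        std::size_t after = open + 1 + name.size();
        if (after < doc.size() &&
            (doc[after] == '>' || std::isspace(static_cast<unsigned char>(doc[after]))))
            break;
        open = doc.find("<" + name, open + 1);
    }
    if (open == std::string::npos) return false;
    std::size_t startTagEnd = doc.find('>', open);    // end of the start tag
    std::size_t close = doc.find(endTag, startTagEnd); // start of the end tag
    if (startTagEnd == std::string::npos || close == std::string::npos) return false;
    out.outer = doc.substr(open, close + endTag.size() - open);
    out.inner = doc.substr(startTagEnd + 1, close - startTagEnd - 1);
    return true;
}
```

Called with the R document and the name MSG, it sets outer to <MSG flag="1">Pay <B>Attention!</B></MSG> and inner to Pay <B>Attention!</B>, mirroring the GetSubDoc and GetElemContent results described above.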
As of release 8.0, CMarkup takes a lenient approach to document fragments that are not well-formed. AddSubDoc does not abort, though it still returns false if the subdocument is not well-formed. Note that with really bad markup, a lone end tag can cause the next parse of the entire document to produce a different hierarchy (see Containment Hierarchy).
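The lenient behavior (insert anyway, but report failure) can be illustrated with a stack-based tag matcher. This is a minimal C++ sketch, not CMarkup's implementation: AppendSubDocLenient is a hypothetical name, and it assumes no comments, CDATA sections, processing instructions, or '>' inside attribute values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of a lenient insert: the fragment is always appended to
// `doc`, and the return value only reports whether its tags were balanced.
bool AppendSubDocLenient(std::string& doc, const std::string& sub) {
    std::vector<std::string> open;  // names of start tags not yet closed
    bool ok = true;
    std::size_t i = sub.find('<');
    while (i != std::string::npos && ok) {
        std::size_t close = sub.find('>', i);
        if (close == std::string::npos) { ok = false; break; }  // unterminated tag
        std::string tag = sub.substr(i + 1, close - i - 1);
        if (!tag.empty() && tag[0] == '/') {
            // End tag: must match the most recent unclosed start tag.
            std::string name = tag.substr(1);
            if (open.empty() || open.back() != name) ok = false;  // e.g. a lone end tag
            else open.pop_back();
        } else if (!tag.empty() && tag.back() != '/') {
            // Start tag (not empty-element): name ends at the first whitespace.
            open.push_back(tag.substr(0, tag.find_first_of(" \t\r\n")));
        }
        i = sub.find('<', close + 1);
    }
    if (!open.empty()) ok = false;  // start tags left unclosed
    doc += sub;  // lenient: insert even when the fragment is not well-formed
    return ok;
}
```

Appending <B>x</B></MSG> returns false because of the lone </MSG> end tag, yet the text is still added to the document; that stray end tag is the kind of markup that can change the hierarchy on the next parse.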
Use GetElemContent instead of GetData to obtain the following string from an element with mixed content:
text with <a href="example.html">hyperlink</a>
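To show what a text-only view of this content loses, here is a naive tag-stripping function in C++. StripTags is a hypothetical illustration only; it is not a description of how GetData behaves, and it assumes no '>' inside attribute values and no comments or CDATA sections.

```cpp
#include <string>

// Hypothetical helper for illustration: returns the character data of a
// content string with every tag removed.
std::string StripTags(const std::string& content) {
    std::string text;
    bool inTag = false;  // true while between '<' and '>'
    for (char c : content) {
        if (c == '<') inTag = true;
        else if (c == '>') inTag = false;
        else if (!inTag) text += c;
    }
    return text;
}
```

Applied to the string above it yields text with hyperlink: the a element and its href attribute are gone, which is why the markup-preserving content is needed here.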