Paths In CMarkup

Although you can navigate to any position in CMarkup without paths, paths greatly accelerate navigation of the document.

CMarkup Developer License

Finding elements by absolute, relative and anywhere paths, the simple predicates [n], [@attrib], [@a='X'] and [CHILDELEM], and the GetElemPath, FindSetData and FindGetData methods are available only in CMarkup Developer and the free XML editor's FOAL C++ scripting.

Update July 12, 2005: Release 8.0 added anywhere paths, simple predicate types, GetElemPath and affected the CMarkupMSXML path implementation. February 12, 2006: Release 8.2 added support for an additional simple predicate form [@attrib='value'].

We will use the following sample document for discussion of paths.

<config>
  <diagnostics d="3">
    <file>C:\temp\a.txt</file>
  </diagnostics>
  <diagnostics d="7">
    <file title="Mike&apos;s">D:\temp\a.txt</file>
    <proxy usedefault="true"/>
  </diagnostics>
</config>

Absolute Path

If the path starts with a single slash, the search starts at the root of the document, and the first tag name should be that of the root element. To set the current position to the first diagnostics element under the root config element, call xml.FindElem("/config/diagnostics"). To avoid specifying the root element tag name, or any tag name that does not need to be checked, use an asterisk (for example, "/*/diagnostics").

Relative Path

By leaving out the initial slash you can delve multiple levels further into the document, beginning at the current position. If the current main position is the config element, "diagnostics/file" moves the main position to the file element. The relative path is seldom useful; it is sometimes confused with the "anywhere path", which is very useful (the anywhere path is described below).

Nth Element Predicate and GetElemPath

The [n] predicate is generally used in an absolute path to specify the nth element with the specific name. GetElemPath returns a unique absolute path string of the main position element such as "/config/diagnostics[2]/file". This string is useful for re-finding the element after the unchanged document is reloaded or re-parsed.

xml.ResetPos();
xml.FindChildElem();
xml.FindChildElem();
xml.IntoElem();
xml.FindChildElem();
xml.IntoElem();
CString csXMLPath = xml.GetElemPath();
// csXMLPath == "/config/diagnostics[2]/file"
CString csDoc = xml.GetDoc();
// ...
xml.SetDoc( csDoc );
xml.FindElem( csXMLPath );
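
As a rough illustration of how a unique path with [n] predicates can be derived, here is a minimal standalone sketch that does not use CMarkup at all. The Node type and the addChild, sameNameIndex and elemPath functions are hypothetical names invented for this example. It assumes that the index is left out for the first element with a given name, which is what the example string "/config/diagnostics[2]/file" suggests; the actual GetElemPath implementation may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory element for illustration; not a CMarkup type.
struct Node {
    std::string tag;
    Node* parent = nullptr;
    std::vector<Node*> children;
    explicit Node(std::string t) : tag(std::move(t)) {}
};

// Attach child under parent (illustrative helper).
void addChild(Node& parent, Node& child) {
    child.parent = &parent;
    parent.children.push_back(&child);
}

// 1-based position of n among the siblings that share its tag name.
int sameNameIndex(const Node& n) {
    if (!n.parent) return 1;
    int index = 0;
    for (const Node* sib : n.parent->children) {
        if (sib->tag == n.tag) ++index;
        if (sib == &n) break;
    }
    return index;
}

// Build a path such as "/a/b[2]/c" by walking from the element up to the root.
// Assumption: "[1]" is omitted, as in the example string in the text.
std::string elemPath(const Node& n) {
    std::string path;
    for (const Node* cur = &n; cur; cur = cur->parent) {
        std::string step = "/" + cur->tag;
        int index = sameNameIndex(*cur);
        if (index > 1) step += "[" + std::to_string(index) + "]";
        path = step + path;
    }
    return path;
}

// Usage: rebuild the element tree of the sample document and check two paths.
void checkElemPathExample() {
    Node config("config"), d1("diagnostics"), f1("file");
    Node d2("diagnostics"), f2("file"), proxy("proxy");
    addChild(config, d1); addChild(d1, f1);
    addChild(config, d2); addChild(d2, f2); addChild(d2, proxy);
    assert(elemPath(f1) == "/config/diagnostics/file");
    assert(elemPath(f2) == "/config/diagnostics[2]/file");
}
```

Because the path counts only same-name siblings, it stays valid only as long as the document is unchanged, which matches the caveat above about re-finding the element in the unchanged document.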

Child Tag Name Predicate

The [CHILDTAG] predicate finds an element containing a child element with the specified tag name. The following code loops through all diagnostics elements containing a proxy element, which in this example is only the second diagnostics element, and does something with the d attribute of the diagnostics element.

xml.ResetPos();
xml.FindElem(); // config
xml.IntoElem();
while ( xml.FindElem("diagnostics[proxy]") )
  DoSomething( xml.GetAttrib("d") );

Attribute Predicate

The [@attrib] predicate finds an element that has an attribute with the specified name. The following code loops through all child elements of diagnostics that have usedefault attributes, and does something with the tag name and usedefault attribute of that element.

xml.ResetPos();
while ( xml.FindChildElem("diagnostics") )
{
  xml.IntoElem(); // diagnostics
  while ( xml.FindChildElem("*[@usedefault]") )
    DoSomething( xml.GetChildTagName(), xml.GetChildAttrib("usedefault") );
  xml.OutOfElem();
}

Anywhere Path

The anywhere path form //TAGNAME searches for an element with matching tag name by depth first traversal after the current position in the document. Here is a loop through every diagnostics element that does something with the d attribute.

xml.ResetPos();
while ( xml.FindElem("//diagnostics") )
  DoSomething( xml.GetAttrib("d") );

You can use an asterisk instead of the tag name to match every element. You can also use the [CHILDTAG] and [@attrib] predicates to refine your search. For example, //diagnostics[proxy] would find diagnostics elements at any level that contain proxy elements, and //*[@usedefault] would find any element at any level with a usedefault attribute.
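
The idea of searching "after the current position" in depth first order can be sketched without CMarkup. The following is only a model under stated assumptions: Elem, flatten, findAnywhere and countAnywhere are hypothetical names for this example, predicates are not handled, and a position of -1 stands in for the state after ResetPos.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory element for illustration; not a CMarkup type.
struct Elem {
    std::string tag;
    std::vector<Elem> children;
};

// List elements in document order (depth first, parent before children).
void flatten(const Elem& e, std::vector<const Elem*>& order) {
    order.push_back(&e);
    for (const Elem& child : e.children) flatten(child, order);
}

// Index of the next element after pos whose tag matches, or -1 if none.
// pos == -1 plays the role of a reset position; "*" matches any tag.
int findAnywhere(const std::vector<const Elem*>& order, int pos,
                 const std::string& tag) {
    for (int i = pos + 1; i < static_cast<int>(order.size()); ++i)
        if (tag == "*" || order[i]->tag == tag) return i;
    return -1;
}

// Count matches by searching again from each match, like the while loop above.
int countAnywhere(const Elem& root, const std::string& tag) {
    std::vector<const Elem*> order;
    flatten(root, order);
    int count = 0;
    for (int pos = findAnywhere(order, -1, tag); pos != -1;
         pos = findAnywhere(order, pos, tag))
        ++count;
    return count;
}

// Usage: the element structure of the sample config document.
void checkAnywhereExample() {
    Elem doc{"config", {
        Elem{"diagnostics", {Elem{"file", {}}}},
        Elem{"diagnostics", {Elem{"file", {}}, Elem{"proxy", {}}}}
    }};
    assert(countAnywhere(doc, "diagnostics") == 2);
    assert(countAnywhere(doc, "*") == 6);
}
```

Each search resumes after the previous match, which is why the loop visits every matching element exactly once and then stops.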

Attribute Value Predicate

Update February 12, 2006: With Release 8.2, support for an additional simple predicate form [@attrib='value'] has been added. This allows you to specify the exact value of the attribute in the path. No comparison operator other than the equal sign is supported, and the comparison applies only to an attribute value, not to a child element value. Here is an example of looping through all of the hyperlink elements where the class is codelink in an HTML document:

while ( xml.FindElem("//a[@class='codelink']") )
  DoSomething( xml.GetAttrib("href") );

The tricky issue with specifying a value in the path is that single quotes (apostrophes) must be escaped if there is any chance the value contains them. You can use the EscapeText function as follows:

CString strTitle = "Mike's";
CString strTitleE = xml.EscapeText( strTitle, xml.MNF_ESCAPEQUOTES );
if ( xml.FindElem("//file[@title='"+strTitleE+"']") )
  ...
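
To show why the escaping matters, here is a minimal standalone sketch of building such a path string. The escapeForPredicate and attribValuePath functions are hypothetical stand-ins written for this example, not CMarkup's EscapeText; the sketch assumes the standard XML entity replacements for the five special characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an escaping routine; not CMarkup's EscapeText.
// Replaces the five XML special characters, including both quote types.
std::string escapeForPredicate(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '\'': out += "&apos;"; break;
            case '"':  out += "&quot;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Build a path of the form //TAG[@ATTRIB='VALUE'] with the value escaped,
// so an apostrophe in the value cannot end the quoted string early.
std::string attribValuePath(const std::string& tag, const std::string& attrib,
                            const std::string& value) {
    return "//" + tag + "[@" + attrib + "='" + escapeForPredicate(value) + "']";
}

// Usage: the title value from the sample document.
void checkEscapeExample() {
    assert(attribValuePath("file", "title", "Mike's") ==
           "//file[@title='Mike&apos;s']");
}
```

Without the escaping step, the apostrophe in Mike's would close the quoted value after "Mike" and leave a malformed predicate.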

Update November 20, 2010: CMarkup release 11.3 introduces document flags to trim whitespace and collapse whitespace (see Whitespace and CMarkup). If one of these flags is used, attribute values from the document will be trimmed or collapsed before being compared to the value specified in the predicate.

FindSetData and FindGetData

The FindSetData and FindGetData methods allow the document to be accessed very quickly in applications that use the XML document like a structure. For example, the XML Messaging project uses a document like the following:

<BinaryCollaboration>
  <OrderIndentifier>1</OrderIndentifier>
  <OrderStatus>Requested</OrderStatus>
  <ProductQuantity>15</ProductQuantity>
</BinaryCollaboration>

All path features are supported, so you can use "/*/OrderStatus" which indicates that you want to start at the root of the document, that there is no need to verify the tag name of the root element, and that you want to set the current position to the OrderStatus element and return its data. To get the order status, this:

xml.FindElem( "/*/OrderStatus" );
strStatus = xml.GetData();

is equivalent to this:

strStatus = xml.FindGetData( "/*/OrderStatus" );

You can also change the order status in one step by calling

xml.FindSetData( "/*/OrderStatus", strNewStatus );

Or using the other example document, you could change a configuration setting just like this:

xml.FindSetData( "/config/diagnostics/file", strNewFilename );

CMarkup Path and XPath

XPath and XPath-like paths beginning with a slash work differently from CMarkup absolute paths. They are meant for querying while CMarkup absolute paths are for targeting a specific element.

Update July 12, 2005: CMarkupMSXML release 8.0 has been fixed to be consistent with CMarkup. Prior to release 8.0, the MSXML function selectSingleNode was used to implement path finding in CMarkupMSXML FindElem, FindChildElem, FindSetData and FindGetData. This meant it worked differently from CMarkup with respect to absolute paths because the purpose of MSXML selectSingleNode is to query the document rather than to find the specific element.

A path such as /A/B/C in CMarkup means look for a C element under the first B element under the (first) A element where A is the root element in a well-formed XML document. However, in XPath, if the first B element does not contain a C element, it looks in the next B element under A and so on. This is a subtle distinction which may be further clarified by the following question and answer.

 

comment posted FindElem(path)

Jonnie White 13-Jan-2003

In this document:

<?xml version="1.0" encoding="windows-1252"?>
<root>
  <B ID="1"/>
  <B ID="2">
    <C ID="1"/>
  </B>
</root>

if I search for the 'C' element using FindElem("/root/B/C"), the function fails! Is this supposed to happen?

In CMarkup FindElem("/root/B/C") returns false to tell you that the first occurrence of B does not contain a C. One way to think of it is that the find path is also used in FindSetData and FindGetData so you don't want to be setting just any element data!

However, you might notice that CMarkupMSXML prior to release 8.0 does find the C, because it is implemented using selectSingleNode to select the first of all C children of B of root. CMarkup FindElem is based on a stricter concept of going to the specified child and looking inside it, because it is only looking for a specific one. To find that C with CMarkup you would call FindElem("/root/B[2]/C").
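
The two behaviors can be contrasted in a small standalone sketch that uses neither CMarkup nor MSXML. Elem, findStrict and findQuery are hypothetical names for this example; findStrict models the "first matching child at each step" rule described above, and findQuery models a query that tries every matching child. Predicates and attributes are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory element for illustration; not a CMarkup type.
struct Elem {
    std::string tag;
    std::vector<Elem> children;
};

// Targeting style: at each step commit to the first child with the step's
// tag name and fail if the next step is not found inside it.
const Elem* findStrict(const Elem& root, const std::vector<std::string>& steps) {
    if (steps.empty() || root.tag != steps[0]) return nullptr;
    const Elem* cur = &root;
    for (std::size_t i = 1; i < steps.size(); ++i) {
        const Elem* next = nullptr;
        for (const Elem& child : cur->children)
            if (child.tag == steps[i]) { next = &child; break; }
        if (!next) return nullptr;
        cur = next;
    }
    return cur;
}

// Query style: try every matching child at each step and return the first
// element, in document order, that completes the whole path.
const Elem* findQuery(const Elem& e, const std::vector<std::string>& steps,
                      std::size_t i = 0) {
    if (i >= steps.size() || e.tag != steps[i]) return nullptr;
    if (i + 1 == steps.size()) return &e;
    for (const Elem& child : e.children)
        if (const Elem* hit = findQuery(child, steps, i + 1)) return hit;
    return nullptr;
}

// Usage: the document from the question, where only the second B has a C.
void checkStrictVersusQuery() {
    Elem doc{"root", {Elem{"B", {}}, Elem{"B", {Elem{"C", {}}}}}};
    assert(findStrict(doc, {"root", "B", "C"}) == nullptr);
    assert(findQuery(doc, {"root", "B", "C"}) == &doc.children[1].children[0]);
}
```

The strict search fails because it never looks past the first B, while the query search backtracks into the second B and finds the C there.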

 

comment posted CMarkup Path and XPath

30-Mar-2007

I understand that CMarkup Path and XPath are different and that FindElem("/root/B/C") will fail, because the first "B" element does not have a "C" child. But what about FindElem("/root/B[@ID='2']/C") (originally posted as "/root/B[ID="2"]/C")? Does it work or does it fail, too?

Yes, FindElem("/root/B[@ID='2']/C") works. Note the correction: the @ sign in @ID means it is an attribute, whereas ID alone would refer to a child element. Under root, "/root/B[@ID='2']/C" finds the first B element with an ID attribute of 2, and then the first C element within that.

 

comment posted How to Select All Nodes Using XPath

Agnel CJ Kurian 06-Jun-2007

I tried markup.FindElem("//*") to traverse through all elements. It returned false at the first call itself even though there were nodes. Why?

The path feature of CMarkup is available only in CMarkup Developer and the free XML editor's FOAL C++ scripting, and the anywhere path // was implemented in release 8.0. In the Evaluation Version of CMarkup, you can loop through all elements using the alternative described in Depth First Traversal. Keep in mind that paths in CMarkup are not the same as XPath (see "CMarkup Path and XPath" above). One important difference is that CMarkup paths do not copy anything out of the document; they just set the current position within the document object.