Lookup XML Data with CMarkup

CMarkup makes navigation easy and efficient with its core methods. They run fast and keep your code easy to maintain and extend.

Loop And Compare

What could be easier than using the familiar core functions of your XML tool to loop through all the items until you find the one that matches? Here is an example that needs to ignore case while searching for a value. It is adapted from MSDN article 315719 on MSXML case-insensitive search.

<Domains>
<DomainName userid="rain5">Uhdomain1.COM</DomainName>
<DomainName userid="cloud1">Mydomain1.COM</DomainName>
</Domains>

This code loops through the DomainName elements under the root element and does something with the userid if the matching value is found. The beauty of this solution is that the difference between a case-sensitive search and a case-insensitive one is trivial. Someone modifying another person's code would not have to do any research. They could even implement a more complex comparison, such as comparing with and without the http prefix, without much difficulty.

xml.ResetPos();
while ( xml.FindChildElem("DomainName") )
if ( xml.GetChildData().CompareNoCase("mydomain1.com") == 0 )
{
DoSomething( xml.GetChildAttrib("userid") );
break;
}
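
The more complex comparison mentioned above (ignoring case and an optional http prefix) can be sketched in plain standard C++. This is only an illustration: NormalizeDomain and SameDomain are hypothetical helper names, not CMarkup methods, and the sketch assumes the prefix to ignore is exactly "http://".

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of CMarkup): lower-case a domain and strip
// an optional leading "http://" so that different spellings compare equal.
std::string NormalizeDomain(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const std::string prefix = "http://";
    if (s.compare(0, prefix.size(), prefix) == 0)
        s.erase(0, prefix.size());
    return s;
}

// True when both values name the same domain after normalization.
bool SameDomain(const std::string& a, const std::string& b)
{
    return NormalizeDomain(a) == NormalizeDomain(b);
}
```

In the loop, the CompareNoCase test would be replaced by a call such as SameDomain( data, "mydomain1.com" ), where data is the child element data converted to std::string as your build requires. The loop itself does not change.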

Incidentally, these same CMarkup methods will work whether

Why XPath is a Bad Idea

So, CMarkup makes it easy and efficient to look up something in your document. If instead you try to use XPath (a lookup technology used in some XML tools) it is not easy and likely not efficient either. With MSXML XPath the complexity begins with the differences in functionality between product versions. With MSXML 3.0 you must turn on XPath and use the translate function:

oXML.setProperty "SelectionLanguage", "XPath"
set node = oXML.selectSingleNode( "Domains/DomainName[ translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'mydomain1.com']" )

With MSXML 4.0 you have the option of using the ms:string-compare extension function:

oXML.setProperty "SelectionNamespaces", "xmlns:ms='urn:schemas-microsoft-com:xslt'"
set node = oXML.selectSingleNode( "Domains/DomainName[ ms:string-compare(., 'mydomain1.com', 'en-US', 'i') = 0]" )

A potential advantage of XPath in this situation is that it can sometimes achieve high performance by taking advantage of the inner workings of the component, since it goes all the way to the result in one call. But the disadvantages of XPath are many (the CMarkup solution shown above has none of these disadvantages).
But perhaps the biggest disadvantage of all with XPath is the additional complexity of going from a simple search to one that ignores case. What might be assumed to be a trivial modification becomes a potential headache. And again, XPath becomes even more difficult and less efficient when you add a complication such as not assuming the domains are normalized to the http:// form, or needing to check uniqueness.

Creating a Lookup Table (Unique Name Map)

CMarkup also allows you to easily build a map of all the domains in the document for quick lookup. Suppose we need to look up domain names quickly: we would loop through them once and save their positions.

xml.ResetPos(); // top of doc
xml.FindElem(); // /Domains
xml.IntoElem();
while ( xml.FindElem("DomainName") )
{
CString strDomain = xml.GetData();
strDomain.MakeLower();
xml.SavePos( strDomain );
}

Internally, CMarkup uses the string name as the key to a hash map, so it is a very quick lookup (SavePos/RestorePos support just one such logical lookup table per document). Then whenever we need to look up the domain name and do something with the userid, just:

strDomain.MakeLower();
if ( xml.RestorePos(strDomain) )
DoSomething( xml.GetAttrib("userid") );
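
For readers who want to see the idea outside CMarkup, here is a minimal sketch of the same lower-cased name map using std::unordered_map as a stand-in for the saved positions. It does not show CMarkup's internal implementation. DomainEntry, BuildDomainMap and FindUserId are hypothetical names, and a vector of entries stands in for the parsed document.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for one DomainName element (hypothetical type for this sketch).
struct DomainEntry { std::string name; std::string userid; };

std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// One pass over the entries: lower-cased name -> position in the vector.
// In this sketch a later duplicate name overwrites an earlier one.
std::unordered_map<std::string, std::size_t>
BuildDomainMap(const std::vector<DomainEntry>& entries)
{
    std::unordered_map<std::string, std::size_t> positions;
    for (std::size_t i = 0; i < entries.size(); ++i)
        positions[ToLower(entries[i].name)] = i;
    return positions;
}

// Returns the userid for a domain (any letter case), or nullptr if absent.
const std::string* FindUserId(
    const std::vector<DomainEntry>& entries,
    const std::unordered_map<std::string, std::size_t>& positions,
    const std::string& domain)
{
    auto it = positions.find(ToLower(domain));
    return it == positions.end() ? nullptr : &entries[it->second].userid;
}
```

As in the CMarkup version, the key is lower-cased both when the position is saved and when it is looked up, which is what makes the lookup case-insensitive.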

Building A Unique List

Another application of unique named positions in CMarkup is compiling a count of unique words. For example, a customer database has a country element telling where each customer is located. The following code will loop through the customer XML database and generate a small document listing countries and counts. This example uses the anywhere path

CMarkup xmlCountries;
xmlCustomerDB.ResetPos();
while ( xmlCustomerDB.FindElem("//Country") )
{
CString csCountry = xmlCustomerDB.GetData();
if ( xmlCountries.RestorePos(csCountry) )
{
// Increment count
xmlCountries.SetAttrib( "n", atoi(xmlCountries.GetAttrib("n"))+1 );
}
else
{
// Add country to list
xmlCountries.AddElem( "C", csCountry );
xmlCountries.SetAttrib( "n", 1 );
xmlCountries.SavePos( csCountry );
}
}
<C n="32">United States</C>
<C n="12">Canada</C>
<C n="14">United Kingdom</C>
<C n="2">China</C>
<C n="8">Japan</C>
<C n="1">Kenya</C>
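
The same restore-or-add pattern can be sketched with standard containers. This is an illustration only: CountCountries is a hypothetical function, a vector of strings stands in for the Country elements, and std::unordered_map plays the role of the saved positions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns (country, count) pairs in order of first appearance, mirroring
// the way each new C element is added to the list the first time it is seen.
std::vector<std::pair<std::string, int>>
CountCountries(const std::vector<std::string>& countries)
{
    std::vector<std::pair<std::string, int>> counts;
    std::unordered_map<std::string, std::size_t> position; // name -> slot in counts
    for (const std::string& c : countries)
    {
        auto it = position.find(c);
        if (it != position.end())
        {
            ++counts[it->second].second;      // already listed: increment count
        }
        else
        {
            position[c] = counts.size();      // remember where it will be stored
            counts.emplace_back(c, 1);        // add country to list
        }
    }
    return counts;
}
```

Each country costs one hash lookup, so the whole pass stays roughly linear in the number of customers.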

More On Navigating XML

There are several other articles about getting around in your XML with CMarkup.

There are a number of ways to go about it, and you can weigh the performance of each. The simplest to code, if you have implemented the ID attribute, is to use the attribute value predicate to find it (see Paths In CMarkup):

xml.ResetPos();
xml.FindElem( "//*[@ID='5']" );

That will do a depth first traversal internally to find the element. If you are finding them in order, you don't need to

To utilize a hash table lookup for quicker random access, save each position with the string ID as you are creating it.

xml.SetAttrib( "ID", "5" );
xml.SavePos( "5" );

Then later you can go directly back to that position:

xml.RestorePos( "5" );

See SavePos and RestorePos. If you have hundreds of IDs the saved position performance will degrade but still be better than the attribute value predicate for random access. These saved positions are lost when the document is reparsed using Load or SetDoc, but you can set them again with a quick scan through the document (using the anywhere path and attribute predicate described in Paths In CMarkup):

xml.ResetPos();
while ( xml.FindElem("//*[@ID]") )
xml.SavePos( xml.GetAttrib("ID") );

Ultimately, you can control implementation and performance using indexes (see ElemIndex Navigation). Since your ID is a simple array from 1 to n, you can just use an integer array or vector to store the indexes.

xml.SetAttrib( "ID", i );
a[i] = xml.GetElemIndex();
++i;

and later return to ID

xml.GotoElemIndex( a[i] );

These indexes remain valid even as the document is modified, until it is reparsed. So you would need to build this array every time the document is parsed. Building this array of indexes is actually a very quick process, roughly the same order of magnitude as the time to parse the document. Use a "grow by" mechanism or size estimation to reserve array size ahead of time and avoid reallocation churn. This quick once-through every time you parse will give you fast random access to your large document. If every element you need to look up has an ID attribute, something like this will build the array:

CArray<int,int> a;
xml.ResetPos();
while ( xml.FindElem("//*[@ID]") )
{
int i = atoi(xml.GetAttrib("ID"));
a.SetAtGrow( i, xml.GetElemIndex() );
}
You may need to scan all the ID values every time you re-parse to know what the next available ID is, anyway.
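
As a rough standard C++ counterpart to the CArray approach, the sketch below stores element indexes in a std::vector that grows on demand and reserves space up front. All names here are hypothetical, the (ID, element index) pairs stand in for what the scan loop reads from the document, and the kNoIndex sentinel is a choice made for this sketch rather than a CMarkup convention.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

const int kNoIndex = -1; // sentinel chosen for this sketch: "no element for this ID"

// Grow the table if needed, then store the element index at slot `id`.
void SetAtGrow(std::vector<int>& table, std::size_t id, int elemIndex)
{
    if (id >= table.size())
        table.resize(id + 1, kNoIndex);
    table[id] = elemIndex;
}

// Build the table from (ID, element index) pairs gathered in one scan.
// expectedCount is a size estimate used to reserve space ahead of time.
std::vector<int> BuildIndexTable(
    const std::vector<std::pair<int, int>>& scanned, std::size_t expectedCount)
{
    std::vector<int> table;
    table.reserve(expectedCount + 1);
    for (const auto& p : scanned)
        if (p.first >= 0)
            SetAtGrow(table, static_cast<std::size_t>(p.first), p.second);
    return table;
}

// Bounds-checked lookup: returns kNoIndex for unknown or out-of-range IDs.
int LookupIndex(const std::vector<int>& table, int id)
{
    if (id < 0 || static_cast<std::size_t>(id) >= table.size())
        return kNoIndex;
    return table[static_cast<std::size_t>(id)];
}
```

The highest ID seen during the scan is table.size() - 1, which also answers the question of what the next available ID is.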

Since I've gone with an XPath subset in the

However, taking a Regex approach is an interesting point I hadn't considered. I don't recall ever needing to do something like find

CMarkup always errs on the side of simplicity, letting you perform the full range of comparison options in your natural procedural language (as mentioned above). For example, you would search for "Fil*" like this:

xml.ResetPos();
while ( xml.FindElem("//Name") )
{
if ( strncmp(xml.GetData(),"Fil",3) == 0 )
{
// process match for "Fil*"
}
}
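
If the pattern is not fixed at compile time, the strncmp test above could be wrapped in a small helper. The following is a sketch under stated assumptions: MatchesTrailingWildcard is a hypothetical name, and it supports only a single '*' at the end of the pattern, not general wildcards or regular expressions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: if `pattern` ends in '*', test whether `value`
// starts with the text before the '*'; otherwise require an exact match.
bool MatchesTrailingWildcard(const std::string& value, const std::string& pattern)
{
    if (!pattern.empty() && pattern.back() == '*')
    {
        const std::size_t n = pattern.size() - 1;   // length of the fixed prefix
        return value.compare(0, n, pattern, 0, n) == 0;
    }
    return value == pattern;
}
```

Inside the loop, the condition would become a call such as MatchesTrailingWildcard( data, "Fil*" ), with data being the element data converted to std::string as your build requires.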

Posted February 27, 2006; updated March 26, 2006. ©Copyright 2008 First Objective Software, Inc. All rights reserved.