ElemIndex Navigation

CMarkup Developer License

The ElemIndex methods are available only in CMarkup Developer and in the free XML editor's FOAL C++ scripting.

The ElemIndex methods take advantage of CMarkup's internal indexing to provide an extremely efficient way to store and recall element positions when navigating the document (when you know what you're doing).

The SavePos and RestorePos methods have a similar purpose but can be more cumbersome than these index methods. Like SavePos and RestorePos, the ElemIndex methods end up being unnecessary in most navigation situations, but occasionally they come in very handy. Also, note that the GetElemIndex method is const, meaning it does not modify the CMarkup object, whereas SavePos does modify the object to store the named position in an internal map. See Lookup XML Data with CMarkup.

GetElemIndex, GetChildElemIndex and GetParentElemIndex return an integer index that can be used to return to the element at a later time. This is a unique index that is assigned the first time the element is parsed or added in the document, and it does not change even when the document is modified. However, it is invalid once the element is removed or a subdocument containing the element is removed, or the document is reloaded or re-parsed. It is up to the developer not to use indexes if they might be invalid based on the circumstances of the problem being solved.

If there is no current main or child position, the corresponding GetElemIndex or GetChildElemIndex call will return 0. The parent position index is always known, and is 0 when it is above the root level, i.e. logically outside of the document. The root element has a non-zero index (like all elements do).

Here is an example XML document:

<DATABASE>
  <SCALAR>432</SCALAR>
  <SCALAR>87</SCALAR>
</DATABASE>

For example, we navigate to the second SCALAR element in the above XML document. There are many ways of doing this navigation, but here we specify all tag names to make it very clear where the current position is.

xml.ResetPos();
xml.FindElem( "DATABASE" );
xml.IntoElem();
xml.FindElem( "SCALAR" );
xml.FindElem( "SCALAR" );
int nIndex = xml.GetElemIndex();

The nIndex value now holds the integer index of the SCALAR element with data value 87. If we were to add another SCALAR element, nIndex is still a valid index for getting back to where we were. For illustration, we'll add a SCALAR element before the current position which changes the current position to the new element SCALAR 891.

xml.InsertElem( "SCALAR", "891" );

<DATABASE>
  <SCALAR>432</SCALAR>
  <SCALAR>891</SCALAR>
  <SCALAR>87</SCALAR>
</DATABASE>

GotoElemIndex, GotoChildElemIndex and GotoParentElemIndex set the current main, child or parent position to the element at an index. This is a very efficient operation since it is the same index that is used to refer to elements internally. In our example, we return to SCALAR 87 as follows:

xml.GotoElemIndex( nIndex );

Note that the absolute path of the element may change due to document modifications but the index does not change. In the above example, the absolute path of SCALAR 87 was /DATABASE/SCALAR[2] until we inserted a SCALAR element before it, making it /DATABASE/SCALAR[3].

The ElemIndex methods work differently from SavePos and RestorePos in terms of parent and child positioning. SavePos tracks the current parent, main and child positions, while GetElemIndex only gets the index of the main position, and GetChildElemIndex only gets the index of the child position. The GotoElemIndex method clears the child position and will not restore any child position that was current when the index was retrieved.

Also, you can use the index returned by GetElemIndex or GetChildElemIndex in either Goto method depending on whether you want the element indicated by the index to be the current main position or child position. For example, calling GotoChildElemIndex(nIndex) in the above example would set the child position to SCALAR 87 (as a side effect the current main position would become the DATABASE element).
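To make the position rules above concrete, here is a toy model of the three current positions. It is only a sketch and not CMarkup code: ToyCursor, its members and the parentOf table are hypothetical names, and the index values 1, 2 and 3 are made up to stand for DATABASE, SCALAR 432 and SCALAR 87.

```cpp
#include <vector>

// Toy model only, NOT CMarkup's implementation. All names are hypothetical.
// parentOf[i] holds the index of element i's parent; 0 means "above the root".
// Slot 0 is unused so that 0 can mean "no position".
struct ToyCursor
{
  std::vector<int> parentOf;
  int nParent; // current parent position
  int nMain;   // current main position, 0 = none
  int nChild;  // current child position, 0 = none

  explicit ToyCursor( const std::vector<int>& parents )
    : parentOf(parents), nParent(0), nMain(0), nChild(0) {}

  // Make idx the main position. Any child position is cleared.
  void GotoElem( int idx )
  {
    nMain = idx;
    nParent = parentOf[idx];
    nChild = 0;
  }

  // Make idx the child position. As a side effect the main position
  // becomes the parent of idx.
  void GotoChild( int idx )
  {
    nChild = idx;
    nMain = parentOf[idx];
    nParent = parentOf[nMain];
  }
};
```

With parentOf set to {0, 0, 1, 1}, GotoElem(3) leaves the main position on SCALAR 87 with no child position, while GotoChild(3) makes SCALAR 87 the child position and DATABASE (index 1) the main position, with the parent position at 0, above the root.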

The ElemIndex methods can be useful in complicated processing such as sorting or repeated random access to specific elements. This allows you to leave data stored efficiently in the document rather than copying it out into structures, which can be redundant and inflexible.

For sorting you often want to retrieve your keys into an efficient array and cross reference back to the XML using the element indexes, whether your XML is serving as the live data source for a list or grid or you are processing input XML and generating sorted output XML.

The following XML document represents an attendance list.

<ATTENDANCE>
  <ATTENDEE code="401">
    <LASTNAME>Hoffman</LASTNAME>
    <STATE>Virginia</STATE>
  </ATTENDEE>
  <ATTENDEE code="961">
    <LASTNAME>Woods</LASTNAME>
    <STATE>Utah</STATE>
  </ATTENDEE>
  <ATTENDEE code="709">
    <LASTNAME>Garvey</LASTNAME>
    <STATE>Mississippi</STATE>
  </ATTENDEE>
</ATTENDANCE>

For efficient sorting we first extract the key value on which we will sort and store the element index alongside the key value. To keep the example brief we just use two variable length arrays with the intention of keeping the corresponding values at the same place in the respective arrays.

// Extract key values for sort
CStringArray csaSortValues;
CUIntArray cuiaSortIndexes;
xml.ResetPos();
xml.FindElem();
xml.IntoElem();
while ( xml.FindElem(_T("ATTENDEE")) )
{
  xml.FindChildElem( _T("LASTNAME") );
  csaSortValues.Add( xml.GetChildData() );
  cuiaSortIndexes.Add( xml.GetElemIndex() );
}

Here is a simple bubble sort:

// Sort
CString csSwap;
UINT nSwap; // CUIntArray elements are UINT
for ( int nI=0; nI<csaSortValues.GetSize()-1; ++nI )
  for ( int nJ=nI+1; nJ<csaSortValues.GetSize(); ++nJ )
    if ( csaSortValues[nI] > csaSortValues[nJ] )
    {
      csSwap = csaSortValues[nI];
      csaSortValues[nI] = csaSortValues[nJ];
      csaSortValues[nJ] = csSwap;
      nSwap = cuiaSortIndexes[nI];
      cuiaSortIndexes[nI] = cuiaSortIndexes[nJ];
      cuiaSortIndexes[nJ] = nSwap;
    }
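
A bubble sort keeps the example short, but with many records you would normally hand the work to a library sort. The sketch below is one way to do that in standard C++, assuming you collect the keys and element indexes into a std::vector of pairs instead of the two MFC arrays. KeyIndex and SortByKey are hypothetical names, not part of CMarkup.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Each entry pairs a sort key with the element index it was read from.
// The index is an opaque handle, so only the key takes part in comparisons.
typedef std::pair<std::string, int> KeyIndex;

// Sort entries by key, keeping each index attached to its key.
// std::stable_sort preserves the original order among equal keys.
inline void SortByKey( std::vector<KeyIndex>& entries )
{
  std::stable_sort( entries.begin(), entries.end(),
    []( const KeyIndex& a, const KeyIndex& b ) { return a.first < b.first; } );
}
```

Each pair would be filled from xml.GetChildData() and xml.GetElemIndex() in the extraction loop, and after sorting, the second member of each pair is the value to pass to GotoElemIndex.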

Using the sorted array of element indexes, we can generate a document using GetSubDoc to retrieve the records from the original document and AddChildSubDoc to add them to the output document. This will bring forward any attributes and all elements inside each ATTENDEE element.

// Generate a sorted document
xml.ResetPos();
xml.FindElem();
CMarkup xmlSorted;
xmlSorted.AddElem( xml.GetTagName() );
for ( int nS=0; nS<cuiaSortIndexes.GetSize(); ++nS )
{
  xml.GotoElemIndex( cuiaSortIndexes[nS] );
  xmlSorted.AddChildSubDoc( xml.GetSubDoc() );
}

The resulting document xmlSorted looks like this:

<ATTENDANCE>
  <ATTENDEE code="709">
    <LASTNAME>Garvey</LASTNAME>
    <STATE>Mississippi</STATE>
  </ATTENDEE>
  <ATTENDEE code="401">
    <LASTNAME>Hoffman</LASTNAME>
    <STATE>Virginia</STATE>
  </ATTENDEE>
  <ATTENDEE code="961">
    <LASTNAME>Woods</LASTNAME>
    <STATE>Utah</STATE>
  </ATTENDEE>
</ATTENDANCE>

Apart from sorting, it can also be useful to use the index as the data value of an item in a list or tree control (also known as listview or treeview). This allows the CMarkup object to act as a database of information about items in the window. XML is an efficient and flexible way to attach different kinds of information to an item when the types of items within the tree differ from each other.

Comment posted: ElemIndex functions

Itamar Syn-Hershko 13-Jun-2006

It is my understanding that m_iPos holds the current position of the marker in the XML text loaded to CMarkup. If the ElemIndex functions use that pointer, how can it be safe to add a new node before that position? Wouldn't the m_iPos value get increased, since it is now pointing at a few chars ahead?

These indexes work like handles (I might have called them handles, but the use of the term "index" in this class goes back too far). An index value stays the same for the in-memory life of the element, until the document is re-parsed, which happens only when you do a full SetDoc or Load. When you insert an element anywhere in the document, the new element is assigned the next available index by x_GetFreePos, and this does not change the index of any other element. There is a linked list of ElemPos structures that happens to be stored in something like an array (actually m_aPos is not a plain array). So, when one index integer is less than another, it does not mean the element is earlier in the document.
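
The handle behavior described above can be illustrated with a toy model: a linked list whose nodes live in a growable array, so that an element's slot number never changes when other elements are inserted. This is only a sketch under simplifying assumptions and not CMarkup's actual code. ToyElem, ToyList and their methods are hypothetical names, only one level of siblings is modeled, and removal and reuse of freed slots are left out.

```cpp
#include <string>
#include <vector>

// Toy model only, NOT CMarkup's implementation. All names are hypothetical.
struct ToyElem
{
  std::string data;
  int next; // index of the next sibling in document order, 0 = none
  int prev; // index of the previous sibling, 0 = none
};

class ToyList
{
public:
  // Slot 0 is reserved so that index 0 can mean "no position".
  ToyList() : m_elems(1), m_first(0), m_last(0) {}

  // Add at the end of document order and return the new element's index.
  int Append( const std::string& data ) { return InsertBefore( 0, data ); }

  // Insert before the element at index `before` (0 means append).
  // The new element takes the next unused slot; no existing index changes.
  int InsertBefore( int before, const std::string& data )
  {
    int idx = static_cast<int>( m_elems.size() );
    ToyElem e;
    e.data = data;
    e.next = before;
    e.prev = before ? m_elems[before].prev : m_last;
    m_elems.push_back( e );
    if ( e.prev ) m_elems[e.prev].next = idx; else m_first = idx;
    if ( before ) m_elems[before].prev = idx; else m_last = idx;
    return idx;
  }

  // Direct lookup by index, with no walking of the document.
  const std::string& Data( int idx ) const { return m_elems[idx].data; }

  // 1-based ordinal in document order, found by walking the links.
  // Returns 0 if idx is not in the list.
  int Ordinal( int idx ) const
  {
    int n = 1;
    for ( int i = m_first; i != 0; i = m_elems[i].next, ++n )
      if ( i == idx ) return n;
    return 0;
  }

private:
  std::vector<ToyElem> m_elems;
  int m_first;
  int m_last;
};
```

Appending "432" and "87" and then inserting "891" before the index of "87" mirrors the SCALAR example: the index of "87" still retrieves "87", its ordinal moves from 2 to 3, and the new element gets a larger index even though it comes earlier in document order.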