Simple Merge Example

CMarkup is well suited to merging, splitting and transforming XML documents. Many of these processes involve a depth-first traversal of an XML document, which simply means visiting the elements in the order their start tags appear in the document text. For a merge you generally traverse one document while looking for corresponding elements in the other. In the simplest case of a merge, element names are unique among siblings, as in documents that hold configuration information where each setting has its own descriptive tag name. The following document has Licensed and DefaultContact elements, and let's assume that at most one of each appears under the root element.

<Settings>
  <Licensed>1</Licensed>
  <DefaultContact>
    <Name type="char" len="10"></Name>
    <Field type="date" len="20"></Field>
  </DefaultContact>
</Settings>
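
The depth-first order mentioned earlier can be illustrated without CMarkup. The following is only a minimal sketch: the Node struct and the CollectDepthFirst function are hypothetical names introduced for illustration and are not part of CMarkup. It records element names in the order a depth-first traversal visits them.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory element used only for illustration;
// this is not a CMarkup type.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Append element names in depth-first (document) order:
// the element itself first, then each child subtree from first to last.
void CollectDepthFirst(const Node& elem, std::vector<std::string>& out) {
    out.push_back(elem.name);
    for (const Node& child : elem.children)
        CollectDepthFirst(child, out);
}
```

For a tree mirroring the Settings document above, this visits Settings, Licensed, DefaultContact, Name and Field, in that order.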

The purpose of merging is often to bring updates from one document into a master document, adding new information and overriding existing information. If the above document is the master, the document below is an update document containing a new Title element as well as DefaultContact information meant to override the existing information in the master document.

<Settings>
  <DefaultContact status="ready">
    <Field type="date" len="5"></Field>
    <Origin len="80"/>
  </DefaultContact>
  <Title>Maintenance</Title>
</Settings>

The goal in merging the update into the master document is to add new information and override existing information. The DefaultContact element exists in both documents, so before looking inside it we only copy its attributes, if any, from the update document onto the DefaultContact element in the master document. Here that means the status attribute is added to the master's DefaultContact element. Inside DefaultContact, the update document has a Field element with a different len attribute value, which replaces the value in the corresponding Field element of the master document. The Origin element is new and is added to the master document. Outside of the DefaultContact element, the Title element is also new and should be inserted into the master document. The result of the merge should be:

<Settings>
  <Licensed>1</Licensed>
  <DefaultContact status="ready">
    <Name type="char" len="10"></Name>
    <Field type="date" len="5"></Field>
    <Origin len="80"/>
  </DefaultContact>
  <Title>Maintenance</Title>
</Settings>
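
The merge rules just described can be sketched independently of CMarkup. The following is a minimal recursive sketch, assuming a hypothetical in-memory Node struct and hypothetical FindChild and MergeNode helpers, none of which are CMarkup APIs. It assumes element names are unique among siblings, ignores element text, and leaves the update tree untouched.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory element used only for illustration;
// this is not a CMarkup type. Element text is not modeled.
struct Node {
    std::string name;
    std::map<std::string, std::string> attribs;
    std::vector<Node> children;
};

// Return the direct child with the given tag name, or nullptr.
// Assumes names are unique among siblings.
Node* FindChild(Node& parent, const std::string& name) {
    for (Node& child : parent.children)
        if (child.name == name)
            return &child;
    return nullptr;
}

// Merge update into master:
//  - attributes in update are added to, or override, those in master
//  - children present in both are merged recursively
//  - children present only in update are appended to master
void MergeNode(Node& master, const Node& update) {
    assert(master.name == update.name);  // caller pairs elements by name
    for (const auto& attrib : update.attribs)
        master.attribs[attrib.first] = attrib.second;
    for (const Node& updChild : update.children) {
        if (Node* match = FindChild(master, updChild.name))
            MergeNode(*match, updChild);
        else
            master.children.push_back(updChild);
    }
}
```

Applied to trees mirroring the two Settings documents, this sets status on DefaultContact, changes len on Field, and appends Origin and Title, while Licensed and Name are left as they were.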

In this type of situation you can use the following generic merge function (this is an MFC implementation) to merge an update document (xmlUpdate) into a master document (xmlMaster) where element names are unique among siblings and there is no mixed content. Because elements are uniquely identified among siblings by their tag name, sibling elements may appear in a different order in the two documents. The function works by removing elements and attributes from the update document as they are added or overridden in the master document.

void SimpleMerge( CMarkup& xmlMaster, CMarkup& xmlUpdate )
{
  // Generic merge when element names are unique among siblings
  // removing xmlUpdate elements as added/overridden in xmlMaster
  //
  CString csMergeName;
  xmlMaster.ResetPos();
  xmlUpdate.ResetPos();
  BOOL bMergeFinished = FALSE;
  if ( ! xmlMaster.FindChildElem() )
  {
    xmlMaster = xmlUpdate;
    bMergeFinished = TRUE;
  }
  xmlUpdate.FindChildElem();
  while ( ! bMergeFinished )
  {
    // Process Element
    xmlMaster.IntoElem();
    xmlUpdate.IntoElem();
    csMergeName = xmlMaster.GetTagName();

    // Did this one match?
    xmlUpdate.ResetMainPos();
    BOOL bMatched = xmlUpdate.FindElem( csMergeName );
    if ( bMatched )
    {
      // Merge attributes
      for ( int nAttrib=0; !(csMergeName=
          xmlUpdate.GetAttribName(nAttrib)).IsEmpty();
          ++nAttrib )
        xmlMaster.SetAttrib( csMergeName,
          xmlUpdate.GetAttrib(csMergeName) );
    }

    // Next element (depth first)
    BOOL bChildFound = xmlMaster.FindChildElem();
    while ( ! bChildFound && ! bMergeFinished )
    {
      if ( bMatched )
      {
        while ( xmlUpdate.FindChildElem() )
        {
          xmlMaster.AddChildSubDoc(
            xmlUpdate.GetChildSubDoc() );
          xmlUpdate.RemoveChildElem();
        }
        xmlUpdate.RemoveElem();
      }
      if ( xmlMaster.OutOfElem() )
      {
        xmlUpdate.OutOfElem();
        bChildFound = xmlMaster.FindChildElem();
        if ( ! bChildFound )
        {
          bMatched = TRUE;
          xmlUpdate.ResetChildPos();
        }
      }
      else
        bMergeFinished = TRUE;
    }
  }
}

Merging two documents is rarely implemented by a generic algorithm. It usually involves special handling that depends on the peculiarities of the documents involved and on the purpose of the merge. Often there are repeated sibling elements with the same tag name (e.g. a list of books), and the only way to know which ones correspond between documents is to match them by a specific attribute value. This requires a specialized algorithm based on the circumstances. In the following example document, the bookid attribute would be used to match corresponding book elements in the master and update documents.

<booklist>
  <book bookid="B02">
    <name>Jig, clog, and breakdown dancing</name>
    <price>10.98</price>
  </book>
  <book bookid="C90">
    <name>Clog dancing made easy</name>
    <price>8.99</price>
  </book>
  <book bookid="T11">
    <name>The tango and other up-to-date dances</name>
    <price>19.00</price>
  </book>
</booklist>

You could modify the SimpleMerge function to support this document by replacing the line BOOL bMatched = xmlUpdate.FindElem( csMergeName ); with code that first checks whether csMergeName is "book" and, if so, searches for a book element with the corresponding bookid attribute value. There are several ways of implementing this, some more efficient than others, and that is left for another example.
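
As a rough illustration of the matching step only, and not the fuller CMarkup example deferred above, the following sketch shows one way the lookup could be written. The Elem struct and the FindMatch function are hypothetical names introduced here and are not CMarkup APIs. The function does a simple linear search over the siblings, which is the least efficient of the possible approaches but the easiest to follow.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory element used only for illustration;
// this is not a CMarkup type.
struct Elem {
    std::string name;
    std::map<std::string, std::string> attribs;
    std::vector<Elem> children;
};

// Return the sibling that corresponds to `wanted`, or nullptr.
// "book" elements must also have the same bookid attribute value;
// every other element is matched by tag name alone.
Elem* FindMatch(std::vector<Elem>& siblings, const Elem& wanted) {
    for (Elem& candidate : siblings) {
        if (candidate.name != wanted.name)
            continue;
        if (wanted.name != "book")
            return &candidate;
        auto have = candidate.attribs.find("bookid");
        auto want = wanted.attribs.find("bookid");
        if (have != candidate.attribs.end() &&
            want != wanted.attribs.end() &&
            have->second == want->second)
            return &candidate;
    }
    return nullptr;
}
```

A book element in the update document with no matching bookid in the master yields nullptr, which a merge would treat as a new book to add.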