Transformation Example: Apples to Oranges

It is truly impressive that some developers have mastered a technology as challenging as XSLT, which makes any moderately complex task nearly impossible. But is it really worth it? It amuses me when someone being helpful on the Microsoft XSL newsgroup produces a long stylesheet solution that appears absolutely cryptic.

Below is a 55-line transformation example taken from a post called how to keep a running count without variable reassignment. As is typical with XSLT, the developer posing the question is having trouble implementing something that is very simple in familiar procedural languages like C++. He just wants to increment an integer for every line of output. It often turns out that there is a way to do it in XSLT, though it might strike you as non-intuitive. I provide this example to show how using CMarkup produces a much faster, cleaner and more straightforward solution to the same problem (see also Transformation Using CMarkup).

Buma writes: the inability of variable reassignment to keep a running count in xsl is giving me fits, can someone take the following xml:

<columns>
    <column>
        <col>apple</col>
        <col>orange</col>
        <col>banana</col>
    </column>
    <column>
        <col>car</col>
        <col>train</col>
        <col>boat</col>
    </column>
    <column>
        <col>a</col>
        <col>b</col>
        <col>c</col>
    </column>
</columns>

and produce the following output?

1 apple car a 
2 apple car b 
3 apple car c 
4 apple train a 
5 apple train b 
6 apple train c 
7 apple boat a 
8 apple boat b 
9 apple boat c 
10 orange car a 
11 orange car b 
12 orange car c 
13 orange train a 
14 orange train b 
15 orange train c 
16 orange boat a 
17 orange boat b 
18 orange boat c 
19 banana car a 
20 banana car b 
21 banana car c 
22 banana train a 
23 banana train b 
24 banana train c 
25 banana boat a 
26 banana boat b 
27 banana boat c

Dimitre Novatchev responds: It is quite simple, if the transformation is performed in two passes with the results being numbered in the second pass. Below is a one-pass solution:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="xsl msxsl">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:variable name="vrtfnextCombinations">
    <xsl:for-each select="*/*">
      <n>
        <xsl:call-template name="numCombinations">
          <xsl:with-param name="pCurGroup" select="."/>
        </xsl:call-template>
        <xsl:text>&#xA;</xsl:text>
      </n>
    </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="vnextCombinations"
    select="msxsl:node-set($vrtfnextCombinations)/*"/>
  <xsl:template match="/">
    <xsl:call-template name="combineSiblings">
      <xsl:with-param name="pCurGroup" select="*/*[1]"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template name="combineSiblings">
    <xsl:param name="pCurGroup" select="/.."/>
    <xsl:param name="pCurCombination"/>
    <xsl:param name="pNum" select="1"/>
    <xsl:choose>
      <xsl:when test="$pCurGroup">
        <xsl:for-each select="$pCurGroup/*">
          <xsl:variable name="vcurPos" select="position()"/>
          <xsl:call-template name="combineSiblings">
            <xsl:with-param name="pCurGroup"
               select="$pCurGroup/following-sibling::*[1]"/>
            <xsl:with-param name="pCurCombination"
               select="concat($pCurCombination, ., ' ')"/>
            <xsl:with-param name="pNum"
               select="$pNum
                     +
                       $vnextCombinations[count($pCurGroup/preceding-sibling::*)+1]
                     *
                       ($vcurPos - 1)"/>
          </xsl:call-template>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="concat($pNum, ' ', $pCurCombination)"/>
        <xsl:text>&#xA;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template name="numCombinations">
    <xsl:param name="pCurGroup" select="/.."/>
    <xsl:choose>
      <xsl:when test="not($pCurGroup/following-sibling::*[1])">1</xsl:when>
      <xsl:otherwise>
        <xsl:variable name="vNextCombinations">
          <xsl:call-template name="numCombinations">
            <xsl:with-param name="pCurGroup"
              select="$pCurGroup/following-sibling::*[1]"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="count($pCurGroup/*) * $vNextCombinations"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

If the sight of this does not make you want to avoid XSLT (Why To Avoid XSLT), consider also that it is an order of magnitude slower than the CMarkup solution I give below. Plus, while the XSL works on the given example, it produced invalid line numbers when I added more column and col elements. To get the line numbering right, it appears to depend on the number of column elements being the same as the number of col elements in each. Finding the bug in that stylesheet is your homework if you want to expend the effort :). To be fair, Dimitre said there was a two-pass solution that would be quite simple.
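The stylesheet does not count lines; it derives each line number from the positions of the chosen elements. For comparison, here is a minimal C++ sketch of that positional idea. It assumes the number of col elements in each column and the chosen positions have already been extracted, and LineNumber, sizes and positions are hypothetical names that do not come from the stylesheet or from CMarkup. It is not a diagnosis of the stylesheet, only a statement of the arithmetic any positional scheme has to get right.

```cpp
#include <cstddef>
#include <vector>

// sizes[i]     : number of col elements in the i-th column element
// positions[i] : zero-based index of the chosen col in that column
// Assumes positions has the same length as sizes.
// Returns the one-based line number of that combination.
long LineNumber(const std::vector<std::size_t>& sizes,
                const std::vector<std::size_t>& positions)
{
    long number = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        // Horner form: multiplying by the size of column i before adding its
        // position ends up weighting every earlier position by the product
        // of the sizes of all columns that come after it.
        number = number * static_cast<long>(sizes[i])
               + static_cast<long>(positions[i]);
    }
    return number + 1;
}
```

With three columns of three col elements each, the positions {1, 0, 2} (orange, car, c) give line 12, matching the expected output above. The weight for a column is the product of the sizes of the columns after it, which is the quantity to check when the columns do not all have the same number of col elements.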

Unlike in the error-prone XSL above, incrementing the line count is easy in C++. The complexity is in iterating all of the combinations, but in CMarkup we are free to implement the combinations efficiently, rather than being restricted to the pseudo-functions in XSLT. Also, the CMarkup solution is shorter and easier to grasp than the stylesheet.

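CString csResult; // receives the numbered lines
int nCount = 0;   // running line count
xml.FindElem();   // columns element
xml.IntoElem();   // go inside columns
RecurseCols( xml, csResult, nCount, "" );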

And this function:

void RecurseCols( CMarkup& xml, CString& csResult, int& nCount, CString csRowUpToThis )
{
  while ( xml.FindChildElem( ) ) // for each col
  {
    CString csRowNow = csRowUpToThis;
    csRowNow += xml.GetChildData() + " "; // col element value
    int nIndex = xml.GetChildElemIndex(); // remember col position
    if ( xml.FindElem() ) // another column?
      RecurseCols( xml, csResult, nCount, csRowNow ); // run against all combos
    else
    {
      CString csOutput;
      csOutput.Format( "%d %s\r\n", ++nCount, (LPCTSTR)csRowNow ); // explicit cast for the variable argument list
      csResult += csOutput;
    }
    xml.GotoChildElemIndex( nIndex ); // restore col position
  }
}
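For readers without MFC or CMarkup at hand, the same recursion can be sketched in standard C++. This is only an illustration under assumptions: the Columns type is a hypothetical stand-in for the parsed document, and NumberedCombinations is a made-up helper name; neither is part of CMarkup. What it shows is that the running count is just an int passed by reference and incremented once per output line.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed document: each inner vector holds the
// col values of one column element.
using Columns = std::vector<std::vector<std::string>>;

// Append one numbered line per combination, incrementing nCount as we go.
void RecurseCols(const Columns& columns, std::size_t nColumn,
                 const std::string& rowUpToThis, int& nCount,
                 std::string& result)
{
    if (nColumn == columns.size()) {
        // No more columns: emit the finished row with the next line number.
        result += std::to_string(++nCount) + " " + rowUpToThis + "\n";
        return;
    }
    for (const std::string& col : columns[nColumn]) {
        // Extend the row with this col value and run against the remaining columns.
        RecurseCols(columns, nColumn + 1, rowUpToThis + col + " ", nCount, result);
    }
}

// Convenience wrapper: returns all numbered lines as one string.
std::string NumberedCombinations(const Columns& columns)
{
    std::string result;
    if (columns.empty())
        return result; // nothing to combine
    int nCount = 0;
    RecurseCols(columns, 0, "", nCount, result);
    return result;
}
```

Because the count is incremented at the moment a line is emitted, the numbering stays correct for any number of columns of any size, including columns of different lengths.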

I compared the two solutions using CMarkup for my C++ solution and CMarkupMSXML to run the stylesheet. Below is the code I used to run the two transformations and then compare the two string results.

// Transform with MSXML plus style sheet
CMarkupMSXML msxml;
msxml.Load( "c:\\temp\\test.xml" );
CMarkupMSXML msxmlStylesheet;
msxmlStylesheet.Load( "c:\\temp\\test.xsl" );
CString csResultXSL = (LPCTSTR)msxml.m_pDOMDoc->transformNode(
    msxmlStylesheet.m_pDOMDoc );

// Transform with CMarkup
CMarkup xml;
xml.Load( "c:\\temp\\test.xml" );
int nCount = 0;
CString csRow;
CSmartStr ssResult( 1024 );
xml.FindElem();
xml.IntoElem();
RecurseCols( xml, ssResult, nCount, csRow );

// Compare results (they are identical)
BOOL bIdentical = (ssResult.csStr == csResultXSL);

To compare performance I created a version of the XML data containing 6 column elements with 6 col elements in each, yielding 46,656 lines (6 to the 6th power) and a transformation result of over a megabyte of text. I wrote a little class called CSmartStr (shown below) to use instead of a plain CString csResult, for better concatenation performance when generating a large string result. So I changed the second argument of RecurseCols to CSmartStr& ssResult and used ssResult in place of csResult. The XSLT took over 20 seconds while the CMarkup function took less than a second. Thinking it might depend on the MSXML version, I tried MSXML 4.0 with #define MARKUP_MSXML4, but it made no significant difference.

struct CSmartStr
{
  CSmartStr( int nStartLen ) { nStrLen = nStartLen; csStr.GetBuffer(nStrLen); };
  void operator+=( const CString& a )
  {
    if ( csStr.GetLength() + a.GetLength() > nStrLen )
    {
      nStrLen = nStrLen * 2 + a.GetLength();
      csStr.GetBuffer( nStrLen );
    }
    csStr += a;
  };
  int nStrLen;
  CString csStr;
};

The CSmartStr class is a tiny wrapper for a CString which increases the buffer size with extra room as you append to reduce expensive reallocations. This is handy when you might be concatenating a very large string by little bits and pieces since you don't want CString to reallocate and copy its buffer each time you append.
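The same growth policy can be sketched with std::string for code that does not use MFC. This is an assumption-laden illustration rather than part of the test I ran: SmartStr is a hypothetical name, and most standard library implementations already grow std::string geometrically, so the sketch mainly makes the policy explicit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical standard C++ counterpart of CSmartStr.
struct SmartStr
{
    // Reserve room up front so the first appends do not reallocate.
    explicit SmartStr(std::size_t startLen) { str.reserve(startLen); }

    void operator+=(const std::string& a)
    {
        if (str.size() + a.size() > str.capacity()) {
            // Out of room: grow to double the capacity plus the new piece,
            // the same rule CSmartStr applies to its CString buffer.
            str.reserve(str.capacity() * 2 + a.size());
        }
        str += a;
    }

    std::string str;
};
```

Doubling the capacity means the number of reallocations grows with the logarithm of the final length rather than with the number of appends.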

Well, we did not transform apples to oranges, but maybe I was comparing apples to oranges (declarative vs procedural method, or XSLT vs CMarkup). Anyway, the simple point is to consider ways other than XSLT to do transformation; you may be glad you did.

comment posted I have a sub-second XSLT solution

Drazen Dotlic 29-Dec-2005

I've reimplemented the problem and tried it with Microsoft's .NET 2.0 compiled XSLT processor. It is very fast. If you know XSLT, I'll let you be the judge of the readability of my solution. Your C++ interface still looks cool, but it's not an end-all solution to all things XML/XSLT (you can also see another recent post where I go into more detail why this is).

Drazen wrote a good response to me in Quick, which is better suited for XML transformation: C++ or XSLT? It is nicely presented, but I have to say it ultimately only makes the point here stronger. It is significant that he uses "EXSLT" to get the power function he needed. Though he tried to preemptively defend this decision, it is an example of how you are at the mercy of your XSLT component to implement an extension, and it stops my MSXML 4.0 test from working. His new XSL is half the lines, and if I remove the power function (and ignore the incorrect line numbers) it runs as fast as my C++ example. But it looks like the line numbering in his solution still depends on a certain number of elements, as I gather from the 6 in math:power(6, count($recurse)), whereas my C++ example can have any number of col elements in any number of column elements. Sure, he can fix that too, but the fact that someone skilled with XSLT can improve this particular stylesheet does not prove the value of XSLT for this problem.

Developing in C++, using CMarkup for XML navigation and creation, is a versatile and scalable way to do transformation. CMarkup does not wrap some handy capabilities like sorting, and on the face of it this may seem like a disadvantage. Then again, many of the handy features of XSLT allow you to get started quickly but then hamper your progress on that one additional thing that is not so simple in XSLT. For example, there are many different ways of sorting different types of data, and if XSLT does indeed provide a mechanism for extending its sorting functionality, that process must ultimately be very costly for it to preside over.

This is why I began this article referring to "moderately complex tasks" which is where investing in XSLT has serious risks. On that note I am going to refer this discussion to the bottom of Why To Avoid XSLT.

comment posted A few more variations

Drazen Dotlic 10-Jan-2006

I have hopefully addressed some of your concerns in a follow-up to my original post where I present another 2 ways of doing the same thing and discuss a bit more. It might be interesting for your readers to have another (not so different) perspective.

At the end of More on XSLT vs C++ for XML transformation Drazen provided a two-pass solution which works in my MSXML 4.0 test, is a simpler stylesheet, and appears to be the same speed. It is still a bit of a leap to get into the declarative frame of mind, but you can see the two templates: the first one takes the result set from the second one and puts line numbers at the beginning of the lines. Drazen concludes:

...the mental shift required for an average developer when going from C++ to XSLT might indeed be too high. If you can't figure XSLT out, by all means use CMarkup or a similar library. But don't do it just because XSLT is different - there's value there too.

Actually, I think most people assume you need to use XSLT to do transformation, so the purpose of this article is to provide an example with CMarkup to demonstrate the potential of C++ based transformation. If you are already using CMarkup or have the option of using it, I certainly would recommend against going the XSLT route for transformation because of XSLT's deployment complexities (stylesheet, component availability and version).