CMarkup XML Parser Performance
Release 11.3 has made a leap in performance (e.g. from 39 MB/s to 53 MB/s* excluding file I/O), so it's a good time to post some data on the speed of CMarkup and to discuss XML parser performance issues. Here is a comparison of 11.3 with the previous release 11.2: raw parsing goes from 40000 to 54000 bytes per millisecond, and attribute parsing (the basis for attribute methods) goes from 5000 to 9000 b/ms (see also Attribute Method Performance).
| Release | parse doc | parse attrib | create doc | create attrib | Units |
|---|---|---|---|---|---|
Since these measurements do not involve disk I/O, the speeds are measured in character units per millisecond, where the character unit is b for byte, w for word (2 bytes) or dw for double word (4 bytes), depending on the build and platform. In the first chart I include two parse tests followed by the two corresponding create tests.
| Test | Description |
|---|---|
| parse document | this is the core indicator of parsing speed; the document string is passed to SetDoc in memory and parsed, rather than loaded from disk |
| parse attributes | loops through the document reading all attributes with GetAttribName and GetAttrib (the new GetNthAttrib method is a more efficient way to do this) |
| create document | builds a document with an AddElem and a SetAttrib call for each element; the document is not saved to disk, so there is no disk I/O in this measurement |
| create attributes | creates a document with up to 4 randomly selected attributes and values per element; the SetAttrib call occasionally overwrites an attribute |
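A measurement of this kind (document already in memory, no disk I/O) can be sketched as a small timing helper. This is a minimal sketch, not the harness used for the figures here: measure_units_per_ms is a hypothetical name, and the parse callable is a stand-in for whatever hands the string to the parser. Because the rate is computed from the string's size(), it comes out in character units of the string's character type: b/ms for std::string, and w/ms or dw/ms for std::wstring depending on whether wchar_t is 2 bytes (Windows) or 4 bytes (most Unix-like platforms).

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Time one call of parse(doc) on a document already held in memory (no disk
// I/O) and return the throughput in character units per millisecond, where the
// unit is sizeof(CharT). Returns 0 if the run was too short for the clock to
// measure.
template <typename CharT, typename Parse>
double measure_units_per_ms(const std::basic_string<CharT>& doc, Parse parse) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    parse(doc);
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    if (elapsed.count() <= 0.0) return 0.0;
    return static_cast<double>(doc.size()) / elapsed.count();
}
```

For example, measure_units_per_ms(xml, [](const std::string& s) { /* pass s to the parser */ }) reports b/ms for an MBCS or UTF-8 build. Timing a single call is noisy for small documents, so a real benchmark would use a large document or repeat the call.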
The reason for the release 11.3 performance improvement
One of the most intensively used operations in the parser is determining whether a character is one of a set of characters. In 11.3 I replaced the standard C string search (strchr) with a lookup define, which is an order of magnitude faster and yields a roughly 30% improvement in overall raw parser speed. The new lookup define only checks the bounds and then returns the value at the character's offset in the array, where
c is the character,
f and l are the bounds (first and last), and
s is the lookup array (a string):
```cpp
#define x_ISONEOF(c,f,l,s) ((c>=f&&c<=l)?(int)(s[c-f]):0)
```
So, for example, a whitespace check uses x_ISONEOF and passes the bounds 9 (tab) and 32 (space), along with a lookup string array covering the range between those bounds:
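```cpp
// classic whitespace " \t\n\r"
// nonzero entries: \2 tab (9), \3 LF (10), \4 CR (13), \1 space (32); all other offsets are 0
#define x_ISWHITESPACE(c) x_ISONEOF(c,9,32, "\2\3\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1")
```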
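To check that a lookup table gives the same answers as the strchr search it replaces, the sketch below compiles the two macros as given and compares them with strchr for every byte value. make_lookup is a hypothetical helper (not part of CMarkup) that generates such a table for an arbitrary non-empty set, showing how the layout follows from the bounds: the entry at offset c - f is nonzero exactly when c is in the set. Note that x_ISONEOF evaluates c more than once, so its arguments should not have side effects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// The two macros as given above.
#define x_ISONEOF(c,f,l,s) ((c>=f&&c<=l)?(int)(s[c-f]):0)
// classic whitespace " \t\n\r"
#define x_ISWHITESPACE(c) x_ISONEOF(c,9,32, "\2\3\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1")

// The strchr-style test that the lookup replaces. c == 0 is excluded because
// strchr would otherwise match the set's terminating null.
bool is_one_of_strchr(char c, const char* set) {
    return c != '\0' && std::strchr(set, c) != nullptr;
}

// Hypothetical helper: build a lookup string for a non-empty set, storing each
// member's 1-based position at its offset (0 elsewhere), and report the bounds
// to pass to x_ISONEOF.
std::string make_lookup(const char* set, char& first, char& last) {
    first = last = set[0];
    for (const char* p = set; *p; ++p) {
        if (*p < first) first = *p;
        if (*p > last) last = *p;
    }
    std::string table(static_cast<std::size_t>(last - first + 1), '\0');
    for (const char* p = set; *p; ++p)
        table[static_cast<std::size_t>(*p - first)] = static_cast<char>(p - set + 1);
    return table;
}

// Compare the whitespace macro with strchr for every byte value.
bool whitespace_matches_strchr() {
    for (int i = 0; i < 256; ++i) {
        char c = static_cast<char>(i);
        if ((x_ISWHITESPACE(c) != 0) != is_one_of_strchr(c, " \t\n\r")) return false;
    }
    return true;
}

// Compare a generated lookup table with strchr for every byte value.
bool generated_lookup_matches_strchr(const char* set) {
    char first, last;
    std::string table = make_lookup(set, first, last);
    for (int i = 0; i < 256; ++i) {
        char c = static_cast<char>(i);
        if ((x_ISONEOF(c, first, last, table) != 0) != is_one_of_strchr(c, set)) return false;
    }
    return true;
}
```

Besides being a yes/no test, the nonzero value identifies which member matched: the whitespace table returns 1 for space, 2 for tab, 3 for LF and 4 for CR.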
Another roughly 5% overall improvement was gained by replacing the standard C string comparison (strncmp) with a simple, fast implementation of string compare.
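The replacement compare function itself is not shown here, so the following is only a guess at what a "simple" version might look like: a hypothetical simple_strncmp written as a plain inline loop with the same contract as strncmp (compare at most n characters as unsigned char, stop at a shared terminating null). One plausible reason such a loop can beat the library call is that it can be inlined for the short tag and attribute names typical of XML; that is an assumption, not a measured result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain-loop replacement for strncmp (not CMarkup's actual code).
// Compares at most n characters as unsigned char, stopping at the first
// difference or at a shared terminating null, and returns -1, 0 or 1.
inline int simple_strncmp(const char* a, const char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == '\0') return 0;  // both strings ended at the same point
    }
    return 0;  // the first n characters are equal
}
```

A typical call in a parser would check a fixed token at the current position, such as simple_strncmp(p, "<!--", 4) == 0.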
Comparing different builds of release 11.3
| Build | parse doc | parse attrib | create doc | create attrib | Units |
|---|---|---|---|---|---|
| MSXML6 MFC MBCS | 3832 | 1762 | 1849 | 1347 | b/ms |
| MSXML6 MFC WCHAR | 3950 | 1939 | 1963 | 1428 | w/ms |
Using Unicode (either UTF-8 or WCHAR) strings in memory is much more efficient than MBCS, which uses Windows APIs to determine character boundaries according to the locale character set. MSXML is very slow due to the overhead of COM, and it is slightly faster in a WCHAR build, which avoids conversion to and from COM's WCHAR-based strings.
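Part of why UTF-8 is cheap to scan is that its character boundaries are self-describing: every continuation byte has the bit pattern 10xxxxxx, so the start of the next character can be found with a bit test and no locale lookup. In an MBCS code page, by contrast, whether a byte begins a two-byte character depends on the locale. The helpers below (hypothetical names, not CMarkup methods) are a minimal sketch assuming well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the index of the first byte of the character after the one starting
// at pos, by skipping UTF-8 continuation bytes (10xxxxxx). No locale involved.
inline std::size_t next_utf8_char(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return s.size();
    ++pos;
    while (pos < s.size() && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80)
        ++pos;
    return pos;
}

// Count characters (code points) in a UTF-8 string by stepping over boundaries.
inline std::size_t count_utf8_chars(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t pos = 0; pos < s.size(); pos = next_utf8_char(s, pos)) ++n;
    return n;
}
```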
File mode performance
Unlike the measurements above, the XML reader and XML writer measurements are all in bytes per millisecond regardless of build, because they are based on file I/O rather than the in-memory character unit size. The file is UTF-8, which means the MBCS and wide character builds pay the extra penalty of character set conversion. The MBCS conversion can be done using the libc (stdlib.h) function wctomb (not using the Windows API).
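The UTF-8 to MBCS path can be sketched as decoding each UTF-8 sequence to a wide character and then passing it to wctomb. Only the use of wctomb comes from the text above; the decoding step and the name utf8_to_locale_mbcs are assumptions for illustration. A WCHAR build would stop after the decoding step. wctomb keeps internal state and is not thread-safe; std::wcrtomb with an explicit std::mbstate_t is the reentrant alternative.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>

// Minimal sketch, assuming well-formed UTF-8 input: decode each UTF-8 sequence
// to a code point, then re-encode it in the current C locale's multibyte
// charset with std::wctomb. Returns false on truncated input, on a code point
// that does not fit in wchar_t (e.g. outside the BMP with a 16-bit wchar_t;
// surrogate pairs are not handled here), or on a character the locale charset
// cannot represent.
bool utf8_to_locale_mbcs(const std::string& in, std::string& out) {
    out.clear();
    std::wctomb(nullptr, 0);  // reset wctomb's internal shift state
    char buf[MB_LEN_MAX];
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        unsigned long cp;
        std::size_t extra;  // number of continuation bytes
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        if (i + extra >= in.size()) return false;  // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += extra + 1;
        if (cp > static_cast<unsigned long>(WCHAR_MAX)) return false;
        const int n = std::wctomb(buf, static_cast<wchar_t>(cp));
        if (n < 0) return false;  // not representable in the locale charset
        out.append(buf, static_cast<std::size_t>(n));
    }
    return true;
}
```

The target charset is whatever the program's locale selects (for example after a setlocale(LC_ALL, "") call); in the default "C" locale only ASCII is guaranteed to convert.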
| Build | XML reader | XML writer | Units |
|---|---|---|---|
| MFC MBCS libc | 2231 | 2844 | b/ms |
| STL MBCS libc | 2155 | 2677 | b/ms |
* Measurements here are representative of the speed with my own sample data on a 1.7 GHz, 1 GB Vista netbook. Running these tests twice in a row often gives slightly different results, because they are affected by variations in CPU activity.
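Because single runs vary, one common way to report a steadier figure is to repeat each test several times and take the median rate, which a single disturbed run cannot drag far. This is a generic measurement practice, not necessarily how the figures above were produced; median_rate is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Given throughput results (units/ms) from repeated runs of the same test,
// return their median (the mean of the two middle values for an even count),
// or 0 if there are no results.
inline double median_rate(std::vector<double> rates) {
    if (rates.empty()) return 0.0;
    std::sort(rates.begin(), rates.end());
    const std::size_t mid = rates.size() / 2;
    return rates.size() % 2 != 0 ? rates[mid] : (rates[mid - 1] + rates[mid]) / 2.0;
}
```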