CMarkup XML Parser Performance

Release 11.3 makes a leap in performance (e.g. from 39 MB/s to 53 MB/s* excluding file I/O), so it's a good time to post some data on the speed of CMarkup and to discuss XML parser performance issues. Here is a comparison of 11.3 with the previous release, 11.2: raw parsing goes from about 40000 to 54000 bytes per millisecond, and attribute parsing (the basis for attribute methods) goes from about 5000 to 9000 b/ms (see also Attribute Method Performance).

Release Chart   parse doc  parse attrib  create doc  create attrib  Units
CMarkup 11.2        40002          5175       12331           4754  b/ms
CMarkup 11.3        54042          9195       14394           6820  b/ms

Since these measurements do not involve disk I/O, the speeds are measured in character units per millisecond where the character unit is b for byte, w for word (2 bytes), and dw for double word (4 bytes), depending on the build and platform. In the first chart I include 2 parse tests and then 2 corresponding create tests.

parse document: this is the core indicator of parsing speed. The document string is passed to SetDoc in memory and parsed; it is not loaded from disk.
parse attributes: loops through the document reading all attributes with GetAttribName and GetAttrib (the new GetNthAttrib method is a more efficient way to do this).
create document: builds a document using an AddElem and a SetAttrib for each element. The document is not saved to disk, so there is no disk I/O in this measurement.
create attributes: creates a document with up to 4 randomly selected attributes and values per element; the SetAttrib call occasionally overwrites an attribute.
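To make the unit concrete, here is a minimal sketch of how a character-units-per-millisecond figure could be computed for an in-memory operation. It is not the harness used for the numbers in this article; the names unitsPerMs and measureUnitsPerMs are hypothetical, and op stands in for whatever is being timed (for example a call to SetDoc).

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: converts a character-unit count and an elapsed time
// into units per millisecond (b/ms, w/ms or dw/ms depending on the string type).
inline double unitsPerMs(std::size_t units, double elapsedMs)
{
    return elapsedMs > 0.0 ? static_cast<double>(units) / elapsedMs : 0.0;
}

// Hypothetical harness: runs op(doc) `passes` times and returns the speed in
// character units per millisecond. No file I/O is involved; the document is
// already in memory when timing starts.
template <typename CharT, typename Op>
double measureUnitsPerMs(const std::basic_string<CharT>& doc, int passes, Op op)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < passes; ++i)
        op(doc);
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return unitsPerMs(doc.size() * static_cast<std::size_t>(passes), elapsed.count());
}
```

Because doc.size() counts characters of the string's own type, the same harness would report b/ms for a char string and w/ms for a 2-byte wide string, which is why the charts carry a Units column.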

The reason for the release 11.3 performance improvement

One of the most intensively used operations in the parser is determining whether a character is one of a set of characters. In 11.3 I replaced MCD_PSZCHR (strchr) with a lookup define which is an order of magnitude faster and yields a roughly 30% improvement in overall raw parser speed. The new lookup define only checks the bounds and then returns the array entry at that offset, where c is the character, f and l are the bounds (first and last) and s is the lookup array (a string):

#define x_ISONEOF(c,f,l,s) ((c>=f&&c<=l)?(int)(s[c-f]):0)

So, for example, a whitespace check uses x_ISONEOF and passes the bounds 9 and 32, and a lookup string array for the range between those bounds:

// classic whitespace " \t\n\r"
#define x_ISWHITESPACE(c) x_ISONEOF(c,9,32, \
  "\2\3\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1")
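As a quick sanity check, the sketch below compares the lookup against a strchr-based test for every 7-bit character code. The two macros are copied from the text (with a line continuation so the definition compiles across two lines); isWhitespaceStrchr and lookupAgreesWithStrchr are hypothetical helpers written for this illustration and are not part of CMarkup.

```cpp
#include <cstring>

// Lookup macro as given in the text: bounds check, then array index.
#define x_ISONEOF(c,f,l,s) ((c>=f&&c<=l)?(int)(s[c-f]):0)

// Whitespace table covering character codes 9 (tab) through 32 (space).
// Each nonzero entry is the 1-based position of the character in " \t\n\r".
#define x_ISWHITESPACE(c) x_ISONEOF(c,9,32, \
  "\2\3\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1")

// Reference check using strchr. The explicit c != 0 test is needed because
// strchr also matches the terminating NUL of the set string.
inline bool isWhitespaceStrchr(int c)
{
    return c != 0 && std::strchr(" \t\n\r", c) != nullptr;
}

// Hypothetical helper: true if both checks agree for character codes 0..127.
inline bool lookupAgreesWithStrchr()
{
    for (int c = 0; c < 128; ++c)
    {
        const bool viaLookup = x_ISWHITESPACE(c) != 0;
        if (viaLookup != isWhitespaceStrchr(c))
            return false;
    }
    return true;
}
```

The lookup costs two comparisons and one array read per character, while strchr scans the set string on every call, which is consistent with the speedup described above.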

Another roughly 5% overall improvement was gained by replacing MCD_PSZNCMP (strncmp) with a simple speedy implementation of string compare.
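The replacement compare is not reproduced in this article, so the following is only a guess at what a "simple speedy" version might look like: an equality-only loop that skips the ordering result strncmp has to compute. The name simpleStrNCmp is hypothetical.

```cpp
#include <cstddef>

// Hypothetical stand-in for a simple string compare; not the CMarkup source.
// Returns 0 if the first n characters of a and b are equal, nonzero otherwise.
// Unlike strncmp it reports only equality, not ordering. It stops at a
// terminating NUL, so it never reads past the end of either string.
template <typename CharT>
inline int simpleStrNCmp(const CharT* a, const CharT* b, std::size_t n)
{
    while (n--)
    {
        if (*a != *b)
            return 1;   // mismatch (also covers one string ending early)
        if (*a == 0)
            return 0;   // both strings ended together before n characters
        ++a;
        ++b;
    }
    return 0;           // first n characters matched
}
```

Written as a template, the same loop would serve both the byte and wide character builds.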

Comparing different builds of release 11.3

Build configuration makes a big difference in performance. See ANSI and Unicode files and C++ strings and non-Unicode text handling in CMarkup for discussions of string character set options.

Build Chart       parse doc  parse attrib  create doc  create attrib  Units
MFC (UTF-8)           54042          9195       14394           6820  b/ms
STL (UTF-8)           55923          9193       11583           6061  b/ms
MFC MBCS              14424          3269       11084           3492  b/ms
STL MBCS              14783          3137        8636           3223  b/ms
MSXML6 MFC MBCS        3832          1762        1849           1347  b/ms
MFC WCHAR             57405          8607       14530           6594  w/ms
STL WCHAR             57780          8607       10744           5639  w/ms
MSXML6 MFC WCHAR       3950          1939        1963           1428  w/ms

Using Unicode (either UTF-8 or WCHAR) strings in memory is much more efficient than MBCS, which uses Windows APIs to determine character boundaries according to the locale character set. MSXML is very slow due to the overhead of COM, and is slightly faster in a WCHAR build, which avoids conversion to and from COM's WCHAR-based strings.
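One property that may contribute to the UTF-8 advantage is that a UTF-8 character's byte length can be read from its first byte with a few bit tests, with no locale or code page lookup. The sketch below illustrates that property only; utf8CharLength is a hypothetical helper, not CMarkup code, and it does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5.

```cpp
// Hypothetical helper, not CMarkup code: number of bytes in the UTF-8
// character that starts with `lead`, or 0 if `lead` is a continuation byte
// or is 0xF8 or above. No validation of the bytes that follow is attempted.
inline int utf8CharLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or 11111xxx
}
```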

File mode performance

Unlike the measurements above, the XML reader and XML writer measurements are all in bytes per millisecond regardless of build because they are based on the file I/O rather than the in-memory character unit size. The file is UTF-8, which means the MBCS and wide character builds have the extra penalty of character set conversion. The MBCS conversion can be done using the libc (stdlib.h) function wctomb (not using the Windows API).

Build Chart     XML reader  XML writer  Units
MFC                  15086       11528  b/ms
STL                  13858        9540  b/ms
MFC WCHAR            10854        8757  b/ms
STL WCHAR            10717        7509  b/ms
MFC MBCS             11673        9846  b/ms
STL MBCS             10444        8137  b/ms
MFC MBCS libc         2231        2844  b/ms
STL MBCS libc         2155        2677  b/ms

See also:
Archived CMarkup Performance Tests
Attribute Method Performance

* Measurements here are representative of the speed with my own sample data on a 1.7GHz 1GB Vista netbook. Running these tests twice in a row often gives slightly different results because they are affected by variations in CPU load.