CMarkup XML Parser Performance

Release 11.3 makes a leap in performance (e.g. from 39 MB/s to 53 MB/s* excluding file I/O), so it's a good time to post some data on the speed of CMarkup and to discuss XML parser performance issues. Here is a comparison of 11.3 with the previous release, 11.2: raw parsing goes from about 40,000 to 54,000 bytes per millisecond, and attribute parsing (the basis for the attribute methods) goes from about 5,000 to 9,000 b/ms (see also Attribute Method Performance).

Release	parse doc	parse attrib	create doc	create attrib	Units
CMarkup 11.2	40002	5175	12331	4754	b/ms
CMarkup 11.3	54042	9195	14394	6820	b/ms

Since these measurements do not involve disk I/O, the speeds are measured in character units per millisecond where the character unit is b for byte, w for word (2 bytes), and dw for double word (4 bytes), depending on the build and platform. In the first chart I include 2 parse tests and then 2 corresponding create tests.

parse document: the core indicator of parsing speed. The document string is passed to SetDoc in memory and parsed; it is not loaded from disk.

parse attributes: loops through the document reading all attributes with GetAttribName and GetAttrib (the new GetNthAttrib method is a more efficient way to do this).

create document: builds a document using AddElem and SetAttrib for each element. The document is not saved to disk, so there is no disk I/O in this measurement.

create attributes: creates a document with up to 4 randomly selected attributes and values per element; the SetAttrib call occasionally overwrites an attribute.
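To make the units concrete, here is a minimal sketch of how a "character units per millisecond" figure could be computed. It is not the harness used for the numbers in this article; the names elapsedMs and unitsPerMs are made up for illustration. The point is that the string's size() counts bytes for a char string and wchar_t units for a wide string, which is why the unit changes with the build.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper: runs work() once and returns the elapsed time in ms.
template <typename Work>
double elapsedMs(Work work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: character units processed per millisecond.
// text.size() is in bytes for std::string and in wchar_t units for
// std::wstring, so the result is b/ms, w/ms or dw/ms depending on the type.
template <typename String>
double unitsPerMs(const String& text, double ms) {
    assert(ms > 0.0);
    return static_cast<double>(text.size()) / ms;
}
```

In a real test the work passed to elapsedMs would be the operation being measured (for example a SetDoc call on an in-memory document), repeated enough times that the elapsed time is well above the timer resolution.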

The reason for the release 11.3 performance improvement

One of the most intensively used operations in the parser is determining whether a character is one of a set of characters. In 11.3 I replaced MCD_PSZCHR (strchr) with a lookup define which is an order of magnitude faster and yields a roughly 30% improvement in overall raw parser speed. The new lookup define only checks the bounds and then returns the value stored at the character's offset in the lookup array (nonzero if the character is in the set, 0 otherwise), where c is the character, f and l are the bounds (first and last) and s is the lookup array (a string):

#define x_ISONEOF(c,f,l,s) ((c>=f&&c<=l)?(int)(s[c-f]):0)

So, for example, a whitespace check uses x_ISONEOF and passes the bounds 9 and 32, and a lookup string array for the range between those bounds:

// classic whitespace " \t\n\r"
#define x_ISWHITESPACE(c) x_ISONEOF(c,9,32, \
  "\2\3\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1")
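As a self-contained illustration of the lookup idea, the sketch below puts the table check next to the strchr approach it replaced. The macro is the one quoted above with defensive parentheses added; kWhitespaceTable, isWhitespaceLookup and isWhitespaceStrchr are names invented for this example and are not part of CMarkup.

```cpp
#include <cstring>

// The lookup define from the text, with extra parentheses around arguments.
#define x_ISONEOF(c, f, l, s) \
    (((c) >= (f) && (c) <= (l)) ? (int)((s)[(c) - (f)]) : 0)

// One entry per character code from 9 (tab) to 32 (space).
// A nonzero entry means the character is in the set.
static const char kWhitespaceTable[] =
    "\2\3\0\0\4"            // 9 tab, 10 newline, 11, 12, 13 carriage return
    "\0\0\0\0\0\0\0\0\0"    // 14..22
    "\0\0\0\0\0\0\0\0\0"    // 23..31
    "\1";                   // 32 space
static_assert(sizeof(kWhitespaceTable) == (32 - 9 + 1) + 1,
              "table must cover codes 9..32 plus the terminator");

// Table-based test: a bounds check and a single array read.
inline bool isWhitespaceLookup(char c) {
    return x_ISONEOF(c, 9, 32, kWhitespaceTable) != 0;
}

// Baseline for comparison: scan the set with strchr.
// strchr also matches the terminating '\0', so that case is excluded.
inline bool isWhitespaceStrchr(char c) {
    return c != '\0' && std::strchr(" \t\n\r", c) != nullptr;
}
```

Both functions give the same answer for every 7-bit character; the lookup version simply avoids a function call and a scan of the set for each character tested.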

Another roughly 5% overall improvement was gained by replacing MCD_PSZNCMP (strncmp) with a simple speedy implementation of string compare.
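The replacement compare itself is not shown here, so the following is only a guess at the kind of function meant: a plain loop that answers "are the first n characters equal?" rather than computing the three-way ordering strncmp returns. The name equalN is hypothetical and this is not CMarkup's code.

```cpp
#include <cstddef>

// Hypothetical helper: true when the first n characters of a and b match.
// Like strncmp, it stops early if both strings end before n characters.
inline bool equalN(const char* a, const char* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return false;   // first mismatch decides
        if (a[i] == '\0') return true;    // both strings ended together
    }
    return true;
}
```

A parser typically only needs to know whether the text at the current position starts with a known token such as "<!--", so an equality-only loop that the compiler can inline may be enough.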

Comparing different builds of release 11.3

Build configuration makes a big difference in performance. See ANSI and Unicode files and C++ strings and non-Unicode text handling in CMarkup for discussions of string character set options.

Build	parse doc	parse attrib	create doc	create attrib	Units
MFC (UTF-8)	54042	9195	14394	6820	b/ms
STL (UTF-8)	55923	9193	11583	6061	b/ms
MFC MBCS	14424	3269	11084	3492	b/ms
STL MBCS	14783	3137	8636	3223	b/ms
MSXML6 MFC MBCS	3832	1762	1849	1347	b/ms
MFC WCHAR	57405	8607	14530	6594	w/ms
STL WCHAR	57780	8607	10744	5639	w/ms
MSXML6 MFC WCHAR	3950	1939	1963	1428	w/ms

Using Unicode (either UTF-8 or WCHAR) strings in memory is much more efficient than MBCS, which uses Windows APIs to determine character boundaries according to the locale character set. MSXML is very slow due to the overhead of COM, and is slightly faster in a WCHAR build, which avoids conversion to and from COM's WCHAR-based strings.
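One way to see why UTF-8 can be cheaper than MBCS: in UTF-8 the length of a character is encoded in its first byte, so a boundary can be found with a few bit tests and no locale-dependent API call. The sketch below shows that rule; utf8SequenceLength is a name made up for this example, and I am not claiming this is how CMarkup handles UTF-8 internally.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with the given lead byte, or 0 if the byte cannot start a sequence.
inline std::size_t utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;             // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;   // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;   // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;   // 11110xxx
    return 0;                              // continuation or invalid byte
}
```

In addition, every byte of a multi-byte UTF-8 character has its high bit set, so the ASCII markup characters an XML parser looks for can never appear inside one.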

File mode performance

Unlike the measurements above, the XML reader and XML writer measurements are all in bytes per millisecond regardless of build, because they are based on the file I/O rather than the in-memory character unit size. The file is UTF-8, which means the MBCS and wide character builds have the extra penalty of character set conversion. The MBCS conversion can also be done using the libc (stdlib.h) function wctomb instead of the Windows API; those results are the "MBCS libc" rows.

Build	XML reader	XML writer	Units
MFC	15086	11528	b/ms
STL	13858	9540	b/ms
MFC WCHAR	10854	8757	b/ms
STL WCHAR	10717	7509	b/ms
MFC MBCS	11673	9846	b/ms
STL MBCS	10444	8137	b/ms
MFC MBCS libc	2231	2844	b/ms
STL MBCS libc	2155	2677	b/ms
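For reference, the libc route mentioned above converts one wide character at a time. The sketch below shows the general shape of such a conversion with the standard wctomb function; wideToMultibyte is a name invented for this example and is not taken from CMarkup. The target encoding is whatever the current C locale specifies.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a wide string to the multibyte encoding of
// the current C locale, one character at a time, using wctomb.
// Returns false if some character has no representation in that encoding.
inline bool wideToMultibyte(const std::wstring& in, std::string& out) {
    out.clear();
    std::wctomb(nullptr, 0);             // reset any conversion state
    std::string buf(MB_CUR_MAX, '\0');   // room for one multibyte character
    for (wchar_t wc : in) {
        int n = std::wctomb(&buf[0], wc);
        if (n < 0) return false;         // not representable in this locale
        out.append(buf.data(), static_cast<std::size_t>(n));
    }
    return true;
}
```

A per-character library call like this is one plausible reason the libc rows are so much slower than the others, though I have not profiled it to confirm.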

* Measurements here are representative of the speed with my own sample data on a 1.7GHz 1GB Vista netbook. Running these tests twice in a row often gives slightly different results because they are affected by variations in CPU load.
