Attribute Method Performance

Attribute parsing performance came up several times this year, and some significant improvements were made in CMarkup release 11.3.

On every call, each CMarkup attribute method reparses the element's attributes up to the one being accessed. This can lead to poorer than expected performance in attribute-intensive code, i.e. code that repeatedly accesses or checks for many attributes. The cause is an original design trade-off: CMarkup does not store attribute indexes.
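To get a feel for why this matters, here is a minimal sketch of the cost model in standard C++. It is a toy stand-in, not CMarkup's actual code: the names AttribList, lookupByRescan and costOfReadingAll are hypothetical, and the "parse" is just a scan over name/value pairs already split out. It only counts how many attributes get visited when nothing is indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one element's attributes, in document order.
using AttribList = std::vector<std::pair<std::string, std::string>>;

// Toy model of a lookup that keeps no attribute index: every call starts at
// the first attribute and scans until the name matches. `scanned` accumulates
// the number of attributes visited. Returns an empty string if not found.
std::string lookupByRescan(const AttribList& attribs, const std::string& name,
                           std::size_t& scanned)
{
    for (const auto& a : attribs) {
        ++scanned;
        if (a.first == name)
            return a.second;
    }
    return std::string();
}

// Reads every attribute once by name and returns the total number of
// attributes visited across all of those lookups.
std::size_t costOfReadingAll(const AttribList& attribs)
{
    std::size_t scanned = 0;
    for (const auto& a : attribs) {
        std::string value = lookupByRescan(attribs, a.first, scanned);
        // Attribute names are unique in well-formed XML, so the rescan
        // is expected to land on this same attribute.
        assert(value == a.second);
        (void)value;
    }
    return scanned;
}
```

With n uniquely named attributes the total is 1 + 2 + ... + n, so reading 4 attributes by name visits 10 attributes in this model and reading 10 visits 55. The real cost in CMarkup also depends on the length of the attribute text, which this sketch ignores.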

 

comment posted CMarkup - Attribute Query Speed

Cameron Dunn 23-Jun-2010

I've been very impressed with the speed of loading and parsing. However, I've hit one area which is surprisingly slow which I wanted to ask you about - XML attributes.

I'm loading about 3000 XML files, for a total of 99468934 bytes. I'm loading in the files myself and then passing the string to CMarkup. If I do that, and then loop down into every element in every file, it takes about 2 seconds (specifically, 1985ms), which I thought was pretty impressive.

However, if I do the same thing but also loop over every attribute on every element, it takes 8 seconds. I found this a bit surprising - obviously there's no additional file IO time or anything like that, it's all in string processing. The interface which CMarkup provides to access attributes is very string heavy - you need to get the attribute by name and then query the value using this name.

Is there a quicker way to loop over the XML attributes? I need the name and value for each attribute, but they can be in the order in which they occur in the file.


...I iterate the attributes for a single element with GetAttribName() and then call GetAttrib() to get their values.

CMarkup release 11.3 introduces a new method, GetNthAttrib, which is twice as efficient as GetAttribName combined with GetAttrib. In addition, attribute parsing is about twice as fast (see CMarkup XML Parser Performance). So iterating the attributes in your case, which accounts for roughly 6 of the 8 seconds you measured, might be reduced to about 1.5 seconds.

I did design a solution to manage and reuse attribute indexes for the current element, but it was actually slower for a single attribute access and wasn't really fast enough to justify the added complexity. Another option would be to include attributes in CMarkup indexing much like elements, but I think that's too fundamental a change at this point. So I've chosen to remain with the original reparse design for the time being, and hopefully the 11.3 performance boost and new method will help out enough.

If you use attributes intensively, in some cases you might want to extract them with GetNthAttrib into an external map as a more efficient mechanism for accessing them repeatedly. You can even map them in a separate CMarkup object as elements, using SavePos and then RestorePos to do the lookup.
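As a rough sketch of the external map idea, the function below copies the current element's attributes into a std::map in one pass, so that later lookups no longer touch the parser. It is written as a template so it compiles without the CMarkup headers. It assumes a member shaped like GetNthAttrib(int n, std::string& name, std::string& value) that takes a zero-based n and returns false past the last attribute, and it assumes an STL string build. Check both against your version of Markup.h. FakeMarkup, cacheAttribs and cachedAttrib are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Copies the attributes of the current element into a map in a single pass.
// Markup is any type with a CMarkup-style member (assumed signature):
//   bool GetNthAttrib(int n, std::string& name, std::string& value) const
// which returns false once n is past the last attribute.
template <class Markup>
std::map<std::string, std::string> cacheAttribs(const Markup& xml)
{
    std::map<std::string, std::string> cache;
    std::string name, value;
    for (int n = 0; xml.GetNthAttrib(n, name, value); ++n) {
        assert(!name.empty()); // a successful call is expected to yield a name
        cache[name] = value;
    }
    return cache;
}

// Looks a name up in the cache; returns an empty string when it is absent.
std::string cachedAttrib(const std::map<std::string, std::string>& cache,
                         const std::string& name)
{
    auto it = cache.find(name);
    return it == cache.end() ? std::string() : it->second;
}

// Hypothetical stand-in so the sketch can be tried without CMarkup.
struct FakeMarkup {
    std::vector<std::pair<std::string, std::string>> attribs;

    bool GetNthAttrib(int n, std::string& name, std::string& value) const
    {
        if (n < 0 || n >= static_cast<int>(attribs.size()))
            return false;
        name = attribs[n].first;
        value = attribs[n].second;
        return true;
    }
};
```

In real code you would pass your CMarkup object positioned on the element of interest, then call cachedAttrib as often as needed. The map costs one allocation-heavy pass per element, so it probably only pays off when the same element's attributes are read several times.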