C++ XML writer creates a very large XML file

Need to create a very large XML file from C++ without holding the entire XML document in memory? Exporting gigabytes of records from a database? Logging or archiving huge amounts of XML data to file? CMarkup file write mode provides a simple, high-performance C++ XML writer. You open the file and then add data with the same CMarkup methods you would use to create the document in memory.

CMarkup Developer License

File read and write modes (see C++ XML reader too) are in the developer version of CMarkup.

Although CMarkup can easily create multi-megabyte documents in memory and then write them to disk, file write mode lets you write to disk as you create the document, which conserves memory. File write mode is a low-footprint, write-only, forward-only push to file for creating huge XML files.

How to use CMarkup file write mode

Instead of immediately adding elements to the newly instantiated CMarkup object, just use the Open method to open a file in write mode before adding elements. Here's some C++ source code to show you how it works; the only new methods you need are Open and Close:

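CMarkup xmlwriter;
// Open the output file in write mode before adding any elements
xmlwriter.Open( "inventorydata.xml", MDF_WRITEFILE );
xmlwriter.AddElem( "root" );
xmlwriter.SetAttrib( "infotype", "inventorydata" );
xmlwriter.IntoElem(); // subsequent elements go inside root
std::string strID, strName, strRef;
// GetInventoryData stands for your own data source (not part of CMarkup);
// it is assumed to return false when there are no more records
while ( GetInventoryData(strID,strName,strRef) )
{
  xmlwriter.AddElem( "data" );
  xmlwriter.SetAttrib( "id", strID );
  xmlwriter.IntoElem();
  xmlwriter.AddElem( "name", strName );
  xmlwriter.AddElem( "ref", strRef );
  xmlwriter.OutOfElem(); // back out of data, ready for the next record
}
xmlwriter.Close(); // closes the file and ends file mode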

C++ XML writer methods

CMarkup's file write mode limits the methods you can use and the ways you can use those methods. The key thing to remember is that it is a forward-only push to file, so you cannot navigate in the document you are creating; you can only add elements and nodes, and set attributes. And since you can only write at a single position, you cannot use the child element methods.
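To make the forward-only restriction concrete, here is a minimal sketch of a push-style writer. It is not CMarkup and says nothing about how CMarkup is implemented; ForwardXmlWriter, its methods and demoForwardWriter are hypothetical names invented for this illustration. The sketch assumes that an element's start tag is held back until the next call, which is one simple way to explain why attributes can only be set on the current element and why an element added with a data value cannot be entered.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for illustration only; this is NOT CMarkup.
class ForwardXmlWriter {
public:
    explicit ForwardXmlWriter(std::ostream& out) : out_(out) {}

    // Add an element after the current one. The previous element is
    // written out first, so it can no longer be changed.
    void addElem(const std::string& name, const std::string& data = "") {
        finishPending();
        name_ = name;
        attribs_.clear();
        data_ = data;
        pending_ = true;
    }

    // Attributes can only go on the element that was just added.
    void setAttrib(const std::string& attrib, const std::string& value) {
        assert(pending_);
        attribs_ += " " + attrib + "=\"" + escape(value) + "\"";
    }

    // Go inside the current element; it must have been added without data.
    void intoElem() {
        assert(pending_ && data_.empty());
        out_ << "<" << name_ << attribs_ << ">";
        open_.push_back(name_);
        pending_ = false;
    }

    // Write the end tag of the innermost open element.
    void outOfElem() {
        finishPending();
        assert(!open_.empty());
        out_ << "</" << open_.back() << ">";
        open_.pop_back();
    }

    // Write any end tags still owed, then flush the stream.
    void close() {
        while (!open_.empty()) outOfElem();
        finishPending();
        out_.flush();
    }

private:
    // Escape the characters that are special in XML text and attributes.
    static std::string escape(const std::string& text) {
        std::string result;
        for (char c : text) {
            switch (c) {
                case '&': result += "&amp;"; break;
                case '<': result += "&lt;"; break;
                case '>': result += "&gt;"; break;
                case '"': result += "&quot;"; break;
                default: result += c;
            }
        }
        return result;
    }

    // Emit the held-back element, either empty or with its data value.
    void finishPending() {
        if (!pending_) return;
        out_ << "<" << name_ << attribs_;
        if (data_.empty()) out_ << "/>";
        else out_ << ">" << escape(data_) << "</" << name_ << ">";
        pending_ = false;
    }

    std::ostream& out_;
    std::vector<std::string> open_;  // names of elements entered so far
    std::string name_, attribs_, data_;
    bool pending_ = false;
};

// Mirrors the shape of the inventory example with a single record.
std::string demoForwardWriter() {
    std::ostringstream os;
    ForwardXmlWriter w(os);
    w.addElem("root");
    w.setAttrib("infotype", "inventorydata");
    w.intoElem();
    w.addElem("data");
    w.setAttrib("id", "7");
    w.intoElem();
    w.addElem("name", "A&B");
    w.outOfElem();
    w.close();
    return os.str();
}
```

Because the only state kept is the stack of open element names and the one held-back element, memory use in this sketch does not grow with the size of the output, and there is nothing to navigate back to.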

Here are the CMarkup methods that can be used, and a brief explanation of how they work in file write mode:

Open: With the MDF_WRITEFILE flag, opens the file exclusively for writing (erasing the contents if the file exists).
Close: Closes the file and ends file mode. Automatically invoked by the destructor.
Flush: Only for use in file write mode. Flushes any partial document in memory (up to the closing tags) and the file stream itself.
AddElem: In file write mode, adds an element after the current node or element. The added element becomes the current element. You cannot specify a data value if you want to call IntoElem and add elements and nodes inside it.
SetAttrib: In file write mode, sets the value of the specified attribute of the current element (or processing instruction).
SetData: In file write mode, sets the value of the element or node. For an element, this only works if the element was added without a value.
IntoElem: In file write mode, goes "into" the current element (which must be empty, see the AddElem explanation above) to add elements and nodes between its start and end tags.
OutOfElem: In file write mode, goes "out of" the element to add elements and nodes after its end tag. Unlike regular CMarkup usage, it sets the current position after the end tag.
AddNode: In file write mode, adds a node after the current node or element. The added node becomes the current node.
GetElemPath: In file write mode, returns a string representing the absolute path of the main position element, allowing for a maximum of 255 uniquely named sibling elements.
GetDoc: In file write mode, returns the partial document markup string which has not yet been written to file.
AddSubDoc: Update June 7, 2009: In Release 11.1 file write mode, adds the specified markup string after the current position. If the added subdocument is an element with no child elements, the added element becomes the current position (and you can set attributes); otherwise the current position is after the end of the added subdocument.

You can also use any CMarkup static utility function because these do not involve the CMarkup object state or data members.

Appending

If you are generating a log file without a root element, you can open the file with the MDF_APPENDFILE flag to add data starting at the end of the file if it exists. The Flush method can be used to ensure data has been written to disk, which reduces the chance that a failure leaves a logical section of the data only partly written; however, it will reduce performance if used too often.
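The append-and-flush pattern can be sketched with the standard library alone. This is an illustration of the idea rather than CMarkup code: AppendLog and readAll are hypothetical names, and std::ios::app plays the role that the text gives to MDF_APPENDFILE. The sketch assumes each log entry is a complete rootless XML fragment.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only; this is NOT CMarkup.
class AppendLog {
public:
    // std::ios::app positions every write at the end of the file and
    // keeps whatever the file already contains.
    explicit AppendLog(const std::string& path)
        : out_(path, std::ios::out | std::ios::app) {}

    bool ok() const { return out_.good(); }

    // Add one complete fragment; it may sit in the stream buffer for now.
    void add(const std::string& fragment) {
        assert(ok());
        out_ << fragment << '\n';
    }

    // Push buffered data to the file at the end of a logical section.
    // Calling this after every fragment would cost performance.
    void flush() { out_.flush(); }

private:
    std::ofstream out_;
};

// Read a whole file back, used here only to inspect the result.
std::string readAll(const std::string& path) {
    std::ifstream in(path);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

A typical use is to call add for each entry of a logical section and then flush once, so that a later failure is less likely to leave that section half written.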

A window into the document

In file write mode, the m_strDoc document string member is used as a partial document buffer. You can view the current segment, or "write block", of the document in the debugger variables the same way you do when not in file mode, which gives great visibility into the document and the behind-the-scenes positioning in the actual document text. The partial document text holds the markup you create until a MARKUP_FILEBLOCKSIZE-based size is reached; it is then converted to the file charset as it is written to file.
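The write block idea can be pictured with a small buffer that collects markup and hands it to the output once a size threshold is reached. This is only a sketch under assumptions: BlockBuffer is a hypothetical class, the threshold is passed in by the caller rather than derived from MARKUP_FILEBLOCKSIZE, and the charset conversion step is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical illustration of a partial document buffer; NOT CMarkup.
class BlockBuffer {
public:
    BlockBuffer(std::ostream& out, std::size_t blockSize)
        : out_(out), blockSize_(blockSize) {
        assert(blockSize > 0);
    }

    // Collect markup; write the block out once the threshold is reached.
    void append(const std::string& markup) {
        doc_ += markup;
        if (doc_.size() >= blockSize_) writeBlock();
    }

    // The part of the document that has not been written yet.
    const std::string& pending() const { return doc_; }

    // Write whatever is buffered and flush the stream itself.
    void flush() {
        writeBlock();
        out_.flush();
    }

private:
    void writeBlock() {
        out_ << doc_;
        doc_.clear();
    }

    std::ostream& out_;
    std::size_t blockSize_;
    std::string doc_;  // plays the role the text gives to m_strDoc
};
```

With an arbitrary 16-byte threshold, appending a 4-byte element leaves it visible in pending(), and the next append that pushes the buffer past the threshold moves everything to the stream and empties the buffer.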

Copying a CMarkup object in file mode

The copy constructor and assignment operator = do not work when copying a CMarkup object in either read or write file mode. This is because the CMarkup object encapsulates an open file pointer, a system handle which can only be managed by one CMarkup object at a time.
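The single-owner constraint is a general C++ matter, and one common way to express it is shown below. This is not how CMarkup is written (the text above only says that copying does not work); FileWriter is a hypothetical class that owns a FILE pointer, forbids copying at compile time, and allows ownership to be moved instead.

```cpp
#include <cassert>
#include <cstdio>
#include <type_traits>
#include <utility>

// Hypothetical single-owner file wrapper for illustration; NOT CMarkup.
class FileWriter {
public:
    explicit FileWriter(const char* path) : fp_(std::fopen(path, "wb")) {}
    ~FileWriter() {
        if (fp_) std::fclose(fp_);
    }

    // Two objects closing the same handle would be a bug, so copying is
    // rejected by the compiler.
    FileWriter(const FileWriter&) = delete;
    FileWriter& operator=(const FileWriter&) = delete;

    // Moving hands the handle over and leaves the source empty.
    FileWriter(FileWriter&& other) noexcept : fp_(other.fp_) {
        other.fp_ = nullptr;
    }

    bool isOpen() const { return fp_ != nullptr; }

    // Writing through an object that no longer owns the file is an error.
    void write(const char* text) {
        assert(fp_ != nullptr);
        std::fputs(text, fp_);
    }

private:
    std::FILE* fp_;
};

static_assert(!std::is_copy_constructible<FileWriter>::value,
              "the file handle has exactly one owner");
static_assert(!std::is_copy_assignable<FileWriter>::value,
              "the file handle has exactly one owner");
```

If you need to pass a writer that is in file mode to another function, passing it by reference avoids the question of ownership altogether.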