Log Files Without A Root Element

Because a well-formed XML document must have a single root element, XML is not well suited to applications that append records to the end of a file. However, the demand for this is so great that many homegrown workarounds have been invented. CMarkup release 8.0 takes a step towards helping out by supporting XML files without a root element.

Most XML tools and viewers other than CMarkup 8.0 do not directly support log files like this one:

<RECORD><NAME>John Smith</NAME><ID>7632</ID><SCORE>10.6</SCORE></RECORD>
<RECORD><NAME>Jane Smith</NAME><ID>1741</ID><SCORE>11.2</SCORE></RECORD>

This log file contains two records that are each well-formed, but together they do not make a well-formed document: after the first RECORD, the parser finds a sibling RECORD with no containing root element. As of release 8.0, CMarkup can parse this file and can be used to create it. All of the regular methods work as expected, except that IsWellFormed returns false because the root element has a sibling. Here is sample code to load the log and loop through the records.

CMarkup xml;
xml.Load( "scores.log" );
bool bIsDocWellFormed = xml.IsWellFormed(); // false
while ( xml.FindElem() ) // each top-level RECORD in turn
{
   // the child elements are read in the order they were written
   xml.FindChildElem( "NAME" );
   CString csName = xml.GetChildData();
   xml.FindChildElem( "ID" );
   CString csID = xml.GetChildData();
   xml.FindChildElem( "SCORE" );
   double dScore = atof(xml.GetChildData());
}

Although the log file contains XML, I use a filename extension other than xml because strictly speaking it is not well-formed XML. The following code generates two additional records for appending to the log file.

CMarkup xml;
xml.AddElem( "RECORD" );
xml.AddChildElem( "NAME", "Joe Smith" );
xml.AddChildElem( "ID", "3298" );
xml.AddChildElem( "SCORE", "5.7" );
xml.AddElem( "RECORD" );
xml.AddChildElem( "NAME", "Jen Smith" );
xml.AddChildElem( "ID", "9008" );
xml.AddChildElem( "SCORE", "12.0" );
bool bIsDocWellFormed = xml.IsWellFormed(); // false
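
To get these records into the log, the generated text still has to be written at the end of the existing file. Here is a minimal sketch using only the C++ standard library. AppendToLog is a hypothetical helper, not part of CMarkup, and it assumes the record text (for example, what GetDoc returns) is available as a std::string.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not a CMarkup method): appends already-serialized
// records to the end of the log file.
// Returns false if the file could not be opened or the write failed.
bool AppendToLog(const std::string& path, const std::string& records)
{
    // std::ios::app positions every write at the end of the file,
    // so existing records are never overwritten.
    std::ofstream log(path, std::ios::out | std::ios::app | std::ios::binary);
    if (!log)
        return false;
    log << records;
    log.flush();
    return log.good();
}
```

Opening in append mode avoids reading or parsing the existing log at all, which is the main attraction of a log file without a root element.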

External Entity File

The issues around efficient implementation of XML log files are numerous, but this article focuses on the inconvenient requirement that XML must have a single root element. While CMarkup 8.0 can be used with documents that do not have a root element, you still have interoperability issues with other XML producers and consumers. In February 2004 (before CMarkup release 8.0), Scott Wilson described his requirement for appending events to a file.

comment posted External data

Scott Wilson 10-Feb-2004

Here's the scenario. The data are EEG signals acquired over the course of hours or days. One file contains the actual signals themselves -- this is written in binary and simply appended to as data is collected. Other information collected are "events" that may be entered by a nurse or technician at specific times to describe what is happening. Some events are generated automatically by detection algorithms and can be quite dense, many per second. Historically, the events have been written to a binary file as well by appending them.

The goal is to write the events, etc., to an XML file in standard format to facilitate file exchange.

What I've described is the acquisition system. There will also be networked monitoring systems that display the EEG (signal+events+...). As such, it would be much faster if the complete XML document (which may become quite large) did not have to be reread and parsed every 1 s by the monitoring programs. I ran across the !ENTITY directive while browsing the user groups and thought this would be a solution to the problem. The events could be appended to a distinct file as they were found (not a well-formed XML doc). The monitor would know that it only had to read from where it left off. (Note that there are a number of "growing" lists besides just the events.)
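
The "read from where it left off" idea in the comment above can be sketched with the standard library alone. ReadNewLogData is a hypothetical helper, and the byte offset it tracks is an assumption about how a monitor might remember its position; nothing here is part of CMarkup.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns whatever was appended to the file since the
// previous call. 'offset' is the byte position where the last read stopped
// and is advanced past the newly read bytes.
std::string ReadNewLogData(const std::string& path, std::streamoff& offset)
{
    std::ifstream log(path, std::ios::in | std::ios::binary);
    if (!log)
        return std::string();

    // Find the current size of the file.
    log.seekg(0, std::ios::end);
    const std::streamoff size = log.tellg();
    if (size <= offset)
        return std::string(); // nothing new (or the file could not be sized)

    // Read only the bytes between the remembered offset and the end.
    std::string fresh(static_cast<std::size_t>(size - offset), '\0');
    log.seekg(offset, std::ios::beg);
    log.read(&fresh[0], static_cast<std::streamsize>(fresh.size()));
    fresh.resize(static_cast<std::size_t>(log.gcount()));
    offset += static_cast<std::streamoff>(fresh.size());
    return fresh;
}
```

A monitor would start with an offset of 0 and call this on each polling interval. The returned text may end in the middle of a record if the writer is still appending, so a caller would need to hold back any incomplete trailing record before parsing.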

Since strict XML tools won't read that log file, Scott pointed out a newsgroup post, "Appending to XML file and keeping XML well-formed" by Marrow of MarrowSoft, which describes a way of using an external entity. You have a log file that does not have a root element (or an XML Declaration), and then you have a well-formed stub file with a root element that pulls in the log file by way of an external entity. Here is what the XML stub file might look like:

<!DOCTYPE staticinc [ <!ENTITY logentries SYSTEM "scores.log"> ]> 
<root> 
&logentries; 
</root>

The first line of this file is the document type declaration, which contains the DTD (Document Type Definition). It defines the external entity called logentries and specifies the file that holds its content. Inside the root element is the entity reference &logentries;, which is to be replaced with the content of the file specified in the DTD. The root element can have any name; it doesn't have to be "root".

CMarkup will not perform the external entity include from the XML stub file because CMarkup ignores the DTD. If you want to use the XML stub file with CMarkup, you have to substitute the entity reference yourself. The following code sample is not a generic solution; it simply replaces the content of the XML stub file's root element with the content of the log file.

CMarkup xmlLog, xmlStub;
xmlLog.Load( "scores.log" );
xmlStub.Load( "stub.xml" );
xmlStub.FindElem();
xmlStub.SetElemContent( xmlLog.GetDoc() );

There are other ways of doing the entity reference substitution, like finding and replacing the entity reference itself rather than just replacing the content of the root element. Anyway, this XML stub file is not necessary with CMarkup since you can use the log file directly; this example was given just to show that you can do this with CMarkup if need be.
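
As a rough illustration of the find-and-replace alternative, here is a sketch that works on the stub text as a plain string before it is parsed. ExpandEntity is a hypothetical helper, and it assumes both the stub file and the log file have already been read into std::string values.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of the entity reference
// "&name;" in 'stub' with 'content' and returns the expanded text.
std::string ExpandEntity(std::string stub, const std::string& name,
                         const std::string& content)
{
    const std::string ref = "&" + name + ";";
    std::string::size_type pos = 0;
    while ((pos = stub.find(ref, pos)) != std::string::npos)
    {
        stub.replace(pos, ref.size(), content);
        pos += content.size(); // skip the inserted text so it is not rescanned
    }
    return stub;
}
```

The entity declaration in the DTD is left untouched because it contains the name without the leading & and trailing ;. The expanded text could then be handed to SetDoc.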

comment posted XML Fragments

Scott Wilson 10-Feb-2004

An aside on non well-formed xml. To me, the requirement of having a single tag at the document root has always seemed arbitrary at best. I use CMarkup to support XML persistence, where each class knows how to stream itself into a CString. If you require that each xml fragment be well-formed, then you end up with more elements than you need. As such, I have wrapper code that inserts/strips the root tag to keep CMarkup happy. No doubt there may be reasons for this I don't understand, but it seems like there are cases where it is nice to handle xml fragments.

The above comment was written before CMarkup 8.0. With release 8.0 you do not need to insert and strip the root tag anymore.

comment posted 2nd root element

Ong Wen Jian 21-Sep-2012

I have a XML file with the sample data below:

<PLAY>
    <GAME ID = "1"/>
    <player> Player A </player>
</PLAY>

<PLAY>
   <GAME ID = "2"/>
   <player> Player B </player>
</PLAY>

How can I write a code such that FindElem() can search and get the data from the 2nd PLAY root element after it gets the data from the 1st PLAY element?

The short answer: OutOfElem. Here is a program that accesses each element:

sMarkup =
"<PLAY>\n"
"    <GAME ID = \"1\"/>\n"
"    <player> Player A </player>\n"
"</PLAY>\n"
"<PLAY>\n"
"   <GAME ID = \"2\"/>\n"
"   <player> Player B </player>\n"
"</PLAY>\n";
m.SetDoc(sMarkup);
m.FindElem(); // first PLAY (no current child element)
m.IntoElem(); // inside PLAY, before GAME
m.FindElem(); // GAME
m.FindElem(); // player
m.OutOfElem(); // first PLAY (current child element is player)
m.FindElem(); // second PLAY (no current child element)
m.IntoElem(); // inside second PLAY, before GAME
m.FindElem(); // GAME under second PLAY
m.FindElem(); // player under second PLAY
m.OutOfElem(); // second PLAY (current child element is player under second PLAY)

You can grab the GAME ID and player under each PLAY as follows:

m.ResetPos(); // top of document, before first PLAY
while (m.FindElem("PLAY"))
{
  m.IntoElem();
  m.FindElem("GAME");
  strId = m.GetAttrib("ID");
  m.ResetMainPos(); // in case player element comes before GAME
  m.FindElem("player");
  strPlayer = m.GetData();
  m.OutOfElem();
  // ... do something with strId and strPlayer
}

By the way, this document has multiple root elements, which means it is not proper XML as far as the official XML standard is concerned. I just call it markup; there is nothing wrong with having data this way unless you need to interoperate with other programs that do not have this flexibility.
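
If the markup does need to go to a stricter program, one simple option is to wrap it in a single root element first. The sketch below uses a hypothetical helper, WrapInRoot, and assumes the markup has no XML Declaration and is held in a std::string.

```cpp
#include <string>

// Hypothetical helper: encloses markup that has several top-level elements
// in one root element so that strict XML parsers will accept it.
std::string WrapInRoot(const std::string& fragments, const std::string& rootName)
{
    std::string doc = "<" + rootName + ">\n" + fragments;
    // Keep the end tag on its own line even if the markup lacks a final newline.
    if (!fragments.empty() && fragments.back() != '\n')
        doc += '\n';
    doc += "</" + rootName + ">\n";
    return doc;
}
```

For example, WrapInRoot(sMarkup, "PLAYS") would put both PLAY elements inside a single PLAYS element; the root name is arbitrary.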