Checking the Top of the File

 

Comment posted: Checking top of XML file

Juan Carlos Cobas 24-Feb-2003

Suppose I have a very large file (e.g. > 10 MB) and I just want to read the root element tag to make sure that the file was actually created by my application. So far, I'm reading the whole file into memory, but I guess this is not very efficient as I'm only interested in the root element ... Could you please advise me on how to read the root element without reading the whole file?

Update March 24, 2009: With the developer version of CMarkup release 11.0, you can use file read mode (see C++ XML reader), which reads only as much of the file as you access:

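bool bProcessThisFile = false;
CMarkup xml;
// Open returns false if the file cannot be opened
if ( xml.Open( "10MB.xml", CMarkup::MDF_READFILE ) )
{
  // FindElem moves to the root element, reading only as far as needed
  if ( xml.FindElem() && xml.GetTagName() == "TheOneIWant" )
    bProcessThisFile = true;
  xml.Close();
}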

You can also use GetAttrib to interrogate attributes on the root element. You can even process part or all of the file this way, reading the document only up to the point you need. File read mode lets you use the same methods for extracting information that you use without it, but navigation is forward-only.

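Without CMarkup's file read mode, you could read the first few lines or the first 100 bytes of the file into a string and examine it with string functions. This only works if the root start tag appears within those bytes; a long prolog or leading comment can push it further into the file. In MFC, something like: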

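// Read up to the first 100 bytes and null-terminate them
char szCheckBuffer[101];
CFile file( lpszPathName, CFile::modeRead ); // throws CFileException on failure
UINT nFileLen = (UINT)file.GetLength(); // assumes the file is smaller than 4 GB
UINT nCheckLen = min( (UINT)100, nFileLen );
nCheckLen = file.Read( szCheckBuffer, nCheckLen );
szCheckBuffer[nCheckLen] = '\0';
if ( ! strstr( szCheckBuffer, "<MyRootElement" ) )
{
  file.Close();
  return FALSE;
}
// If you want to continue reading file here...
char* pBuffer = new char[nFileLen + 1];
memcpy( pBuffer, szCheckBuffer, nCheckLen );
// Terminate at the number of bytes actually read, in case the read falls short
UINT nReadLen = nCheckLen + file.Read( &pBuffer[nCheckLen], nFileLen - nCheckLen );
file.Close();
pBuffer[nReadLen] = '\0';
CMarkup xml( pBuffer );
delete[] pBuffer; // CMarkup keeps its own copy of the document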
...
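
If you are not using MFC, the same check can be sketched with the standard C++ library. This is only an illustration of the "read the first 100 bytes" approach above, not part of CMarkup: the function name StartsWithRootElement and its parameters are made up for this example. It is a plain text search rather than a parse, so it can be fooled by a comment that happens to contain the tag name, and it assumes a single-byte or UTF-8 encoded file.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not a CMarkup method): returns true if the first
// checkLen bytes of the file contain "<rootName" followed by whitespace,
// '>' or '/'. Returns false if the file cannot be opened.
bool StartsWithRootElement( const std::string& path,
                            const std::string& rootName,
                            std::size_t checkLen = 100 )
{
  std::ifstream file( path, std::ios::binary );
  if ( ! file )
    return false;

  // Read at most checkLen bytes; gcount() is the number actually read
  std::string head( checkLen, '\0' );
  file.read( &head[0], static_cast<std::streamsize>( checkLen ) );
  head.resize( static_cast<std::size_t>( file.gcount() ) );

  // Require a delimiter after the name so "<MyRoot" does not match
  // "<MyRootElement"
  const std::string tag = "<" + rootName;
  std::size_t pos = head.find( tag );
  while ( pos != std::string::npos )
  {
    std::size_t next = pos + tag.size();
    if ( next < head.size() )
    {
      char c = head[next];
      if ( c == '>' || c == '/' || c == ' ' || c == '\t' ||
           c == '\r' || c == '\n' )
        return true;
    }
    pos = head.find( tag, pos + 1 );
  }
  return false;
}
```

You would call it as StartsWithRootElement( "10MB.xml", "MyRootElement" ) before deciding whether to load the whole document. Pass a larger third argument if your files can begin with a long prolog.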