How to generate file names with an XML splitter script

This is an example of how to use the free firstobject XML editor to split XML and then name each output file based on information inside the piece that the XML splitter script separated out. Maybe this will be useful to other NGOs who need to split their XML.

Comment posted: XML Splitter

Dita Ciulacu 01-Jul-2009

I am desperately searching for an XML splitter that generates the file name using the value of a child element. Is there any way to have the file named this way:

xmlOutput.Open( "test" + "_" + [Child value from REFERRAL_ID] + "_" + nFileCount + ".xml", MDF_WRITEFILE );

My XML [not real data] is:

<REFERRAL_DISCHARGE>
  <FILE_VERSION>1.0</FILE_VERSION>
  <REFERRAL_ID>1234</REFERRAL_ID>
  <ORGANISATION_ID>ORG-5678</ORGANISATION_ID>
  <ORGANISATION_TYPE>005</ORGANISATION_TYPE>
  <EXTRACT_FROM_DATE_TIME>2009-06-01T00:00:00</EXTRACT_FROM_DATE_TIME>
  <EXTRACTED_DATE_TIME>2009-06-30T15:40:15</EXTRACTED_DATE_TIME>
  <TEAM_CODE>5555</TEAM_CODE>
  <EVENT_HCU_ID>XXX1234</EVENT_HCU_ID>
  <SEX>M</SEX>
  <DATE_OF_BIRTH>1900-05-05</DATE_OF_BIRTH>
  <REFERRAL_FROM>UN</REFERRAL_FROM>
  <START_DATE_TIME>2008-12-24T00:00:00</START_DATE_TIME>
</REFERRAL_DISCHARGE>

The parent is REFERRAL_DISCHARGE. I need the file name exactly the way you have it, plus the individual value from REFERRAL_ID, to make it easy to link each file to the data it contains.

We are a not-for-profit organization that has to report to the [New Zealand] Ministry of Health, and our data has to be packaged as individual XML files. We are not dealing with huge files (this one was only 316 KB), and the extracts are relatively simple, but I don’t know about the future... it may get more complicated.

To split an XML file smaller than 10 MB into many referral discharge files, the easiest way is to load the whole file into memory:

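split()
{
  CMarkup xmlInput, xmlSubDoc;
  xmlInput.Load( "input.xml" ); // load the whole file into memory
  int nFileCount = 0;
  // find each REFERRAL_DISCHARGE element anywhere in the document
  while ( xmlInput.FindElem("//REFERRAL_DISCHARGE") )
  {
    ++nFileCount;
    // copy this element into its own document
    xmlSubDoc.SetDoc( xmlInput.GetSubDoc() );
    // use the REFERRAL_ID child value in the output file name
    str sID = xmlSubDoc.FindGetData( "//REFERRAL_ID" );
    str sFilename = "test_" + sID + "_" + nFileCount + ".xml";
    WriteTextFile( sFilename, xmlSubDoc.GetDoc() );
  }
  return nFileCount;
}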
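If you want the same idea in plain C++ without CMarkup, here is a minimal sketch that uses only the standard library. It is a simplification and not the CMarkup API. It matches the literal tags <REFERRAL_DISCHARGE> and <REFERRAL_ID> as strings, so it assumes:

- start tags have no attributes;
- same-name elements are not nested;
- the data contains no comments, CDATA or entity references.

The function names SplitInMemory, GetChildData and SafeFilePart are hypothetical. SafeFilePart is also an added assumption: it replaces characters such as / in the ID so the value cannot break the file name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not in the original script): replaces any character
// other than letters, digits, '-' and '_' with '_' so an ID such as "A/7"
// cannot break the output file name.
std::string SafeFilePart(const std::string& s) {
    std::string out;
    for (unsigned char c : s)
        out += (std::isalnum(c) || c == '-' || c == '_') ? static_cast<char>(c) : '_';
    return out;
}

// Returns the text between <tag> and </tag> in rec, or "" if not found.
// Naive string matching: no attributes, entities, comments or CDATA.
std::string GetChildData(const std::string& rec, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::size_t b = rec.find(open);
    if (b == std::string::npos) return "";
    b += open.size();
    std::size_t e = rec.find(close, b);
    return e == std::string::npos ? "" : rec.substr(b, e - b);
}

// Splits a document already held in memory into (file name, record) pairs,
// numbering the files in document order like nFileCount in the script.
std::vector<std::pair<std::string, std::string>> SplitInMemory(const std::string& doc) {
    const std::string open = "<REFERRAL_DISCHARGE>";
    const std::string close = "</REFERRAL_DISCHARGE>";
    std::vector<std::pair<std::string, std::string>> files;
    int nFileCount = 0;
    std::size_t pos = 0;
    while (true) {
        std::size_t b = doc.find(open, pos);
        if (b == std::string::npos) break;
        std::size_t e = doc.find(close, b);
        if (e == std::string::npos) break;  // unterminated record: stop
        e += close.size();
        std::string rec = doc.substr(b, e - b);
        ++nFileCount;
        std::string name = "test_" + SafeFilePart(GetChildData(rec, "REFERRAL_ID")) +
                           "_" + std::to_string(nFileCount) + ".xml";
        files.emplace_back(name, rec);
        pos = e;
    }
    return files;
}
```

Each returned pair can then be written out with std::ofstream, which plays the role of WriteTextFile in the script. For anything beyond simple extracts like this one, a real XML parser such as CMarkup is the safer choice.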

Splitting a huge file

For those with really large files (especially over 100 MB, up to any number of gigabytes), use XML reader mode, which processes the source file on disk very efficiently. The only difference from the script above is that the input file is opened in read mode (and closed at the end) rather than loaded entirely into memory.

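split()
{
  CMarkup xmlInput, xmlSubDoc;
  // open in read mode: the file is processed on disk, not loaded whole
  xmlInput.Open( "input.xml", MDF_READFILE );
  int nFileCount = 0;
  // find each REFERRAL_DISCHARGE element anywhere in the file
  while ( xmlInput.FindElem("//REFERRAL_DISCHARGE") )
  {
    ++nFileCount;
    // copy this element into its own document
    xmlSubDoc.SetDoc( xmlInput.GetSubDoc() );
    // use the REFERRAL_ID child value in the output file name
    str sID = xmlSubDoc.FindGetData( "//REFERRAL_ID" );
    str sFilename = "test_" + sID + "_" + nFileCount + ".xml";
    WriteTextFile( sFilename, xmlSubDoc.GetDoc() );
  }
  xmlInput.Close(); // release the input file
  return nFileCount;
}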

A note about using the anywhere path: if you want to grab multiple pieces of data with calls like xmlSubDoc.FindGetData("//REFERRAL_ID"), remember that the // anywhere path searches from the current position onward. So if you are not sure of the order in which the elements appear, call xmlSubDoc.ResetPos() between calls to FindGetData.
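To see why the order matters, here is a small hypothetical C++ class. It imitates the forward-only search described above using plain string searching. It is not CMarkup and models only the "search from the current position" behaviour, nothing else about the real methods.

In the sample record, REFERRAL_ID comes before ORGANISATION_ID. Looking up ORGANISATION_ID first therefore moves the position past REFERRAL_ID, and a following lookup of REFERRAL_ID finds nothing until ResetPos() is called.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical mini-cursor mimicking a forward-only "find from the current
// position" search; plain string matching, not the CMarkup API.
class ForwardCursor {
public:
    explicit ForwardCursor(std::string doc) : doc_(std::move(doc)) {}

    // Looks for <tag>...</tag> at or after the current position; on success
    // returns the text inside and moves the position past the element.
    std::string FindGetData(const std::string& tag) {
        const std::string open = "<" + tag + ">";
        const std::string close = "</" + tag + ">";
        std::size_t b = doc_.find(open, pos_);
        if (b == std::string::npos) return "";
        b += open.size();
        std::size_t e = doc_.find(close, b);
        if (e == std::string::npos) return "";
        pos_ = e + close.size();
        return doc_.substr(b, e - b);
    }

    // Moves the position back to the start so any element can be found again.
    void ResetPos() { pos_ = 0; }

private:
    std::string doc_;
    std::size_t pos_ = 0;
};
```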

See also:

Split XML with XML editor script

Split XML file into smaller pieces

Video of XML splitter script for splitting XML files

C++ XML reader parses a very large XML file

CMarkup Open Method - file read mode

Parse huge XML file in C++