Split XML with XML editor script

On another forum a user asked how to "Split XML and output to different files" and did not get an answer, just several confusing responses about versions of XSLT. This is how to do it in the free firstobject XML editor:

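split_XML()
{
  CMarkup input;
  if ( ! input.Load("input.xml") ) // parses the whole file into memory
    return input.GetError();
  // For each npc element anywhere in the document, write it and its
  // contents to npc<id>.xml
  while ( input.FindElem("//npc") )
    WriteTextFile( "npc"+input.GetAttrib("id")+".xml", input.GetSubDoc() );
}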

This script will read input.xml and create an npc[ID].xml file for each npc subdocument. This was the question:

Starting with input.xml, the desired result is one file per npc element:

"input.xml"

<root>
<npc id="1">
  <p pid="1"/>
  <p pid="2"/>
  <p pid="3"/>
</npc>
<npc id="2">
  <p pid="3"/>
  <p pid="4"/>
  <p pid="5"/>
</npc>
<npc id="3">
  <p pid="4"/>
  <p pid="5"/>
  <p pid="6"/>
</npc>
</root>

"npc1.xml"

<npc id="1">
  <p pid="1"/>
  <p pid="2"/>
  <p pid="3"/>
</npc>

"npc2.xml"

<npc id="2">
  <p pid="3"/>
  <p pid="4"/>
  <p pid="5"/>
</npc>

"npc3.xml"

<npc id="3">
  <p pid="4"/>
  <p pid="5"/>
  <p pid="6"/>
</npc>

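Avoid XSLT for this because it makes the task much trickier: XSLT 1.0 produces a single result document, so writing multiple output files requires XSLT 2.0 (xsl:result-document) or a processor-specific extension.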
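For comparison outside the editor, here is a rough sketch of the same split in standard C++ (CMarkup is also available as a C++ class, but this code does not use it). It scans the text for each npc element and pairs it with the file name "npc"+id+".xml", collecting the results in a map rather than writing files so the logic is easy to check. It is plain string matching, not an XML parser, so the assumptions noted in the comments matter; split_npcs and get_attrib are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: value of attribute `name` in the start tag `tag`,
// assuming the form name="value" with a space before the name.
std::string get_attrib(const std::string& tag, const std::string& name) {
    const std::string key = " " + name + "=\"";
    std::size_t start = tag.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    return tag.substr(start, tag.find('"', start) - start);
}

// Naive split: maps "npc" + id + ".xml" to the text of each <npc>...</npc>
// element, like the WriteTextFile loop above but without writing files.
// Assumes npc elements are not nested or self-closing and that "<npc" does
// not appear inside comments, CDATA sections or attribute values.
std::map<std::string, std::string> split_npcs(const std::string& xml) {
    const std::string open = "<npc", close = "</npc>";
    std::map<std::string, std::string> files;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        std::size_t after = pos + open.size();
        char c = after < xml.size() ? xml[after] : '\0';
        // Skip names that merely start with "npc", such as <npcs>.
        if (c != ' ' && c != '>' && c != '\n' && c != '\t') { pos = after; continue; }
        std::size_t gt = xml.find('>', pos);
        std::size_t end = xml.find(close, pos);
        if (gt == std::string::npos || end == std::string::npos) break;
        end += close.size();
        files["npc" + get_attrib(xml.substr(pos, gt - pos), "id") + ".xml"] =
            xml.substr(pos, end - pos);
        pos = end;
    }
    return files;
}
```

Writing each map entry to disk with std::ofstream would complete the split; for XML containing comments, CDATA or nested elements of the same name, a real parser such as CMarkup is the safer choice.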

comment posted Memory exception splitting big XML

Angela Baines 15-Jan-2010

Hi, I'm trying to use the XML splitter script but I'm getting an out of memory exception when I try to [load] the XML file, and the script just exits on the [next] line within the free editor. The file is 862 MB, but it does state that the same method can be used for gigabyte-size files.

For an 862MB file, use the Open method instead of Load to read the input file. Load parses the whole file into memory, whereas Open reads through the file progressively as you call FindElem, so it uses very little memory. Note that unless you have an exceptional amount of free memory, you cannot view files that large in the firstobject XML editor.

See examples for huge files in Split XML file into smaller pieces and How to generate file names with XML splitter script.

comment posted Open not getting anything

Angela Baines 15-Jan-2010

With the editor and a foal script using Open, it doesn't seem to get anything: when stepping through in debug, it shows xmloutput (cmarkup) (0) and xmlinput (cmarkup) (0).

It is not finding the input file. In quoted path strings, use double backslashes, since a single backslash starts an escape sequence:

xmlInput.Open("C:\\XML files\\testing.xml");
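The same escaping rule applies in C++ source code, where CMarkup is normally used. The two constants below (hypothetical names) spell the same path with doubled backslashes and with a C++11 raw string literal, which takes backslashes literally. Raw strings are a C++ feature; this sketch does not assume foal scripts accept them.

```cpp
#include <string>

// Each "\\" in an ordinary string literal produces one backslash.
const std::string kEscaped = "C:\\XML files\\testing.xml";

// A raw string literal R"(...)" keeps backslashes as written.
const std::string kRaw = R"(C:\XML files\testing.xml)";

// With single backslashes in an ordinary literal, "\t" in "\testing.xml"
// would become a tab character, and "\X" is not a standard escape sequence.
```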

You can also check the result of Open and get an explanation with GetResult or GetError:

t()
{
  CMarkup m;
  if ( ! m.Open("c:\\does_not_exist.txt",MDF_READFILE) )
    return m.GetError();
  return m.GetDoc();
}

The output is:

The system cannot find the file specified.
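In standard C++ without CMarkup, the equivalent check is to test the stream after trying to open it. The sketch below (load_text is a hypothetical name) returns false along with a description of the failure. The description comes from std::strerror, so its wording differs from the GetError message above (glibc reports "No such file or directory", for example); also, the C++ standard does not require std::ifstream to set errno, so the code falls back to a generic message.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>

// Returns true and fills `text` with the file contents on success, or
// returns false and fills `error` with a description of the failure.
bool load_text(const std::string& path, std::string& text, std::string& error) {
    errno = 0;
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        // errno is usually set by the underlying open call, but not guaranteed.
        error = errno != 0 ? std::string(std::strerror(errno))
                           : "cannot open " + path;
        return false;
    }
    std::ostringstream contents;
    contents << file.rdbuf();   // copy the whole file into the string stream
    text = contents.str();
    return true;
}
```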

comment posted add static data to each XML piece

Jeff Taylor 10-Mar-2011

Is there any way I can add some static data (like a date or something) to each XML piece?

Say you have a big file with a list of companies and you want to split it into files called Company1.xml, Company2.xml, and so on, adding the date to each company before writing it out. First, assign the company subdocument to its own CMarkup object called company, then set the attribute in its top element:

CMarkup company = input.GetSubDoc();
company.FindElem(); // Company
company.SetAttrib( "timestamp", sDate );
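To show what these three lines accomplish on the text of a subdocument, here is a rough standard C++ sketch that adds an attribute to the first start tag of a string. It is an illustration under the assumptions in its comments, not how CMarkup implements SetAttrib; in particular it always adds the attribute, whereas SetAttrib would change the value of an attribute that already exists. set_top_attrib is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Inserts name="value" into the first start tag of `subdoc`.
// Assumes the subdocument begins with its element (no XML declaration,
// comment or processing instruction before it), that no attribute value in
// that tag contains '>', that the attribute is not already present, and that
// `value` contains no characters needing escaping.
std::string set_top_attrib(const std::string& subdoc, const std::string& name,
                           const std::string& value) {
    std::size_t lt = subdoc.find('<');
    if (lt == std::string::npos) return subdoc;   // no element at all
    std::size_t gt = subdoc.find('>', lt);
    if (gt == std::string::npos) return subdoc;   // unterminated tag
    // For a self-closing tag such as <Company/>, insert before the '/'.
    std::size_t at = (gt > lt && subdoc[gt - 1] == '/') ? gt - 1 : gt;
    return subdoc.substr(0, at) + " " + name + "=\"" + value + "\"" + subdoc.substr(at);
}
```

For example, set_top_attrib("<Company id=\"56A\"/>", "timestamp", "20110310T121500") returns <Company id="56A" timestamp="20110310T121500"/>.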

Here is the script written so that you can run it from the DOS command line, passing in the date to put into each output file.

split_and_set_date(str sDate)
{
  int nCompanyCount = 0;
  CMarkup input;
  if ( ! input.Load("C:\\Companies.xml") )
    return input.GetError();
  while ( input.FindElem("//Company") )
  {
    CMarkup company = input.GetSubDoc();
    company.FindElem(); // Company
    company.SetAttrib( "timestamp", sDate );
    ++nCompanyCount;
    if ( ! company.Save("C:\\Company"+nCompanyCount+".xml") )
      return company.GetError();
  }
  return nCompanyCount;
}

If this script were saved as C:\split.foal, you could run it from the DOS command line as follows (see Using the firstobject XML editor from the command line, and make sure you have foxe release 2.4.2):

"C:\Program Files\firstobject\foxe.exe" -run C:\split.foal 20110310T121500

A deeper look into splitting XML

Suppose each Company element in the input file has an id and a zone attribute:

<Companies>
  <Company id="56A" zone="A">
    ...
  </Company>
  <Company id="62B" zone="B">
    ...
  </Company>
</Companies>

You could do all sorts of things with the company subdocument before writing it to a file, even extract a value to use in the output file name. Also, just as sDate is passed into the function at the top of the script, you could pass in the input path, the output path prefix, and even selection criteria (such as a zone) to control which companies are output. The following script does all of this, and names each output file with the company id attribute instead of a counter.

split_and_set_date(str sDate, str sZone, str sInPath, str sOutPath)
{
  int nCompanyCount = 0, nSelectionCount = 0;
  CMarkup input;
  if ( ! input.Load(sInPath) )
    return input.GetError();
  while ( input.FindElem("//Company") )
  {
    ++nCompanyCount;
    CMarkup company = input.GetSubDoc();
    company.FindElem();
    if ( company.GetAttrib("zone") == sZone ) // e.g. zone "A"
    {
      ++nSelectionCount;
      company.SetAttrib( "timestamp", sDate );
      str sID = company.GetAttrib( "id" ); // e.g. "56A"
      if ( ! company.Save(sOutPath+sID+".xml") ) // e.g. C:\Company56A.xml
        return company.GetError();
    }
  }
  return "selected " + nSelectionCount + "/" + nCompanyCount;
}

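This could be called as follows, with the arguments in the same order as the function parameters (date, zone, input path, output path prefix):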

foxe.exe -run C:\split.foal 20110310T121500 A C:\Companies.xml C:\Company
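The selection logic of that script can also be sketched in standard C++ for testing outside the editor. This rough version (select_companies and attrib are hypothetical names) works on the input text with string searching instead of CMarkup: it counts every Company element, keeps those whose zone attribute matches, adds the timestamp attribute, and keys each result by the output prefix plus the id attribute. Results go into a map instead of files, and the assumptions are listed in the comments.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: value of attribute `name` in the start tag `tag`,
// assuming the form name="value" with a space before the name.
std::string attrib(const std::string& tag, const std::string& name) {
    const std::string key = " " + name + "=\"";
    std::size_t p = tag.find(key);
    if (p == std::string::npos) return "";
    p += key.size();
    return tag.substr(p, tag.find('"', p) - p);
}

// Counts every <Company ...>...</Company> element, keeps those whose zone
// equals `zone`, adds a timestamp attribute and stores the element under
// out_prefix + id + ".xml" in `out`. Assumes Company elements are not nested
// or self-closing and are not mentioned inside comments or CDATA sections.
std::string select_companies(const std::string& xml, const std::string& date,
                             const std::string& zone, const std::string& out_prefix,
                             std::map<std::string, std::string>& out) {
    const std::string open = "<Company", close = "</Company>";
    int total = 0, selected = 0;
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        std::size_t after = pos + open.size();
        char c = after < xml.size() ? xml[after] : '\0';
        // Skip names that merely start with "Company".
        if (c != ' ' && c != '>' && c != '\n' && c != '\t') { pos = after; continue; }
        std::size_t gt = xml.find('>', pos);
        std::size_t end = xml.find(close, pos);
        if (gt == std::string::npos || end == std::string::npos) break;
        end += close.size();
        ++total;
        const std::string tag = xml.substr(pos, gt - pos);
        if (attrib(tag, "zone") == zone) {
            ++selected;
            std::string doc = xml.substr(pos, end - pos);
            doc.insert(gt - pos, " timestamp=\"" + date + "\"");  // before the '>'
            out[out_prefix + attrib(tag, "id") + ".xml"] = doc;
        }
        pos = end;
    }
    return "selected " + std::to_string(selected) + "/" + std::to_string(total);
}
```

With the sample Companies.xml above, date 20110310T121500, zone A and prefix C:\Company, this returns "selected 1/2" and produces a single entry keyed C:\Company56A.xml.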

See also:

Split and Merge Translation XML

Using the firstobject XML editor from the command line

Split XML file into smaller pieces

Video of XML splitter script for splitting XML files

C++ XML reader parses a very large XML file

Parse huge XML file in C++

When CMarkup Load Returns false