Export XML records with matching childset

Conventional wisdom has you importing XML into a database, and exporting it again, just to run queries and use the data it contains. But with firstobject's free XML editor you can perform all sorts of operations rapidly and efficiently directly on the XML document. This example shows how to export subsets of records, and how to query, tally and modify XML records in a real estate database XML file.

 

comment posted export records with matching childset

Eddie Wrenn 25-Jan-2010

What I have is a list of properties for sale nationwide, contained in a 1.5gb XML file (your program is the only one which seems to handle this with ease!) I'm looking for a way to make the editor export all the records which have a matching childset, in this case 'locality' (in this example, London). There's 100,000 listings so not a manual job!

I've been successful splitting the file into 100,000 separate files, named by the locality (using your tutorials). But patching them all together takes a long time, even if I automate it. A sample record below:

<listing key="1234567" status="active" updated="20090101T010101" type="residence">
  <title><![CDATA[Xyz Street, London]]></title>
  <supplementary-url><![CDATA[1234567.htm]]></supplementary-url>
  <description><![CDATA[AVAILABLE 01/01/2010. This
beautifully decorated place is situated on a quiet back
street of Xyz Garden in the heart of Xyz London.
The owners have refurbished to a particluarly high
standard paying exceptional attention to detail to the
overall finish and decoration. As the apartment is
situated on the Nth floor there are great views of
London giving the apartment excellent natural light.
Features available. We highly recommend a viewing.]]></description>
  <residence type="flat">
    <bedrooms><![CDATA[1]]></bedrooms>
    <bathrooms><![CDATA[1]]></bathrooms>
    <reception><![CDATA[yes]]></reception>
  </residence>
  <authority>
    <lease currency="GBP" term="private" visible="yes">
      <price term="weekly"><![CDATA[450]]></price>
    </lease>
  </authority>
  <address visible="yes">
    <country><![CDATA[GB]]></country>
    <subdivision><![CDATA[London]]></subdivision>
    <locality><![CDATA[London]]></locality>
    <postcode><![CDATA[AA1A 1AA]]></postcode>
    <road><![CDATA[Xyz Street]]></road>
  </address>
  <attachments>
    <photo title="" updated="20090101T010101" type="image/jpeg">
      <uri><![CDATA[1234567_354_255.jpg]]></uri>
    </photo>
    <photo title="" updated="20091118T201049" type="image/jpeg">
      <uri><![CDATA[23456789_354_255.jpg]]></uri>
    </photo>
    <photo title="" updated="20091118T201049" type="image/jpeg">
      <uri><![CDATA[34567890_354_255.jpg]]></uri>
    </photo>
  </attachments>
  <vendor>
    <name><![CDATA[Xyz Property Services]]></name>
    <phone><![CDATA[020 1234 5678]]></phone>
    <email><![CDATA[enquiries@xyz.example]]></email>
  </vendor>
</listing>

To find all the matching records in a huge file, do something like this: from the File menu select New Program, paste in the following script, and change the input file pathname (the script uses C++ syntax, so write each backslash in the pathname as a double backslash).

pull_by_locality()
{
  str strSearch = "London";
  CMarkup xmlInput, xmlListing, xmlOutput;
  xmlInput.Open( "C:\\huge.xml", MDF_READFILE );
  while ( xmlInput.FindElem("//listing") )
  {
    xmlListing.SetDoc( xmlInput.GetSubDoc() );
    if ( xmlListing.FindGetData("//locality") == strSearch )
      xmlOutput.AddSubDoc( xmlListing.GetDoc() );
  }
  return xmlOutput.GetDoc();
}

To export the result document as London.xml:

xmlOutput.Save( strSearch + ".xml" );

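To delete (or actually skip) records which are no longer required, e.g. we want them if status is "active" but not if it is "inactive" or "sold", test the status before adding the record to the output. In the sample record above, status is an attribute of the listing element rather than a child element, so it is read with GetAttrib after moving to the listing element:

xmlListing.ResetPos();
xmlListing.FindElem(); // listing
if ( xmlListing.GetAttrib("status") != "active" )
  ...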

To change an element tag name from title to topicname in the output, first add the new element with the same content, then remove the old one (this is the easiest way to make sure the new element goes into the same position as the removed one).

xmlListing.ResetPos();
if (xmlListing.FindElem("//title"))
{
  xmlListing.AddElem("topicname", xmlListing.GetData());
  xmlListing.FindPrevElem(); // title
  xmlListing.RemoveElem();
}
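
The pieces above can be combined into one script. The following is only a sketch that assembles the snippets already shown: the function name pull_active_by_locality is made up for illustration, and it assumes that calling FindElem() with no argument moves to the listing element at the top of the subdocument so that GetAttrib can read its status attribute. It has not been run against the real 1.5gb file.

```
pull_active_by_locality()
{
  str strSearch = "London";
  CMarkup xmlInput, xmlListing, xmlOutput;
  xmlInput.Open( "C:\\huge.xml", MDF_READFILE );
  while ( xmlInput.FindElem("//listing") )
  {
    xmlListing.SetDoc( xmlInput.GetSubDoc() );
    if ( xmlListing.FindGetData("//locality") == strSearch )
    {
      // status is an attribute of listing in the sample record
      xmlListing.ResetPos();
      xmlListing.FindElem(); // listing (assumed: no-argument FindElem)
      if ( xmlListing.GetAttrib("status") == "active" )
      {
        // rename title to topicname, keeping its position
        xmlListing.ResetPos();
        if ( xmlListing.FindElem("//title") )
        {
          xmlListing.AddElem( "topicname", xmlListing.GetData() );
          xmlListing.FindPrevElem(); // title
          xmlListing.RemoveElem();
        }
        xmlOutput.AddSubDoc( xmlListing.GetDoc() );
      }
    }
  }
  // export the matches, e.g. London.xml
  xmlOutput.Save( strSearch + ".xml" );
  return xmlOutput.GetDoc();
}
```

Each record is modified in its own small xmlListing document before it is appended, so the huge input file is only ever read forward, one listing at a time.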

As for entering the search string, FOAL scripts don't support dialogs yet. However, you can automate the process if you can put the search string in a file such as search.txt, which can be retrieved in the FOAL script with:

str s;
if ( ReadTextFile("C:\\search.txt", s) && StrLength(s) > 2 )
  s = StrMid( s, 0, StrLength(s)-2 ); // remove CRLF
str strSearch = s;
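
The snippet above assumes search.txt ends with exactly one CRLF. If the file might end with extra spaces or a bare line feed (for example, when it is written by a command-line redirect), a more tolerant variation is to step back over any trailing CR, LF or space characters. This is a hedged sketch: it uses only ReadTextFile, StrLength and StrMid from the snippet above, and it assumes FOAL accepts the C++ "\r" and "\n" escapes, the || operator and the --n decrement.

```
str s;
ReadTextFile( "C:\\search.txt", s );
int n = StrLength( s );
// step back over any trailing CR, LF or space (assumed C++-style escapes)
while ( n > 0 && ( StrMid(s,n-1,1) == "\r" || StrMid(s,n-1,1) == "\n" || StrMid(s,n-1,1) == " " ) )
  --n;
str strSearch = StrMid( s, 0, n );
```

A stray trailing space matters here because the script compares the locality to strSearch with ==, so "London " would not match "London".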

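At the command prompt, if you had a script named search.foal, then you could create a search.bat file as follows to let you type search London on the command line. The redirection is written in front of echo so that no space is written between the search string and the line break (echo %1 > C:\search.txt would write "London " with a trailing space, which would not match the locality).

>C:\search.txt echo %1
"C:\Program Files\firstobject\foxe.exe" -run C:\search.foal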

Here's an interesting diagnostic to count instances of each locality:

locality_tally()
{
  CMarkup xmlLocalities, xmlInput;
  xmlInput.Open( "huge.xml", MDF_READFILE );
  while ( xmlInput.FindElem("//locality") )
  {
    str sLoc = xmlInput.GetData();
    int n = 1;
    if ( xmlLocalities.RestorePos(sLoc) )
      n = StrToInt(xmlLocalities.GetAttrib("n")) + 1;
    else
    {
      xmlLocalities.ResetPos();
      xmlLocalities.AddElem("locality",sLoc);
      xmlLocalities.SavePos(sLoc);
    }
    xmlLocalities.SetAttrib("n", n);
  }
  return xmlLocalities;
}

This would yield a result like this:

<locality n="890">London</locality>
<locality n="431">Yorkshire</locality>

 

comment posted how to clear the XML result

Eddie Wrenn 27-Jan-2010

Now I'm piggybacking "searches" on top of each other, so it will search for London, output them into a London file, then search for Yorkshire, and output that into a Yorkshire file. My problem is that the editor [script] will retain the results for London, and add them to the top of my Yorkshire file - is there a little code that will clear the internal memory before starting the next process?

Yes, set the output document to an empty string before starting the next search:

xmlOutput.SetDoc("");
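
To show where that call fits when two searches run one after the other, here is a sketch built from the earlier pull_by_locality script. The function name is hypothetical, and the script assumes the input must be closed and re-opened so that the second pass starts from the top of the file.

```
pull_london_then_yorkshire()
{
  CMarkup xmlInput, xmlListing, xmlOutput;

  // first search
  str strSearch = "London";
  xmlInput.Open( "C:\\huge.xml", MDF_READFILE );
  while ( xmlInput.FindElem("//listing") )
  {
    xmlListing.SetDoc( xmlInput.GetSubDoc() );
    if ( xmlListing.FindGetData("//locality") == strSearch )
      xmlOutput.AddSubDoc( xmlListing.GetDoc() );
  }
  xmlInput.Close();
  xmlOutput.Save( strSearch + ".xml" );

  // clear the London results before collecting Yorkshire
  xmlOutput.SetDoc( "" );

  // second search (assumed: re-open the input to read it again)
  strSearch = "Yorkshire";
  xmlInput.Open( "C:\\huge.xml", MDF_READFILE );
  while ( xmlInput.FindElem("//listing") )
  {
    xmlListing.SetDoc( xmlInput.GetSubDoc() );
    if ( xmlListing.FindGetData("//locality") == strSearch )
      xmlOutput.AddSubDoc( xmlListing.GetDoc() );
  }
  xmlInput.Close();
  xmlOutput.Save( strSearch + ".xml" );
  return "done";
}
```

Without the SetDoc("") call, the second Save would write the London records followed by the Yorkshire ones, which is the symptom described in the question.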

 

comment posted records that contain the State of Michigan

Grace 03-Feb-2014

I am new to XML and need some instructions. I have the same problem that one of your previous customers inquired about on your website in "Export XML records with matching childset." I am trying to get all the records that contain the State of Michigan into a separate file but it does not appear to be working for me. Below is an example of a record:

<Property>
<Description><![CDATA[Great Location at corner of Xyz.
Large older home, very charming! Note: Tenants pay 1/nth
of gas and electric. Water included.]]></Description>
<MinRent>1300</MinRent>
<MaxRent>1300</MaxRent>
<MarketingName/>
<Address>301 N test St</Address>
<City>Little Town</City>
<State>MI</State>
<Zip>49876</Zip>
<YearBuilt>0</YearBuilt>
<NumberUnits>7</NumberUnits>
<Latitude>12.3456789</Latitude>
<Longitude>-12.3456789</Longitude>
<AcceptsHcv>False</AcceptsHcv>
<PhoneNumber>(123) 456-7890</PhoneNumber>
<LastUpdated>1/30/2014 8:00:00 AM</LastUpdated>
<Amenity AmenityID="101" AmenityName="Parking"/>
<Amenity AmenityID="102" AmenityName="Unfurnished"/>
<Amenity AmenityID="103" AmenityName="Dishwasher"/>
<Amenity AmenityID="109" AmenityName="Garbage Disposal"/>
</Property>

The file is pretty big (225 MB).

Although for 225MB it is probably not necessary, the following script is written to handle extremely large input and output files by opening them in "file mode" (using MDF_READFILE for the input and MDF_WRITEFILE for the output). The "anywhere path" //Property is used to search the input document for Property records, and //State searches anywhere in the xmlRecord subdocument for the State. In the output window it shows you the count of records it searched and the number matched. If it shows 0 searched it is because your input does not contain any Property elements.

pull_by_State()
{
  str strSearch = "MI";
  int s = 0;
  int m = 0;
  CMarkup xmlInput, xmlRecord, xmlOutput;
  if (!xmlOutput.Open("C:\\test_" + strSearch + ".xml", MDF_WRITEFILE))
    return xmlOutput.GetResult();
  xmlOutput.AddElem("Search");
  xmlOutput.SetAttrib("criteria", strSearch);
  xmlOutput.IntoElem();
  if (!xmlInput.Open("C:\\test.xml", MDF_READFILE))
    return xmlInput.GetResult();
  while ( xmlInput.FindElem("//Property") )
  {
    ++s;
    xmlRecord.SetDoc( xmlInput.GetSubDoc() );
    if ( xmlRecord.FindGetData("//State") == strSearch )
    {
      xmlOutput.AddSubDoc(xmlRecord.GetDoc());
      ++m;
    }
  }
  xmlInput.Close();
  xmlOutput.Close();
  return "Searched " + s + " records, matched " + m;
}

See also:

Using the firstobject XML editor from the command line

Counting XML tag names and values with foal

firstobject Access Language

Format XML, indent align beautify clean up XML

Simple XML editor meets memory stick

Split XML with XML editor script

Tree customization in the firstobject XML editor

Video demo of editing RSS XML in the tree view of the free firstobject XML editor

Video of XML Editor format XML, customize treeview, and program

Split XML file into smaller pieces

Video of XML splitter script for splitting XML files

C++ XML reader parses a very large XML file

Parse huge XML file in C++