<rss version="2.0">
<channel>
<title>News from firstobject.com</title>
<link>http://www.firstobject.com/dn_news.xml</link>
<description>News from firstobject.com updated when articles are posted</description>
<language>en-us</language>
<lastBuildDate>Sat, 23 Apr 2011 17:05:00 GMT</lastBuildDate>
<ttl>180</ttl>
<image>
<title>News from firstobject.com</title>
<width>142</width>
<height>18</height>
<link>http://www.firstobject.com/</link>
<url>http://www.firstobject.com/firstobjectNews.gif</url>
</image>
<item>
<title>CMarkup 11.5 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.5-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.5-release-notes.htm</guid>
<pubDate>Sat, 23 Apr 2011 17:05:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Release 11.5 Date: April 23, 2011, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<p>Fixes for whitespace trimming/collapsing, and file read mode, as well as some changes in compiler <code><font color=blue>#ifdef</font></code> handling for <code>WIN32</code>.</p>

<h4>Summary:</h4>

<ul>
<li>Qt Windows compiling</li>
<li>end-of-line options</li>
<li>fix: whitespace trimming removed an escaped char at the end of the value</li>
<li>fix: 64-bit compiler warnings for <code>MCD_BLDLEN</code></li>
<li>fix: (<a href="http://www.firstobject.com/dn_markdev.htm">Dev</a> only) file read mode bug</li>
</ul>

<h4>Details:</h4>

<p>Using CMarkup with Qt on Windows is easier now; you shouldn't have to tweak CMarkup to add it to your Qt project. The changes involved special cases for GNUC when it is used on Windows. Either the <code>WIN32</code> (Windows.h) or <code>_WIN32</code> (Visual Studio) preprocessor define will let CMarkup know it is compiling for Windows.</p>
<p>In 11.3 and 11.4, <code>MDF_TRIMWHITESPACE</code> and <code>MDF_COLLAPSEWHITESPACE</code> would remove an escaped char at the end of the trimmed data (see <a href="http://www.firstobject.com/dn_markknown.htm#20110312224000">11.3 Bug: trim whitespace removes escaped value</a>).</p>
<p>On Linux and OS X, lines generated by CMarkup will now end with a newline by default instead of a Windows-style CRLF (carriage return line feed), and the end-of-line setting can now be selected with preprocessor definitions. If you are on a non-Windows platform and want your CRLFs back, you must now add <code>MARKUP_EOL_CRLF</code> to your preprocessor definitions.</p>

<p><table class="shadedtable" width=100%>
<tr><th colspan=3>End Of Line Defines</th></tr>
<tr><th>Name</th><th width=100>Value</th><th>Description</th></tr>
<tr>
  <td valign=top><code>MARKUP_EOL_CRLF</code></td>
  <td valign=top><code>MCD_T("\r\n")</code></td>
  <td valign=top>Aka <code>0d 0a</code>, this is the default for Windows builds</td>
</tr>
<tr>
  <td valign=top><code>MARKUP_EOL_NEWLINE</code></td>
  <td valign=top><code>MCD_T("\n")</code></td>
  <td valign=top>Aka <code>0a</code>, this is now the default for non-Windows builds</td>
</tr>
<tr>
  <td valign=top><code>MARKUP_EOL_RETURN</code></td>
  <td valign=top><code>MCD_T("\r")</code></td>
  <td valign=top>Aka <code>0d</code>, this is rarely used</td>
</tr>
<tr>
  <td valign=top><code>MARKUP_EOL_NONE</code></td>
  <td valign=top><code>MCD_T("")</code></td>
  <td valign=top>For minimal size, documents will be on one line, but <a class="codelink" href="http://www.firstobject.com/dn_markGetDocFormatted.htm">GetDocFormatted</a> will not produce desired results</td>
</tr>
</table></p>
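
<p>As a rough illustration of how such definitions might select the end-of-line string at compile time, here is a minimal self-contained C++ sketch. The <code>EXAMPLE_EOL</code> macro and the <code>exampleEol</code> function are hypothetical names made up for this illustration and are not taken from the CMarkup source; only the four <code>MARKUP_EOL_*</code> names and the platform defaults come from the table above. The sketch also uses plain narrow string literals rather than <code>MCD_T</code>.</p>

```cpp
#include <string>

// Hypothetical sketch: pick an end-of-line string from the MARKUP_EOL_*
// preprocessor definitions listed in the table above.
// EXAMPLE_EOL is an illustrative name, not a CMarkup identifier.
#if defined(MARKUP_EOL_CRLF)
#define EXAMPLE_EOL "\r\n"
#elif defined(MARKUP_EOL_NEWLINE)
#define EXAMPLE_EOL "\n"
#elif defined(MARKUP_EOL_RETURN)
#define EXAMPLE_EOL "\r"
#elif defined(MARKUP_EOL_NONE)
#define EXAMPLE_EOL ""
#elif defined(WIN32) || defined(_WIN32)
#define EXAMPLE_EOL "\r\n"   // no explicit choice: default for Windows builds
#else
#define EXAMPLE_EOL "\n"     // no explicit choice: default for non-Windows builds
#endif

// Returns the end-of-line string that was chosen at compile time.
inline std::string exampleEol()
{
    return EXAMPLE_EOL;
}
```

<p>With a compiler such as g++, passing <code>-DMARKUP_EOL_CRLF</code> on the command line is one way to add the definition and would select the CRLF branch in this sketch.</p>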

<table width=100% cellspacing=0 cellpadding=5><tr><td valign=top bgcolor=fafae2 width=30>
<p><a href="http://www.firstobject.com/dn_markdev.htm"><img border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></a></p></td><td bgcolor=fafae2>
<p>The file read mode bug fix only affects <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a> and the <a href="http://www.firstobject.com/dn_editor.htm">free XML editor</a> <a href="http://www.firstobject.com/dn_foal.htm">FOAL C++ scripting</a>.</p>
</td></tr></table>

<p>The file read mode bug occurred with elements over 32k long that did not have child elements. This problem was only in file read mode where the <a class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</a> method is used with <code>MDF_READFILE</code>. See <a href="http://www.firstobject.com/split-xml-file-into-smaller-pieces.htm#20110407231100">File read GetSubDoc incomplete</a>.</p>

<p>See also previous CMarkup release notes: <A href="http://www.firstobject.com/cmarkup-11.4-release-notes.htm">11.4</A>, <A href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">11.3</A>, <A href="http://www.firstobject.com/cmarkup-11.2-release-notes.htm">11.2</A>, <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">11.1</A>, <A href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">11.0</A>, <A href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">10.1</A>, <A href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">10.0</A>, <A href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</A></p>


]]></description>
</item>
<item>
<title>Archived CMarkup 11.4 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.4-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.4-release-notes.htm</guid>
<pubDate>Sat, 05 Feb 2011 23:31:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Release 11.4 Date: February 5, 2011, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<p>An important fix to the 11.3 whitespace features, improvements in the <code>GetDocFormatted</code> method, and an enhancement to <code>HasAttrib</code>.</p>

<p>Here's the list of 11.4 enhancements:</p>

<ul>
<li>fix: <code>MDF_TRIMWHITESPACE</code> and <code>MDF_COLLAPSEWHITESPACE</code> were <a href="http://www.firstobject.com/dn_markknown.htm#20101216035000">crashing on values that were only whitespace</a>... sorry, a glaring hole in the test cases</li>
<li><a class="codelink" href="http://www.firstobject.com/dn_markGetDocFormatted.htm">GetDocFormatted</a> now removes any space between the attribute name and value (and has improvements in speed and memory efficiency)</li>
<li><a class="codelink" href="http://www.firstobject.com/dn_markHasAttrib.htm">HasAttrib</a> can now return the attribute value as well, making it an alternative to <a class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</a> for convenience and performance</li>
<li>fix: rare HTML parser case for attributes without values in empty start tags, e.g. <code>&lt;A a/&gt;</code></li>
</ul>

<p>See also previous CMarkup release notes: <A href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">11.3</A>,  <A href="http://www.firstobject.com/cmarkup-11.2-release-notes.htm">11.2</A>, <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">11.1</A>, <A href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">11.0</A>, <A href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">10.1</A>, <A href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">10.0</A>, <A href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</A></p>


]]></description>
</item>
<item>
<title>Archived CMarkup 11.3 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.3-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.3-release-notes.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:15:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p><img align=right src="http://www.firstobject.com/speed.jpg"/></p>

<p>Release 11.3 Date: November 20, 2010, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<p>A performance improvement makes CMarkup significantly faster! Overall parsing speed is up 35%, and attribute methods are twice as fast as 11.2. This release also includes document flags to trim whitespace and collapse whitespace.</p>

<p>Here's the list of 11.3 enhancements:</p>

<ul>
<li>overall parser performance increased about 35%, see <a href="http://www.firstobject.com/cmarkup-xml-parser-performance.htm">CMarkup XML Parser Performance</a></li>
<li><a href="http://www.firstobject.com/attribute-method-performance.htm">attribute method performance improvements</a></li>
<li>New method <a class="codelink" href="http://www.firstobject.com/dn_markGetNthAttrib.htm">GetNthAttrib</a> retrieves name and value of attribute 0, 1, 2...</li>
<li>use <code>MDF_TRIMWHITESPACE</code> and <code>MDF_COLLAPSEWHITESPACE</code> to affect retrieved values (see <a href="http://www.firstobject.com/whitespace-and-cmarkup.htm">Whitespace and CMarkup</a>)</li>
<li>fix: bug in Linux/OS X <code>TextEncoding::IConv</code> <i>*thanks Frank Dering</i></li>
<li>performance measures added to the tests and CMarkupTesting.xml output (see <a href="http://www.firstobject.com/cmarkup-test-dialog.htm">CMarkup test dialog</a>)</li>
</ul>

<p>See also previous CMarkup release notes: <A href="http://www.firstobject.com/cmarkup-11.2-release-notes.htm">11.2</A>, <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">11.1</A>, <A href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">11.0</A>, <A href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">10.1</A>, <A href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">10.0</A>, <A href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</A></p>


]]></description>
</item>
<item>
<title>Using the firstobject XML editor from the command line</title>
<link>http://www.firstobject.com/xml-editor-command-line.htm</link>
<guid isPermaLink="false">xml-editor-command-line.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:12:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>The <a href="http://www.firstobject.com/dn_editor.htm">free firstobject XML editor</a> has some command line switches. Here is the summary; details are below.</p>

<p><table class="shadedtable">
<tr><th width=250 align=left>Switch</th><th align=left>Purpose</th></tr>
<tr><td><code>-new</code></td><td>open in a new instance of the editor (useful when single instance preference is selected)</td></tr>
<tr><td><code>-same</code></td><td>open in the existing instance of the editor</td></tr>
<tr><td><code>-watch "C:\event.log"</code></td><td>open file in read-only auto-reloading mode to view the tail of log files</td></tr>
<tr><td><code>-line 23</code></td><td>open file at a line</td></tr>
<tr><td><code>-offset 451</code></td><td>UTF-8 offset from beginning of document or from beginning of line if line is specified</td></tr>
<tr><td><code>-fromoffset 5</code></td><td>pre-select text from this offset to specified offset</td></tr>
<tr><td><code>-run script.foal:f arg1 arg2</code></td><td>execute the script without showing the editor</td></tr>
</table></p>

<p>Examples:</p>

<pre>foxe.exe "C:\XML examples\file.xml" -line 6</pre>
<pre>foxe.exe file.xml -line 6 -offset 5</pre>
<pre>foxe.exe -offset 240 -fromoffset 235 file.xml</pre>
<pre>foxe.exe -watch C:\event.log</pre>
<pre>foxe.exe -run C:\script.foal</pre>
<pre>foxe.exe -same file.xml</pre>
<pre>"C:\Program Files\firstobject\foxe.exe" -new file.xml</pre>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> How to run script automatically</p></div>
<div class=commentposted><p>Angela Baines 18-Jan-2010</p></div>
<div class=commentcontent>
<p>The foal script works a treat now. When I move this to production it has to run as part of an automated project that will be scheduled to run overnight</p>
</div></div>

<p>As of <a href="http://www.firstobject.com/dn_editcomments.htm#20100612220000">release 2.4.1</a> the free firstobject XML editor has a command line switch to run a script:</p>

<pre>foxe -run "C:\foal scripts\script.foal"</pre>

<p>It generates 2 files, foxe_err.txt and foxe_out.txt. The err file contains marked up information about the run and can help diagnose issues with running. The out file contains the output returned from the script in the return statement.</p>

<p>And with <a href="http://www.firstobject.com/dn_editcomments.htm#20110423170200">release 2.4.2</a> you can specify function and arguments, see below.</p>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> Run from command line problem</p></div>
<div class=commentposted><p>Garth Lancaster 19-Jul-2010</p></div>
<div class=commentcontent>
<p>In foxe I can do this [with a foal script]</p>
<pre><font color=blue>str</font> NavigateIterativelyXYZ_Generated( CMarkup mDocToNavigate )
{
  mDocToNavigate.ResetPos();
  <font color=blue>str</font> sXML = mDocToNavigate.GetDocFormatted(0);
  <font color=blue>return</font> sXML;
}</pre>
<p>so when I run it, a window pops up and asks which document I wish to convert, shows me the output, all is good... except now, I wish to do this from a command line.</p>
<pre>foxe -run align0.foal input.xml output.xml</pre>
<p>Where align0.foal contains the foal program, input and output names will likely come from a batch script % parameter or such, and input.xml will contain the streamed xml and output.xml will contain the results of the <code>GetDocFormatted(0)</code>. I'm thinking foal can be very powerful, am I missing a point in its implementation?</p>
</div></div>

<p><b>Update April 23, 2011:</b> With <a href="http://www.firstobject.com/dn_editcomments.htm#20110423170200">release 2.4.2</a>, it works the way it should (the way Garth described). The -run command line option passes any number of command line arguments into the parameters of the function in the foal script. For example (remember to use quotes if a path or argument contains spaces):</p>

<pre>foxe -run "C:\foal scripts\script.foal" C:\in.xml C:\out.xml</pre>

<p>Here is a script to format XML that works with the previous command line:</p>

<pre>formatxml(<font color=blue>str</font> sInPath, <font color=blue>str</font> sOutPath)
{
  CMarkup m, r;
  r.AddElem( "load", sInPath );
  m.Load( sInPath );
  r.AddSubDoc( m.GetResult() );
  <font color=blue>if</font> ( ! m.SetDoc(m.GetDocFormatted()) )
    r.AddSubDoc( m.GetResult() );
  r.AddElem( "save", sOutPath );
  m.Save(sOutPath);
  r.AddSubDoc( m.GetResult() );
  <font color=blue>return</font> r;
}</pre>

<p>Basically this script just loads the input file, calls <a class="codelink" href="http://www.firstobject.com/dn_markGetDocFormatted.htm">GetDocFormatted</a> and saves to the output file (you could also add an argument to pass format flags). It also returns the results so that if something goes awry you can see what happened in foxe_out.txt. If there is no problem, the output might look like this:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;load&gt;</FONT><FONT style='color:black;font-weight:bold;'>C:\in.xml</FONT><FONT color=#0000ff>&lt;/load&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;read</FONT><FONT color=#be3232> encoding</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>UTF-8</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> length</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>31579</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;save&gt;</FONT><FONT style='color:black;font-weight:bold;'>C:\out.xml</FONT><FONT color=#0000ff>&lt;/save&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;write</FONT><FONT color=#be3232> encoding</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>UTF-8</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> length</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>29958</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT></PRE>

<p>If there are multiple functions in a script, you can name your entry function <code>main</code> to avoid confusion about which function is being called. You can also specify the entry function explicitly on the command line with <code>:function</code> at the end of the script filename. Specifying the function allows you to invoke multiple functions in one script from the command line. Here is an example of using the same script to perform different operations:</p>

<pre>foxe -run C:\script.foal<font style='color:#be3232;background:#ffff00;'>:extract</font> London</pre>

<pre>foxe -run C:\script.foal<font style='color:#be3232;background:#ffff00;'>:merge</font> London "New York"</pre>

<p>If you specify a function name that is not found in the script, there will be an indication at the bottom of foxe_err.txt that also indicates the entry point it would have used if you had not specified one:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;entry_point</FONT><FONT color=#be3232> arg_count</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>2</FONT><FONT color=#0000ff>"</FONT> <font style='color:#be3232;background:#ffff00;'>not_found</font><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>gormatxml</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>formatxml</FONT><FONT color=#0000ff>&lt;/entry_point&gt;</FONT></PRE>

<p>If you do not specify the function and there is no main function in the script, the last function with a matching number of arguments is called. It chooses the last matching function because functions earlier in the script tend to be subroutines, since in FOAL a function can only be called from code below the point where it is defined.</p>
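
<p>The selection rules described above can be modeled in a few lines of C++. This is only a sketch of the rules as stated in this article, not the editor's actual implementation: <code>ScriptFunction</code> and <code>chooseEntryPoint</code> are hypothetical names, and the sketch assumes that a function named <code>main</code> is chosen regardless of its parameter count, which the article does not specify.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical model of one function in a script: name and parameter count.
struct ScriptFunction
{
    std::string name;
    int paramCount;
};

// Returns the index of the chosen entry function, or -1 if none qualifies.
// `requested` is the name given after the colon on the command line,
// or an empty string if no function was specified.
int chooseEntryPoint(const std::vector<ScriptFunction>& functions,
                     const std::string& requested, int argCount)
{
    const int count = static_cast<int>(functions.size());

    // 1. An explicitly requested function wins; a missing one is an error.
    if (!requested.empty())
    {
        for (int i = 0; i < count; ++i)
            if (functions[i].name == requested)
                return i;
        return -1;
    }

    // 2. Otherwise prefer a function named main (assumption: regardless
    //    of how many parameters it takes).
    for (int i = 0; i < count; ++i)
        if (functions[i].name == "main")
            return i;

    // 3. Otherwise take the last function whose parameter count matches
    //    the number of command line arguments.
    for (int i = count - 1; i >= 0; --i)
        if (functions[i].paramCount == argCount)
            return i;

    return -1;
}
```

<p>In this model a misspelled name such as <code>gormatxml</code> simply yields -1, whereas the editor additionally reports in foxe_err.txt which entry point it would have used.</p>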

]]></description>
</item>
<item>
<title>Whitespace and CMarkup</title>
<link>http://www.firstobject.com/whitespace-and-cmarkup.htm</link>
<guid isPermaLink="false">whitespace-and-cmarkup.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:11:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> trimming white space</p></div>
<div class=commentposted><p>Marc Dyksterhouse 17-Mar-2010</p></div>
<div class=commentcontent>
<p>Is there a way to have <a class="codelink" href="http://www.firstobject.com/dn_markGetData.htm">GetData</a> or some other call return just the text of an element and not the whitespace around it?  For example, can <code>GetData</code> return <code>"text"</code> in the following XML instead of <code>"&nbsp;&nbsp;text\n"</code>?</p>

<pre lang=xml><font color=#0000ff>&lt;item&gt;</font><font style='color:black;font-weight:bold;'>
 text
</font><font color=#0000ff>&lt;/item&gt;</font><font style='color:black;font-weight:bold;'>
</font></pre>

<p>I know I can just trim the returned string, but since whitespace isn't supposed to be pertinent in XML, I just thought the library should work this way. In the few cases where I need to preserve whitespace, I can use a CDATA encoding.</p>
</div></div>

<p>With <a href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">release 11.3</a> you can set flags to trim whitespace or collapse whitespace when reading values from the document. CMarkup is unusual among XML tools because it simply preserves all whitespace, but now it can also support standard ways that XML and HTML processors alter whitespace.</p>

<p>Whitespace includes spaces, tabs, returns and newline characters. CMarkup has always preserved the whitespace as it appears in the document, and it still will. These new flags give you the option of reading the trimmed or collapsed text values, but the document is not altered, so you can turn off the flags and go back to reading the preserved whitespace.</p>

<p><table class="shadedtable" cellpadding=2><tr>
<th width=20>Document Flag</th>
<th width=200>Purpose</th>
</tr><tr>
<td><code>MDF_TRIMWHITESPACE</code></td>
<td>removes leading and trailing whitespace</td>
</tr><tr>
<td><code>MDF_COLLAPSEWHITESPACE</code></td>
<td>removes leading and trailing whitespace, but also replaces all segments of whitespace inside the text with a single space; so for example a newline and tab within the text will become a single space</td>
</tr></table></p>

<p>These flags affect CMarkup methods like <a class="codelink" href="http://www.firstobject.com/dn_markGetData.htm">GetData</a> and <a class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</a> that retrieve element data, text nodes, and attributes (but not methods like <a class="codelink" href="http://www.firstobject.com/dn_markGetSubDoc.htm">GetSubDoc</a> and <a class="codelink" href="http://www.firstobject.com/dn_markGetElemContent.htm">GetElemContent</a> that return XML i.e. markup text).</p>
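
<p>To make the difference between trimming and collapsing concrete, here is a small standalone C++ sketch of the two transformations as described in the table above. The helper names <code>isXmlSpace</code>, <code>trimWhitespace</code> and <code>collapseWhitespace</code> are hypothetical and exist only for this illustration; this is not how CMarkup implements the flags internally.</p>

```cpp
#include <string>

// Illustrative helpers only; these are not CMarkup functions.
// Whitespace here means space, tab, return and newline characters.
static bool isXmlSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Remove leading and trailing whitespace (the MDF_TRIMWHITESPACE idea).
std::string trimWhitespace(const std::string& s)
{
    std::string::size_type first = 0, last = s.size();
    while (first < last && isXmlSpace(s[first]))
        ++first;
    while (last > first && isXmlSpace(s[last - 1]))
        --last;
    return s.substr(first, last - first);
}

// Trim, then replace each inner run of whitespace with a single space
// (the MDF_COLLAPSEWHITESPACE idea).
std::string collapseWhitespace(const std::string& s)
{
    const std::string trimmed = trimWhitespace(s);
    std::string out;
    bool pendingSpace = false;
    for (std::string::size_type i = 0; i < trimmed.size(); ++i)
    {
        const char c = trimmed[i];
        if (isXmlSpace(c))
        {
            pendingSpace = true;   // remember the run, emit one space later
            continue;
        }
        if (pendingSpace)
        {
            out += ' ';
            pendingSpace = false;
        }
        out += c;
    }
    return out;
}
```

<p>Both helpers return a new string and leave their input untouched, which mirrors the point made above that the flags change what is read but do not alter the document.</p>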

<p>These flags have no effect on text retrieved from CDATA Sections. With CMarkup you can create elements that contain CDATA Section text to protect the whitespace from ever being altered by CMarkup or any other XML tool:</p>

<pre>xml.AddElem( "Prose", strProseText, CMarkup::MNF_WITHCDATA );</pre>

<p>You can turn the whitespace flags on and off at any time without a performance penalty, for example if you want to trim some values and not others. Use <a class="codelink" href="http://www.firstobject.com/dn_markSetDocFlags.htm">SetDocFlags</a> to set these flags.</p>

<pre>CMarkup m;
m.SetDocFlags( CMarkup::MDF_TRIMWHITESPACE );</pre>

<p>You can OR a flag with <a class="codelink" href="http://www.firstobject.com/dn_markGetDocFlags.htm">GetDocFlags</a> if you don't want to affect other flags:</p>

<pre>m.SetDocFlags( m.GetDocFlags() <font style='color:#be3232;background:#ffff00;'>|</font> CMarkup::MDF_COLLAPSEWHITESPACE );</pre>

<p>Turn off a flag without affecting others as follows:</p>

<pre>m.SetDocFlags( m.GetDocFlags() <font style='color:#be3232;background:#ffff00;'>& ~</font>CMarkup::MDF_COLLAPSEWHITESPACE );</pre>

<p>These whitespace flags can affect values returned by <a class="codelink" href="http://www.firstobject.com/dn_markGetData.htm">GetData</a>, <a class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</a> and related methods. They also affect methods like <a class="codelink" href="http://www.firstobject.com/dn_markFindElem.htm">FindElem</a> that search for a path specifying a value in a path attribute predicate (see <a href="http://www.firstobject.com/dn_markpath.htm">Paths In CMarkup</a>) because values from the document will be trimmed or collapsed before being compared to the specified value.</p>

<p>See also:<br>
<a href="http://www.firstobject.com/dn_marknodes.htm">Node Methods in CMarkup</a></p>


]]></description>
</item>
<item>
<title>How to generate file names with XML splitter script</title>
<link>http://www.firstobject.com/generate-file-names-xml-splitter.htm</link>
<guid isPermaLink="false">generate-file-names-xml-splitter.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:10:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>This is an example of how to use the <a href="http://www.firstobject.com/dn_editor.htm">free firstobject XML editor</a> to split XML and then name the output files based on information in the pieces separated by the XML splitter script. Maybe this will be useful to other NGOs who need to split their XML.</p>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> XML Splitter</p></div>
<div class=commentposted><p>Dita Ciulacu 01-Jul-2009</p></div>
<div class=commentcontent>
<p>I am desperately searching for a xml splitter to generate the file name using values from a child field. Is there any way to have the file named this way:</p>
<p><code>xmlOutput.Open( "test" + "_" + [Child value from REFERRAL_ID]  + "_" + nFileCount + ".xml", MDF_WRITEFILE );</code></p>
<p>My xml [not real data] is:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;REFERRAL_DISCHARGE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;FILE_VERSION&gt;</FONT><FONT style='color:black;font-weight:bold;'>1.0</FONT><FONT color=#0000ff>&lt;/FILE_VERSION&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;REFERRAL_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>1234</FONT><FONT color=#0000ff>&lt;/REFERRAL_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;ORGANISATION_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>ORG-5678</FONT><FONT color=#0000ff>&lt;/ORGANISATION_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;ORGANISATION_TYPE&gt;</FONT><FONT style='color:black;font-weight:bold;'>005</FONT><FONT color=#0000ff>&lt;/ORGANISATION_TYPE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;EXTRACT_FROM_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>2009-06-01T00:00:00</FONT><FONT color=#0000ff>&lt;/EXTRACT_FROM_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;EXTRACTED_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>2009-06-30T15:40:15</FONT><FONT color=#0000ff>&lt;/EXTRACTED_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;TEAM_CODE&gt;</FONT><FONT style='color:black;font-weight:bold;'>5555</FONT><FONT color=#0000ff>&lt;/TEAM_CODE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;EVENT_HCU_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>XXX1234</FONT><FONT color=#0000ff>&lt;/EVENT_HCU_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;SEX&gt;</FONT><FONT style='color:black;font-weight:bold;'>M</FONT><FONT color=#0000ff>&lt;/SEX&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;DATE_OF_BIRTH&gt;</FONT><FONT style='color:black;font-weight:bold;'>1900-05-05</FONT><FONT color=#0000ff>&lt;/DATE_OF_BIRTH&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;REFERRAL_FROM&gt;</FONT><FONT style='color:black;font-weight:bold;'>UN</FONT><FONT color=#0000ff>&lt;/REFERRAL_FROM&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;START_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>2008-12-24T00:00:00</FONT><FONT color=#0000ff>&lt;/START_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/REFERRAL_DISCHARGE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT></PRE>
 
<p>The parent is REFERRAL_DISCHARGE, I need the file name exactly how you have it plus the individual value from REFERRAL_ID to make it easy to link to the data included.</p>
<p>We are a not-for-profit organization and we have to report to the [New Zealand] Ministry of Health and our data is to be packed as individual xml files. We are not dealing with huge files (this one was only 316kb) and also they are relatively simple extracts, but I don’t know in the future... it may get more complicated.</p>
</div></div>

<p>For splitting an XML file less than 10MB into a lot of referral discharge files, this is the easiest way to do it:</p>

<pre>split()
{
  CMarkup xmlInput, xmlSubDoc;
  xmlInput.<a class="codelink" href="http://www.firstobject.com/dn_markLoad.htm">Load</a>( "input.xml" );
  <font color=blue>int</font> nFileCount = 0;
  <font color=blue>while</font> ( xmlInput.<a class="codelink" href="http://www.firstobject.com/dn_markFindElem.htm">FindElem</a>("//REFERRAL_DISCHARGE") )
  {
    ++nFileCount;
    xmlSubDoc.SetDoc( xmlInput.GetSubDoc() );
    str sID = xmlSubDoc.<a class="codelink" href="http://www.firstobject.com/dn_markFindGetData.htm">FindGetData</a>( "//REFERRAL_ID" );
    str sFilename = "test_" + sID + "_"+ nFileCount + ".xml";
    <a class="codelink" href="http://www.firstobject.com/dn_markWriteTextFile.htm">WriteTextFile</a>( sFilename, xmlSubDoc.GetDoc() );
  }
  <font color=blue>return</font> nFileCount;
}</pre>

<h4>Splitting a huge file</h4>

<p>If you have really large files (especially over 100MB, up to any number of gigabytes), use the file read mode, which processes the source file on disk very efficiently. The only differences from the above script are opening the input file in read mode rather than loading it all into memory, and closing it at the end.</p>

<pre>split()
{
  CMarkup xmlInput, xmlSubDoc;
  xmlInput.<span style="background-color:#ffff7f"><a class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</a>( "input.xml", MDF_READFILE );</span>
  <font color=blue>int</font> nFileCount = 0;
  <font color=blue>while</font> ( xmlInput.FindElem("//REFERRAL_DISCHARGE") )
  {
    ++nFileCount;
    xmlSubDoc.SetDoc( xmlInput.GetSubDoc() );
    str sID = xmlSubDoc.FindGetData( "//REFERRAL_ID" );
    str sFilename = "test_" + sID + "_"+ nFileCount + ".xml";
    WriteTextFile( sFilename, xmlSubDoc.GetDoc() );
  }
  <span style="background-color:#ffff7f">xmlInput.Close();</span>
  <font color=blue>return</font> nFileCount;
}</pre>

<p>A note about using the <a href="http://www.firstobject.com/dn_markpath.htm">anywhere path</a>: if you want to grab multiple pieces of data with calls like <code>xmlSubDoc.FindGetData("//REFERRAL_ID")</code>, remember that the <code>//</code> anywhere path starts from the current position. So if you're not sure about the order of the data you are grabbing, call <code>xmlSubDoc.<a class="codelink" href="http://www.firstobject.com/dn_markResetPos.htm">ResetPos</a>()</code> in between calls to <code>FindGetData</code>.</p>

<span class=indexlist>
<p>See also:</p>
<p><a href="http://www.firstobject.com/split-xml.htm">Split XML with XML editor script</a></p>
<p><a href="http://www.firstobject.com/split-xml-file-into-smaller-pieces.htm">Split XML file into smaller pieces</a></p>
<p><a href="http://www.firstobject.com/xml-splitter-script-video.htm">Video of XML splitter script for splitting XML files</a></p>
<p><a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader parses a very large XML file</a></p>
<p><a href="http://www.firstobject.com/dn_markOpen.htm">CMarkup Open Method - file read mode</a></p>
<p><a href="http://www.firstobject.com/parse-huge-xml-file-in-c++.htm">Parse huge XML file in C++</a></p>
</span>


]]></description>
</item>
<item>
<title>Export XML records with matching childset</title>
<link>http://www.firstobject.com/export-xml-records-with-matching-childset.htm</link>
<guid isPermaLink="false">export-xml-records-with-matching-childset.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:09:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>Conventional wisdom has you importing and exporting XML to and from a database in order to run queries and utilize data that is in XML. But with firstobject's <a href="http://www.firstobject.com/dn_editor.htm">free XML editor</a> you can perform all sorts of operations rapidly and efficiently directly on the XML document. This example shows how to export subsets of records, query, tally and modify XML records in a real estate database XML file.</p>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> export records with matching childset</p></div>
<div class=commentposted><p>Eddie Wrenn 25-Jan-2010</p></div>
<div class=commentcontent>
<p>What I have is a list of properties for sale nationwide, contained in a 1.5gb XML file (your program is the only one which seems to handle this with ease!) I'm looking for a way to make the editor export all the records which have a matching childset, in this case 'locality' (in this example, London). There's 100,000 listings so not a manual job!</p>
<p>I've been successful splitting the file into 100,000 separate files, named by the locality (using your tutorials). But patching them all together takes a long time, even if I automate it. A sample record below:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;listing</FONT><FONT color=#be3232> key</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>1234567</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> status</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>active</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> updated</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>20090101T010101</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>residence</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;title&gt;&lt;![CDATA[</FONT><FONT color=#804000>Xyz Street, London</FONT><FONT color=#0000ff>]]&gt;&lt;/title&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;supplementary-url&gt;&lt;![CDATA[</FONT><FONT color=#804000>1234567.htm</FONT><FONT color=#0000ff>]]&gt;&lt;/supplementary-url&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;description&gt;&lt;![CDATA[</FONT><FONT color=#804000>AVAILABLE 01/01/2010. This
beautifully decorated place is situated on a quiet back
street of Xyz Garden in the heart of Xyz London.
The owners have refurbished to a particularly high
standard paying exceptional attention to detail to the
overall finish and decoration. As the apartment is
situated on the Nth floor there are great views of
London giving the apartment excellent natural light.
Features available. We highly recommend a viewing.</FONT><FONT color=#0000ff>]]&gt;&lt;/description&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;residence</FONT><FONT color=#be3232> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>flat</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;bedrooms&gt;&lt;![CDATA[</FONT><FONT color=#804000>1</FONT><FONT color=#0000ff>]]&gt;&lt;/bedrooms&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;bathrooms&gt;&lt;![CDATA[</FONT><FONT color=#804000>1</FONT><FONT color=#0000ff>]]&gt;&lt;/bathrooms&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;reception&gt;&lt;![CDATA[</FONT><FONT color=#804000>yes</FONT><FONT color=#0000ff>]]&gt;&lt;/reception&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;/residence&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;authority&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;lease</FONT><FONT color=#be3232> currency</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>GBP</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> term</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>private</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> visible</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>yes</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
      </FONT><FONT color=#0000ff>&lt;price</FONT><FONT color=#be3232> term</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>weekly</FONT><FONT color=#0000ff>"&gt;&lt;![CDATA[</FONT><FONT color=#804000>450</FONT><FONT color=#0000ff>]]&gt;&lt;/price&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;/lease&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;/authority&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;address</FONT><FONT color=#be3232> visible</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>yes</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;country&gt;&lt;![CDATA[</FONT><FONT color=#804000>GB</FONT><FONT color=#0000ff>]]&gt;&lt;/country&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;subdivision&gt;&lt;![CDATA[</FONT><FONT color=#804000>London</FONT><FONT color=#0000ff>]]&gt;&lt;/subdivision&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;locality&gt;&lt;![CDATA[</FONT><FONT color=#804000>London</FONT><FONT color=#0000ff>]]&gt;&lt;/locality&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;postcode&gt;&lt;![CDATA[</FONT><FONT color=#804000>AA1A 1AA</FONT><FONT color=#0000ff>]]&gt;&lt;/postcode&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;road&gt;&lt;![CDATA[</FONT><FONT color=#804000>Xyz Street</FONT><FONT color=#0000ff>]]&gt;&lt;/road&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;/address&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;attachments&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;photo</FONT><FONT color=#be3232> title</FONT><FONT color=#0000ff>=""</FONT><FONT color=#be3232> updated</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>20090101T010101</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>image/jpeg</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
      </FONT><FONT color=#0000ff>&lt;uri&gt;&lt;![CDATA[</FONT><FONT color=#804000>1234567_354_255.jpg</FONT><FONT color=#0000ff>]]&gt;&lt;/uri&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;/photo&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;photo</FONT><FONT color=#be3232> title</FONT><FONT color=#0000ff>=""</FONT><FONT color=#be3232> updated</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>20091118T201049</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>image/jpeg</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
      </FONT><FONT color=#0000ff>&lt;uri&gt;&lt;![CDATA[</FONT><FONT color=#804000>23456789_354_255.jpg</FONT><FONT color=#0000ff>]]&gt;&lt;/uri&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;/photo&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;photo</FONT><FONT color=#be3232> title</FONT><FONT color=#0000ff>=""</FONT><FONT color=#be3232> updated</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>20091118T201049</FONT><FONT color=#0000ff>"</FONT><FONT color=#be3232> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>image/jpeg</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
      </FONT><FONT color=#0000ff>&lt;uri&gt;&lt;![CDATA[</FONT><FONT color=#804000>34567890_354_255.jpg</FONT><FONT color=#0000ff>]]&gt;&lt;/uri&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;/photo&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;/attachments&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;vendor&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;name&gt;&lt;![CDATA[</FONT><FONT color=#804000>Xyz Property Services</FONT><FONT color=#0000ff>]]&gt;&lt;/name&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;phone&gt;&lt;![CDATA[</FONT><FONT color=#804000>020 1234 5678</FONT><FONT color=#0000ff>]]&gt;&lt;/phone&gt;</FONT><FONT style='color:black;font-weight:bold;'>
    </FONT><FONT color=#0000ff>&lt;email&gt;&lt;![CDATA[</FONT><FONT color=#804000>enquiries@xyz.example</FONT><FONT color=#0000ff>]]&gt;&lt;/email&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;/vendor&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/listing&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT></PRE>
</div></div>

<p>To find all the matching records on a huge file you do something like this: from the File menu select New Program, paste in the following script, and modify the input file pathname (note that for C++ syntax, use a double backslash for backslashes in the pathname).</p>

<pre>pull_by_locality()
{
  <font color=blue>str</font> strSearch = "London";
  CMarkup xmlInput, xmlListing, xmlOutput;
  xmlInput.Open( "C:\\huge.xml", MDF_READFILE );
  <font color=blue>while</font> ( xmlInput.FindElem("//listing") )
  {
    xmlListing.SetDoc( xmlInput.GetSubDoc() );
    <font color=blue>if</font> ( xmlListing.FindGetData("//locality") == strSearch )
      xmlOutput.AddSubDoc( xmlListing.GetDoc() );
  }
  <font color=blue>return</font> xmlOutput.GetDoc();
}</pre>

<p>To export the result document as London.xml:</p>

<pre>xmlOutput.Save( strSearch + ".xml" );</pre>

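<p>To delete (or actually skip) records which are no longer required, e.g. we want them if status is "active" but not if it is "inactive" or "sold", check the <code>status</code> attribute of the <code>listing</code> element (in the sample record above, status is an attribute rather than a child element):</p>

<pre>xmlListing.ResetPos();
xmlListing.FindElem(); <font color=green>// listing</font>
<font color=blue>if</font> ( xmlListing.GetAttrib("status") != "active" )
  ...</pre>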

<p>To change an element tag name from title to topicname in the output, first add the new element with the same content, then remove the old one (this is the easiest way to make sure the new element goes into the same position as the removed one).</p>

<pre>xmlListing.ResetPos();
<font color=blue>if</font> (xmlListing.FindElem("//title"))
{
  xmlListing.AddElem("topicname", xmlListing.GetData());
  xmlListing.FindPrevElem(); <font color=green>// title</font>
  xmlListing.RemoveElem();
}</pre>

<p>As for inputting the search string, FOAL scripts don't support dialogs yet. However, you can automate the process if you put the search string in a file such as search.txt, which can be retrieved in the FOAL script with:</p>

<pre><font color=blue>str</font> s;
if ( ReadTextFile("C:\\search.txt", s) && StrLength(s) > 2 )
  s = StrMid( s, 0, StrLength(s)-2 ); <font color=green>// remove CRLF</font>
<font color=blue>str</font> strSearch = s;</pre>
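<p>For comparison, here is a minimal standard C++ sketch of the same clean-up outside of FOAL. The function name <code>TrimLineEnd</code> is hypothetical and not part of FOAL or CMarkup. Unlike the fixed two-character cut above, it removes whatever trailing CR and LF characters are present, so it also copes with a file that ends in a bare LF or has no line ending at all.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of FOAL or CMarkup): strip any trailing
// carriage return and line feed characters from a string.
std::string TrimLineEnd( std::string s )
{
  while ( !s.empty() && (s.back() == '\r' || s.back() == '\n') )
    s.pop_back();
  return s;
}
```

<p>For example, <code>TrimLineEnd("London\r\n")</code> and <code>TrimLineEnd("London")</code> both give <code>"London"</code>.</p>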

<p>In DOS, if you had a script named search.foal, then you could create a search.bat file as follows to let you type <code>search London</code> on the command line.</p>

<pre>echo %1 > C:\search.txt
"C:\Program Files\firstobject\foxe.exe" -run C:\search.foal</pre>

<p>Here's an interesting diagnostic to count instances of each locality:</p>

<pre>locality_tally()
{
  CMarkup xmlLocalities, xmlInput;
  xmlInput.Open( "huge.xml", MDF_READFILE );
  <font color=blue>while</font> ( xmlInput.FindElem("//locality") )
  {
    <font color=blue>str</font> sLoc = xmlInput.GetData();
    <font color=blue>int</font> n = 1;
    <font color=blue>if</font> ( xmlLocalities.RestorePos(sLoc) )
      n = StrToInt(xmlLocalities.GetAttrib("n")) + 1;
    <font color=blue>else</font>
    {
      xmlLocalities.ResetPos();
      xmlLocalities.AddElem("locality",sLoc);
      xmlLocalities.SavePos(sLoc);
    }
    xmlLocalities.SetAttrib("n", n);
  }
  <font color=blue>return</font> xmlLocalities;
}</pre>

<p>It yields a result like this:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;locality</FONT><FONT color=#be3232> n</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>890</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>London</FONT><FONT color=#0000ff>&lt;/locality&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;locality</FONT><FONT color=#be3232> n</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>431</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>Yorkshire</FONT><FONT color=#0000ff>&lt;/locality&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT></PRE>
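<p>The script above uses <code>SavePos</code> and <code>RestorePos</code> as a keyed lookup into the result document. As a rough illustration of that pattern in plain C++, the sketch below keeps the counts in first-seen order and uses a separate index to jump back to an existing entry. The <code>LocalityTally</code> type is hypothetical and only stands in for the <code>xmlLocalities</code> document; it is not how CMarkup is implemented.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the xmlLocalities document: a list of
// (locality, count) pairs kept in first-seen order, plus an index that
// plays the role of SavePos/RestorePos in the script.
struct LocalityTally
{
  std::vector< std::pair<std::string,int> > entries;
  std::unordered_map<std::string,std::size_t> index;

  void Add( const std::string& sLoc )
  {
    auto it = index.find( sLoc );
    if ( it != index.end() )
      ++entries[it->second].second;       // seen before: increment n
    else
    {
      index[sLoc] = entries.size();       // remember where it is stored
      entries.push_back( std::make_pair(sLoc, 1) ); // first sighting: n = 1
    }
  }
};
```

<p>Calling <code>Add</code> once per locality value leaves one entry per distinct locality, in the order each was first encountered, which mirrors the order of the <code>locality</code> elements in the result above.</p>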


<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> how to clear the XML result</p></div>
<div class=commentposted><p>Eddie Wrenn 27-Jan-2010</p></div>
<div class=commentcontent>
<p>Now I'm piggybacking "searches" on top of each other, so it will search for London, output them into a London file, then search for Yorkshire, and output that into a Yorkshire file. My problem is that the editor [script] will retain the results for London, and add them to the top of my Yorkshire file - is there a little code that will clear the internal memory before starting the next process?</p>
</div></div>

<pre>xmlOutput.<a class="codelink" href="http://www.firstobject.com/dn_markSetDoc.htm">SetDoc</a>("");</pre>


<span class=indexlist>
<p>See also:</p>
<p><a href="http://www.firstobject.com/xml-editor-command-line.htm">Using the firstobject XML editor from the command line</a></p>
<p><a href="http://www.firstobject.com/counting-xml-tag-names-and-values.htm">Counting XML tag names and values with foal</a></p>
<p><a href="http://www.firstobject.com/dn_foal.htm">firstobject Access Language</a></p>
<p><a href="http://www.firstobject.com/format-xml-indent-align-beautify-xml.htm">Format XML, indent align beautify clean up XML</a></p>
<p><a href="http://www.firstobject.com/simple-xml-editor-memory-stick.htm">Simple XML editor meets memory stick</a></p>
<p><a href="http://www.firstobject.com/split-xml.htm">Split XML with XML editor script</a></p>
<p><a href="http://www.firstobject.com/tree-customization-in-xml-editor.htm">Tree customization in the firstobject XML editor</a></p>
<p><a href="http://www.firstobject.com/video-demo-editing-rss-xml-in-tree-of-xml-editor.htm">Video demo of editing RSS XML in the tree view of the free firstobject XML editor</a></p>
<p><a href="http://www.firstobject.com/xml-editor-format-xml-customize-treeview-program.htm">Video of XML Editor format XML, customize treeview, and program</a></p>
<p><a href="http://www.firstobject.com/split-xml-file-into-smaller-pieces.htm">Split XML file into smaller pieces</a></p>
<p><a href="http://www.firstobject.com/xml-splitter-script-video.htm">Video of XML splitter script for splitting XML files</a></p>
<p><a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader parses a very large XML file</a></p>
<p><a href="http://www.firstobject.com/parse-huge-xml-file-in-c++.htm">Parse huge XML file in C++</a></p>
</span>


]]></description>
</item>
<item>
<title>CMarkup GetNthAttrib Method</title>
<link>http://www.firstobject.com/dn_markGetNthAttrib.htm</link>
<guid isPermaLink="false">dn_markGetNthAttrib.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:07:00 GMT</pubDate>
<category>CMarkup Method</category>
<description><![CDATA[

<pre class="declarationsyntax"><font color=blue>bool</font> <a href="http://www.firstobject.com/dn_markupmethods.htm">CMarkup</a>::GetNthAttrib( <font color=blue>int</font> n, <a href="http://www.firstobject.com/dn_markmcdstr.htm">MCD_STR</a>& strName, <a href="http://www.firstobject.com/dn_markmcdstr.htm">MCD_STR</a>& strValue ) <font color=blue>const</font>;</pre>

<p>Call <code>GetNthAttrib</code> to get the string name and value of the Nth attribute in the main position element. The first attribute is 0, the second is 1, etc. If there is no current position or the current position node does not have the specified attribute, it returns <code><font color=blue>false</font></code>.</p>

<p>Similar to <a class="codelink" href="http://www.firstobject.com/dn_markGetAttribName.htm">GetAttribName</a>, this method lets you iterate through the attributes of an element or processing instruction. However, this is usually better because it provides the attribute value without an additional call to <a class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</a>, and it returns a <code><font color=blue>bool</font></code> which is convenient for looping. For example:</p>

<pre>MCD_STR strName, strAttrib;
<font color=blue>int</font> n = 0;
<font color=blue>while</font> ( xml.GetNthAttrib(n++, strName, strAttrib) )
{
  <font color=green>// do something with strName, strAttrib</font>
}</pre>

<p><code>GetNthAttrib</code> also works when the main position is a processing instruction node with attributes. See <a href="http://www.firstobject.com/dn_marknodes.htm">Node Methods in CMarkup</a>.</p>


]]></description>
</item>
<item>
<title>CMarkup XML Parser Performance</title>
<link>http://www.firstobject.com/cmarkup-xml-parser-performance.htm</link>
<guid isPermaLink="false">cmarkup-xml-parser-performance.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:04:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p><a href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">Release 11.3</a> has made a leap in performance (e.g. from 39 MB/s to 53 MB/s* excluding file I/O), so it's a good time to post some data on the speed of CMarkup, and to discuss XML parser performance issues. Here is a comparison of 11.3 with the previous release 11.2: raw parsing goes from about 40000 to 54000 bytes per millisecond, and attribute parsing (the basis for attribute methods) goes from about 5000 to 9000 b/ms (see also <a href="http://www.firstobject.com/attribute-method-performance.htm">Attribute Method Performance</a>).</p>

<p><table class="shadedtable">
<tr>
<th>Release</th>
<th>Chart</th>
<th colspan=2>parse doc/attrib</th>
<th colspan=2>create doc/attrib</th>
<th>Units</th>
</tr>
<tr>
<td>CMarkup 11.2</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperf112.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg:na&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:40002|5175|12331|4754&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>40002</td>
<td>5175</td>
<td>12331</td>
<td>4754</td>
<td>b/ms</td>
</tr>
<tr>
<td>CMarkup 11.3</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperf113.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:54042|9195|14394|6820&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>54042</td>
<td>9195</td>
<td>14394</td>
<td>6820</td>
<td>b/ms</td>
</tr>
</table></p>

<p>Since these measurements do not involve disk I/O, the speeds are measured in character units per millisecond where the character unit is b for byte, w for word (2 bytes), and dw for double word (4 bytes), depending on the build and platform. In the first chart I include 2 parse tests and then 2 corresponding create tests.</p>
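<p>To read the relative gains off the 11.2 and 11.3 rows above, the arithmetic can be sketched in a couple of lines of C++. The function name <code>SpeedupPercent</code> is illustrative only and is not part of CMarkup.</p>

```cpp
#include <cassert>

// Illustrative helper: percentage speed-up from an old rate to a new rate,
// both given in the same units (e.g. b/ms).
double SpeedupPercent( double oldRate, double newRate )
{
  assert( oldRate > 0 );
  return (newRate / oldRate - 1.0) * 100.0;
}
```

<p>With the table values, <code>SpeedupPercent(40002, 54042)</code> comes to roughly 35% for raw parsing and <code>SpeedupPercent(5175, 9195)</code> to roughly 78% for attribute parsing.</p>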

<p><table class="shadedtable">
<tr>
<td style="background-color:white;padding-top:8px"><img src="http://www.firstobject.com/chartperfpd.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=40x8&cht=bhg&chco=FF3700&chd=t:10&chds=0,10" width="40" height="8" alt="" /></td>
<td>parse document</td>
<td>this is the core indicator of parsing speed; the document string is passed to <a class="codelink" href="http://www.firstobject.com/dn_markSetDoc.htm">SetDoc</a> in memory and parsed, without loading the document from disk</td>
</tr>
<tr>
<td style="background-color:white;padding-top:8px"><img src="http://www.firstobject.com/chartperfpa.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=40x8&cht=bhg&chco=FF9900&chd=t:10&chds=0,10" width="40" height="8" alt="" /></td>
<td>parse attributes</td>
<td>loops through the document reading all attributes with <a class="codelink" href="http://www.firstobject.com/dn_markGetAttribName.htm">GetAttribName</a> and <a class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</a> (the new <a class="codelink" href="http://www.firstobject.com/dn_markGetNthAttrib.htm">GetNthAttrib</a> method is a more efficient way to do this)</td>
</tr>
<tr>
<td style="background-color:white;padding-top:8px"><img src="http://www.firstobject.com/chartperfcd.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=40x8&cht=bhg&chco=0000FF&chd=t:10&chds=0,10" width="40" height="8" alt="" /></td>
<td>create document</td>
<td>builds a document using an <a class="codelink" href="http://www.firstobject.com/dn_markAddElem.htm">AddElem</a> and <a class="codelink" href="http://www.firstobject.com/dn_markSetAttrib.htm">SetAttrib</a> for each element; the document is <b>not</b> saved to disk, so there is no disk I/O in this measurement</td>
</tr>
<tr>
<td style="background-color:white;padding-top:8px"><img src="http://www.firstobject.com/chartperfca.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=40x8&cht=bhg&chco=76A4FB&chd=t:10&chds=0,10" width="40" height="8" alt="" /></td>
<td>create attributes</td>
<td>creates a document with up to 4 randomly selected attributes and values per element; the <a class="codelink" href="http://www.firstobject.com/dn_markSetAttrib.htm">SetAttrib</a> call occasionally overwrites an attribute</td>
</tr>
</table></p>

<h4>The reason for release 11.3 performance improvement</h4>

<p>One of the most intensively used operations in the parser is determining whether a character is one of a set of characters. In 11.3 I replaced <code>MCD_PSZCHR</code> (<code>strchr</code>) with a lookup define which is an order of magnitude faster and yields a roughly 30% speed improvement in overall raw parser speed. The new lookup define only checks the bounds and then returns the value stored at the character's offset in the lookup array, where <code>c</code> is the character, <code>f</code> and <code>l</code> are the bounds (first and last) and <code>s</code> is the lookup array (a string):</p>

<pre><font color=blue>#define</font> x_ISONEOF(c,f,l,s) ((c&gt;=f&&c&lt;=l)?(int)(s[c-f]):0)</pre>

<p>So, for example, a whitespace check uses <code>x_ISONEOF</code> and passes the bounds 9 and 32, and a lookup string array for the range between those bounds:</p>

<pre><font color=green>// classic whitespace " \t\n\r"</font>
<font color=blue>#define</font> x_ISWHITESPACE(c) x_ISONEOF(c,9,32, \
  "\2\3\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1")</pre>

<p>Another roughly 5% overall improvement was gained by replacing <code>MCD_PSZNCMP</code> (<code>strncmp</code>) with a simple speedy implementation of string compare.</p>
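<p>Both changes can be tried out in a small self-contained sketch. The first part repeats the lookup define and whitespace table from above (the table is only split into pieces to make the 24 entries easier to count) and checks it against a <code>strchr</code> version. The second part, <code>SimpleNCmp</code>, is a hypothetical example of what a pared-down compare might look like; the actual replacement for <code>MCD_PSZNCMP</code> in CMarkup is not shown in this article and may differ.</p>

```cpp
#include <cassert>
#include <cstring>

// The lookup define from above.
#define x_ISONEOF(c,f,l,s) ((c>=f&&c<=l)?(int)(s[c-f]):0)

// Same whitespace table as above, one entry per character from 9 to 32.
#define x_ISWHITESPACE(c) x_ISONEOF(c,9,32, \
  "\2\3\0\0\4"             /* 9 tab, 10 LF, 11, 12, 13 CR */ \
  "\0\0\0\0\0\0\0\0\0"     /* 14 to 22 */ \
  "\0\0\0\0\0\0\0\0\0"     /* 23 to 31 */ \
  "\1")                    /* 32 space */

// Reference version using strchr, for comparison only. The c != 0 test is
// needed because strchr also matches the string terminator.
inline bool IsWhitespaceStrchr( char c )
{
  return c != 0 && std::strchr( " \t\n\r", c ) != 0;
}

// Returns true if the lookup agrees with strchr for every 7-bit character.
inline bool LookupMatchesStrchr()
{
  for ( int c = 0; c < 128; ++c )
    if ( (x_ISWHITESPACE(c) != 0) != IsWhitespaceStrchr( (char)c ) )
      return false;
  return true;
}

// Hypothetical helper, not CMarkup's actual code: returns 0 when the first
// n characters of p match those of q, nonzero at the first difference.
// Unlike strncmp it does not report ordering, and it assumes q (e.g. an
// expected tag name) has at least n characters before its terminator.
inline int SimpleNCmp( const char* p, const char* q, int n )
{
  assert( p && q && n >= 0 );
  while ( n-- )
  {
    if ( *p != *q )
      return 1;
    ++p;
    ++q;
  }
  return 0;
}
```

<p>Note that the define evaluates <code>c</code> more than once, so it should not be given an expression with side effects such as <code>*p++</code>.</p>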

<h4>Comparing different builds of release 11.3</h4>

<p>Build configuration makes a big difference in performance. See <a href="http://www.firstobject.com/dn_markansifile.htm">ANSI and Unicode files and C++ strings</a> and <a href="http://www.firstobject.com/non-unicode-text-handling.htm">non-Unicode text handling in CMarkup</a> for discussions of string character set options.</p>

<p><table class="shadedtable">
<tr>
<th>Build</th>
<th>Chart</th>
<th colspan=2>parse doc/attrib</th>
<th colspan=2>create doc/attrib</th>
<th>Units</th>
</tr>
<tr>
<td>MFC (UTF-8)</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfmfc.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:54042|9195|14394|6820&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>54042</td>
<td>9195</td>
<td>14394</td>
<td>6820</td>
<td>b/ms</td>
</tr>
<tr>
<td>STL (UTF-8)</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfstl.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:55923|9193|11583|6061&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>55923</td>
<td>9193</td>
<td>11583</td>
<td>6061</td>
<td>b/ms</td>
</tr>
<tr>
<td>MFC MBCS</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfmfcmbcs.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:14424|3269|11084|3492&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>14424</td>
<td>3269</td>
<td>11084</td>
<td>3492</td>
<td>b/ms</td>
</tr>
<tr>
<td>STL MBCS</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfstlmbcs.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:14783|3137|8636|3223&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>14783</td>
<td>3137</td>
<td>8636</td>
<td>3223</td>
<td>b/ms</td>
</tr>
<tr>
<td>MSXML6 MFC MBCS</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfmsxml6.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:3832|1762|1849|1347&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>3832</td>
<td>1762</td>
<td>1849</td>
<td>1347</td>
<td>b/ms</td>
</tr>
<tr>
<td>MFC WCHAR</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfmfcwide.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:57405|8607|14530|6594&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>57405</td>
<td>8607</td>
<td>14530</td>
<td>6594</td>
<td>w/ms</td>
</tr>
<tr>
<td>STL WCHAR</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfstlwide.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:57780|8607|10744|5639&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>57780</td>
<td>8607</td>
<td>10744</td>
<td>5639</td>
<td>w/ms</td>
</tr>
<tr>
<td>MSXML6 MFC WCHAR</td>
<td style="background-color:white">
<img src="http://www.firstobject.com/chartperfmsxml6wide.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x32&cht=bhg&chco=FF3700,FF9900,0000FF,76A4FB&chd=t:3950|1939|1963|1428&chds=0,60000" width="240" height="32" alt="" />
</td>
<td>3950</td>
<td>1939</td>
<td>1963</td>
<td>1428</td>
<td>w/ms</td>
</tr>
</table></p>


<p>Using Unicode (either UTF-8 or WCHAR) strings in memory is much more efficient than MBCS, which uses Windows APIs to determine character boundaries according to the locale character set. MSXML is very slow due to the overhead of COM, and it is slightly faster in a WCHAR build, which avoids conversion to and from COM's WCHAR-based strings.</p>

<h4>File mode performance</h4>

<p>Unlike the measurements above, the XML reader and XML writer measurements are all in bytes per millisecond regardless of build because they are based on the file I/O rather than the in-memory character unit size. The file is UTF-8, which means the MBCS and wide character builds have the extra penalty of character set conversion. The MBCS conversion can also be done using the libc (stdlib.h) function <code>wctomb</code> instead of the Windows API; those builds are the rows marked "libc" in the table.</p>

<p><table class="shadedtable">
<tr>
<th>Build</th>
<th>Chart</th>
<th>XML reader</th>
<th>XML writer</th>
<th>Units</th>
</tr>
<tr>
<td>MFC</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilemfc.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:15086|11528&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>15086</td>
<td>11528</td>
<td>b/ms</td>
</tr>
<tr>
<td>STL</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilestl.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:13858|9540&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>13858</td>
<td>9540</td>
<td>b/ms</td>
</tr>
<tr>
<td>MFC WCHAR</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilemfcwide.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:10854|8757&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>10854</td>
<td>8757</td>
<td>b/ms</td>
</tr>
<tr>
<td>STL WCHAR</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilestlwide.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:10717|7509&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>10717</td>
<td>7509</td>
<td>b/ms</td>
</tr>
<tr>
<td>MFC MBCS</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilemfcmbcs.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:11673|9846&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>11673</td>
<td>9846</td>
<td>b/ms</td>
</tr>
<tr>
<td>STL MBCS</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilestlmbcs.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:10444|8137&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>10444</td>
<td>8137</td>
<td>b/ms</td>
</tr>
<tr>
<td>MFC MBCS libc</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilemfcmbcslibc.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:2231|2844&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>2231</td>
<td>2844</td>
<td>b/ms</td>
</tr>
<tr>
<td>STL MBCS libc</td>
<td style="background-color:white;padding-top:8px">
<img src="http://www.firstobject.com/chartperffilestlmbcslibc.png" _dynsrc="http://chart.apis.google.com/chart?chxs=0,676767,0,0,l,676767|1,676767,0,0,_,676767|2,676767,0,0,_,676767|3,676767,0,0,_,676767&chxt=x,y,r,t&chbh=8,0,0&chs=240x16&cht=bhg&chco=FF3700,0000FF,0000FF,76A4FB&chd=t:2155|2677&chds=0,60000" width="240" height="16" alt="" />
</td>
<td>2155</td>
<td>2677</td>
<td>b/ms</td>
</tr>
</table></p>

<p>See also:<br>
<a href="http://www.firstobject.com/dn_markperf.htm">Archived CMarkup Performance Tests</a><br>
<a href="http://www.firstobject.com/attribute-method-performance.htm">Attribute Method Performance</a>
</p>

<p><i>* Measurements here are representative of the speed with my own sample data on a 1.7GHz 1GB Vista netbook. Running these tests twice in a row often gives slightly different results because they are affected by variations in CPU load.</i></p>


]]></description>
</item>
<item>
<title>Attribute Method Performance</title>
<link>http://www.firstobject.com/attribute-method-performance.htm</link>
<guid isPermaLink="false">attribute-method-performance.htm</guid>
<pubDate>Sat, 20 Nov 2010 23:01:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Attribute parsing performance came up several times this year, and some significant improvements were made in CMarkup <a href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">release 11.3</a>.</p>

<p>On every call to an attribute method, CMarkup reparses the element's attributes up to the one being accessed. This can lead to poorer than expected performance in attribute-intensive code, i.e. code that repeatedly accesses or checks for many attributes. This is due to an original design trade-off: CMarkup does not store attribute indexes.</p>
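
<p>To make the cost of the reparse design concrete, here is a minimal sketch that does not use CMarkup at all. The <code>AttribList</code> type and the <code>findByRescan</code> and <code>visitsForFullPass</code> functions are hypothetical stand-ins: each lookup scans the attribute list from the start, much as a reparse has to, and counts how many attributes it visits.</p>

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the attributes of one start tag, in document order.
typedef std::vector<std::pair<std::string, std::string> > AttribList;

// Look up an attribute by rescanning from the first one on every call.
// 'visited' accumulates how many attributes were examined.
// Returns true and sets 'value' if the name is found.
bool findByRescan(const AttribList& attribs, const std::string& name,
                  std::string& value, std::size_t& visited)
{
    for (std::size_t i = 0; i < attribs.size(); ++i) {
        ++visited;
        if (attribs[i].first == name) {
            value = attribs[i].second;
            return true;
        }
    }
    return false;
}

// Access every attribute once by name and return the total number visited.
// With k uniquely named attributes this is 1 + 2 + ... + k = k(k+1)/2.
std::size_t visitsForFullPass(const AttribList& attribs)
{
    std::size_t visited = 0;
    std::string value;
    for (std::size_t i = 0; i < attribs.size(); ++i)
        findByRescan(attribs, attribs[i].first, value, visited);
    return visited;
}
```

<p>In this model, reading all 4 attributes of an element visits 10 attributes rather than 4, and the gap grows quadratically with the number of attributes.</p>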

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> CMarkup - Attribute Query Speed</p></div>
<div class=commentposted><p>Cameron Dunn 23-Jun-2010</p></div>
<div class=commentcontent>
<p>I've been very impressed with the speed of loading and parsing. However, I've hit one area which is surprisingly slow which I wanted to ask you about - XML attributes.</p>
<p>I'm loading about 3000 XML files, for a total of 99468934 bytes. I'm loading in the files myself and then passing the string to CMarkup. If I do that, and then loop down into every element in every file, it takes about 2 seconds (specifically, 1985ms), which I thought was pretty impressive.</p>
<p>However, if I do the same thing but also loop over every attribute on every element, it takes 8 seconds. I found this a bit surprising - obviously there's no additional file IO time or anything like that, it's all in string processing. The interface which CMarkup provides to access attributes is very string heavy - you  need to get the attribute by name and then query the value using this name.</p>
<p>Is there a quicker way to loop over the XML attributes? I need the name and value for each attribute, but they can be in the order in which they occur in the file.</p>
<p><br>...I iterate the attributes for a single element with <code>GetAttribName()</code> and then call <code>GetAttrib()</code> to get their values.</p>
</div></div>

<p>CMarkup <a href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">release 11.3</a> introduces a new method <a class="codelink" href="http://www.firstobject.com/dn_markGetNthAttrib.htm">GetNthAttrib</a> which is twice as efficient as <a class="codelink" href="http://www.firstobject.com/dn_markGetAttribName.htm">GetAttribName</a> combined with <a class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</a>, and in addition attribute parsing is about twice as fast (see <a href="http://www.firstobject.com/cmarkup-xml-parser-performance.htm">CMarkup XML Parser Performance</a>). So iterating the attributes in your case, which accounts for about 6 of your 8 seconds, might be reduced to about 1.5 seconds.</p>

<p>I did design a solution to manage and reuse attribute indexes for the current element, but it was actually slower for a single attribute access and wasn't fast enough overall to justify the added complexity. Another option would be to include attributes much like elements in CMarkup indexing, but I think that's too fundamental a change at this point. So I've chosen to remain with the original reparse design for the time being, and hopefully the <a href="http://www.firstobject.com/cmarkup-11.3-release-notes.htm">11.3 performance boost</a> and new method will help out enough.</p>

<p>If you have intensive use of attributes, in some cases you might want to extract them with <a class="codelink" href="http://www.firstobject.com/dn_markGetNthAttrib.htm">GetNthAttrib</a> to an external map as a more efficient mechanism to access them repeatedly. You can even map them in a separate CMarkup object as elements using <a class="codelink" href="http://www.firstobject.com/dn_markSavePos.htm">SavePos</a> and then <a class="codelink" href="http://www.firstobject.com/dn_markRestorePos.htm">RestorePos</a> to do the lookup.</p>
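
<p>As a rough sketch of the external map idea, the following collects name/value pairs once and then serves repeated lookups from a <code>std::map</code>. It is self-contained and does not call CMarkup: the <code>AttribCache</code> class is hypothetical, and the pairs passed to <code>add</code> are assumed to come from a single loop over GetNthAttrib for the current element.</p>

```cpp
#include <map>
#include <string>

// Hypothetical helper, not part of CMarkup: holds the attributes of one
// element so repeated lookups do not have to reparse the start tag.
class AttribCache
{
public:
    // Call once per attribute, e.g. from a loop over GetNthAttrib
    // (assumed here; this sketch does not depend on CMarkup itself).
    void add(const std::string& name, const std::string& value)
    {
        m_attribs[name] = value;
    }

    // Returns true and sets 'value' if the attribute was collected.
    bool get(const std::string& name, std::string& value) const
    {
        std::map<std::string, std::string>::const_iterator it = m_attribs.find(name);
        if (it == m_attribs.end())
            return false;
        value = it->second;
        return true;
    }

    // Forget the current element's attributes before moving to the next one.
    void clear() { m_attribs.clear(); }

private:
    std::map<std::string, std::string> m_attribs;
};
```

<p>A cache like this is only likely to pay off when the same element's attributes are read many times; for a single access, calling the CMarkup attribute method directly is simpler.</p>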


]]></description>
</item>
<item>
<title>Video demo of editing RSS XML in the tree view of the free firstobject XML editor</title>
<link>http://www.firstobject.com/video-demo-editing-rss-xml-in-tree-of-xml-editor.htm</link>
<guid isPermaLink="false">video-demo-editing-rss-xml-in-tree-of-xml-editor.htm</guid>
<pubDate>Sat, 12 Jun 2010 22:02:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>See how to use the tree view to edit RSS (and any XML or HTML) in this screencast video demonstrating this new feature in the <a href="http://www.firstobject.com/dn_editor.htm">free firstobject XML editor</a> <a href="http://www.firstobject.com/dn_editcomments.htm#20100612220000">release 2.4.1</a>.</p>

<p><object width="480" height="295"><param name="movie" value="http://www.youtube.com/v/qvaDItMoAas&hl=en_US&fs=1&"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/qvaDItMoAas&hl=en_US&fs=1&" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>

<p>See also:</p>

<p><a href="http://www.firstobject.com/xml-splitter-script-video.htm"><img border=0 src="http://www.firstobject.com/play.gif"></a> <a href="http://www.firstobject.com/xml-splitter-script-video.htm">video of XML splitter script</a><br>
<a href="http://www.firstobject.com/format-xml-indent-align-beautify-xml.htm">Format XML, indent align beautify clean up XML</a><br>
<a href="http://www.firstobject.com/tree-customization-in-xml-editor.htm">Tree customization in the firstobject XML editor</a></p> 


]]></description>
</item>
<item>
<title>XML Editor format XML, customize treeview, and program</title>
<link>http://www.firstobject.com/xml-editor-format-xml-customize-treeview-program.htm</link>
<guid isPermaLink="false">xml-editor-format-xml-customize-treeview-program.htm</guid>
<pubDate>Sun, 11 Oct 2009 03:00:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>This screencast video demonstrates the <a href="http://www.firstobject.com/dn_editor.htm">free firstobject XML editor</a>, showing how to format XML, customize the treeview, and generate and step through a C++ style program.</p>

<p><object width="480" height="295"><param name="movie" value="http://www.youtube.com/v/fGrqQn2qyfw&hl=en&fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/fGrqQn2qyfw&hl=en&fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>

<p>See also:</p>

<p><a href="http://www.firstobject.com/xml-splitter-script-video.htm"><img border=0 src="http://www.firstobject.com/play.gif"></a> <a href="http://www.firstobject.com/xml-splitter-script-video.htm">video of XML splitter script</a><br>
<a href="http://www.firstobject.com/format-xml-indent-align-beautify-xml.htm">Format XML, indent align beautify clean up XML</a><br>
<a href="http://www.firstobject.com/tree-customization-in-xml-editor.htm">Tree customization in the firstobject XML editor</a><br>
<a href="http://www.firstobject.com/dn_foal.htm">firstobject Access Language</a><br>
<a href="http://www.firstobject.com/counting-xml-tag-names-and-values.htm">Counting XML tag names and values with foal</a><br>
<a href="http://www.firstobject.com/convert-ansi-file-to-unicode.htm">Convert ANSI file to Unicode</a></p> 


]]></description>
</item>
<item>
<title>XSLT in the firstobject XML editor</title>
<link>http://www.firstobject.com/xslt-in-xml-editor.htm</link>
<guid isPermaLink="false">xslt-in-xml-editor.htm</guid>
<pubDate>Thu, 10 Sep 2009 11:05:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>Invoke <b>Transform F9</b> in an XML or XSL document to perform the transformation (this uses MSXML XSLT). If the stylesheet is specified in the XML file you can just press F9 and it will begin immediately. Specify the XSL file as follows:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;?</FONT><FONT color=#004080>xml-stylesheet href</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>file.xsl</FONT><FONT color=#0000ff>"</FONT><FONT color=#004080> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>text/xsl</FONT><FONT color=#0000ff>"?&gt;</FONT></PRE>

<p>If the XSL file is not specified, you must have both the XML and XSL documents open in the editor. When you invoke the Transform, you will be prompted with a list of open documents to select the other document. So if you press F9 from the XML document it will prompt you for the stylesheet, and if you press F9 from the stylesheet, it will prompt you for the XML to transform.</p>

<p>Like MSXML Validate Alt+F7, the Transform function uses the Windows MSXML component on your machine.</p>


]]></description>
</item>
<item>
<title>firstobject XML Editor 2.4 Release Notes</title>
<link>http://www.firstobject.com/xml-editor-2.4-release-notes.htm</link>
<guid isPermaLink="false">xml-editor-2.4-release-notes.htm</guid>
<pubDate>Thu, 10 Sep 2009 10:11:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>Release 2.4 of foxe (<a href="http://www.firstobject.com/dn_editor.htm">free XML editor download</a>) adds CMarkup 11.2 and MSXML-based XSLT support, and includes several fixes since intermediate release 2.3.5.</p>

<ul>
<li>Transform F9 (uses MSXML XSLT) (see <a href="http://www.firstobject.com/xslt-in-xml-editor.htm">XSLT in the firstobject XML editor</a>)</li>
<li>Right-click FOAL script debug option for Continue/Run F10</li>
<li>CMarkup <a href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">release 11.2</a> fixes, especially for file mode</li>
<li>Added <a class="codelink" href="http://www.firstobject.com/dn_markGetDocElemCount.htm">GetDocElemCount</a> to FOAL scripting</li>
<li>fix: mouse horizontal scroll bug</li>
<li>fix: 2.3.2 File Open encoding drop-down was being ignored</li>
<li>fix: 2.3.5 Undo Redo bug (select word and type new word)</li>
</ul>

<p>Those are just the changes since intermediate release 2.3.5. A lot has happened since the last full release 2.3. Here are the interim releases:</p>

<ul>
<li><a href="http://www.firstobject.com/dn_editcomments.htm#20090617213000">editor 2.3.5 help search, file mode subdocs</a></li>
<li><a href="http://www.firstobject.com/dn_editcomments.htm#20090511104500">editor 2.3.4 better far east char performance</a></li>
<li><a href="http://www.firstobject.com/dn_editcomments.htm#20090417070000">editor 2.3.3 tree hot keys</a></li>
<li><a href="http://www.firstobject.com/dn_editcomments.htm#20081229223000">editor 2.3.2 encoding, msxml 6.0, tree</a></li>
<li><a href="http://www.firstobject.com/dn_editcomments.htm#20081021131100">editor Beta 2.3.1 removed excess registry setting</a></li>
</ul>

<h4>MFC component improvements</h4>

<table width=100% cellspacing=0 cellpadding=5><tr><td valign=top bgcolor=f3fce2 width=30>
<p><a href="http://www.firstobject.com/dn_markadvanced.htm"><img border=0 src="http://www.firstobject.com/cmarkupdevadv.gif" alt="Advanced CMarkup Developer License"></a></p></td><td bgcolor=f3fce2>
<p>The complete MFC source code for the firstobject XML editor comes with <a href="http://www.firstobject.com/dn_markadvanced.htm">Advanced CMarkup Developer</a> (ADL)</p></td></tr></table>

<p>Release 2.4 has many improvements since 2.3 in these source code classes:</p>

<ul>
<li>fixes for compiling in Visual Studio 2003+ <i>*thanks Davide Zaccanti and Ghanshyam Rathi</i></li>
<li><b><code>CDataEdit</code></b> (Unicode UTF-16 or UTF-8 gigabyte text edit control): draw text caching makes Far Eastern text rendering much faster, plus text find/search improvements. See <a href="http://www.firstobject.com/dn_dataedit.htm">CDataEdit Class</a></li>
<li><b><code>CFoalProgram</code></b> (self-contained pcode compiler, run-time and debugger for C++ syntax based scripting): more string functions and CMarkup 11.2 additions</li>
<li><b><code>CMarkupTreeCtrl</code></b> (virtual tree control that navigates any CMarkup document): hot keys (keyboard shortcuts), paste, multilevel customization, plus scroll-bar fixes</li>
</ul>

<p>See also:</p>

<p><a href="http://www.firstobject.com/dn_editrel.htm">Archived firstobject XML Editor 2.3 Release Notes</a><br>
<a href="http://www.firstobject.com/dn_editrel.htm">Archived firstobject XML Editor Release Notes</a></p>


]]></description>
</item>
<item>
<title>Archived CMarkup 11.2 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.2-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.2-release-notes.htm</guid>
<pubDate>Thu, 10 Sep 2009 10:09:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Release 11.2 Date: September 3, 2009, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<p>It became clear soon after <a href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">CMarkup release 11.1</a> that another release would be needed to resolve a compiler issue affecting Visual C++, the <a href="http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm">_fseeki64 and _ftelli64</a> issue. From now on, some Visual C++ developers using file mode on files over 2GB will need to define <code>MARKUP_HUGEFILE</code>.</p>

<p>Here's the list of 11.2 enhancements:</p>

<ul>
<li>Cleaned up Visual C++ compiler <a href="http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm">_fseeki64 and _ftelli64</a> issue</li>
<li>CMarkupMSXML: Added <code>Transform</code> method; just supply an XSL document ("style sheet") (see <a href="http://www.firstobject.com/dn_markmsxml.htm">MSXML Wrapper CMarkupMSXML</a>)</li>
<li>CMarkupMSXML: Implemented UTF-8 conversion for when <code>MBCS</code> (ANSI) strings are NOT being used</li>
</ul>

<table width=100% cellspacing=0 cellpadding=5><tr><td valign=top bgcolor=fafae2 width=30>
<p><a href="http://www.firstobject.com/dn_markdev.htm"><img border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></a></p></td><td bgcolor=fafae2>
<p>The following are only in <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a> and the <a href="http://www.firstobject.com/dn_editor.htm">free XML editor </a>&nbsp;<a href="http://www.firstobject.com/dn_foal.htm">FOAL C++ scripting</a></p>
</td></tr></table>

<p>All the 11.2 developer version enhancements only affect file mode (see <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">write mode</a> and <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">read mode</a>).</p> 

<ul>
<li><a class="codelink" href="http://www.firstobject.com/dn_markGetSubDoc.htm">GetSubDoc</a> in file read mode now uses smarter reallocations when extracting very large subdocuments. As CMarkup concatenates pieces of a subdocument spanning n read blocks, it no longer reallocates n times, but closer to log(n) times</li>
<li>Huge file support (over 2GB) is no longer automatic in Visual Studio. For huge files in Visual C++, define <code>MARKUP_HUGEFILE</code> in your Project Settings Preprocessor definitions. To test whether you have huge file support, make sure <code><font color=blue>sizeof</font>(MCD_INTFILEOFFSET) == 8</code> (see <a href="http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm">_fseeki64 and _ftelli64</a>)</li>
<li>fix: <a class="codelink" href="http://www.firstobject.com/dn_markGetElemPath.htm">GetElemPath</a> and <a class="codelink" href="http://www.firstobject.com/dn_markGetParentElemPath.htm">GetParentElemPath</a> in file mode</li>
<li>fix: <a class="codelink" href="http://www.firstobject.com/dn_markSetData.htm">SetData</a> in file write mode</li>
<li>fix: <a class="codelink" href="http://www.firstobject.com/dn_markOutOfElem.htm">OutOfElem</a> in file write mode</li>
</ul>
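
<p>The difference between n reallocations and roughly log(n) can be illustrated with a small simulation. This is only a sketch: the <code>countReallocs</code> function is hypothetical, and capacity doubling is an assumed growth policy chosen for illustration, not necessarily what GetSubDoc does internally.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Count how many reallocations are needed to append 'blocks' pieces of
// 'blockSize' bytes to an initially empty buffer.
// geometric == false: grow to exactly the size needed each time.
// geometric == true:  at least double the capacity on each growth
//                     (an assumed policy, not taken from CMarkup's source).
std::size_t countReallocs(std::size_t blocks, std::size_t blockSize, bool geometric)
{
    assert(blockSize > 0);
    std::size_t capacity = 0, size = 0, reallocs = 0;
    for (std::size_t i = 0; i < blocks; ++i) {
        const std::size_t needed = size + blockSize;
        if (needed > capacity) {
            ++reallocs;
            capacity = geometric ? std::max(needed, capacity * 2) : needed;
        }
        size = needed;
    }
    return reallocs;
}
```

<p>In this model, appending 1024 equal blocks takes 1024 reallocations with exact-fit growth but only 11 with doubling.</p>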

<p>See also previous CMarkup release notes: <a href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">11.1</a>, <a href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">11.0</a>, <a href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">10.1</a>, <a href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">10.0</a>, <a href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</a></p>


]]></description>
</item>
</channel>
</rss>
