<rss version="2.0">
<channel>
<title>News from firstobject.com</title>
<link>http://www.firstobject.com/dn_news.xml</link>
<description>News from firstobject.com updated when articles are posted</description>
<language>en-us</language>
<lastBuildDate>Sun, 11 Oct 2009 03:00:00 GMT</lastBuildDate>
<ttl>180</ttl>
<image>
<title>News from firstobject.com</title>
<width>142</width>
<height>18</height>
<link>http://www.firstobject.com/</link>
<url>http://www.firstobject.com/firstobjectNews.gif</url>
</image>
<item>
<title>XML Editor format XML, customize treeview, and program</title>
<link>http://www.firstobject.com/xml-editor-format-xml-customize-treeview-program.htm</link>
<guid isPermaLink="false">xml-editor-format-xml-customize-treeview-program.htm</guid>
<pubDate>Sun, 11 Oct 2009 03:00:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>This screencast video demonstrates the <a href="http://www.firstobject.com/dn_editor.htm">free firstobject XML editor</a> and shows how to format XML, customize the treeview, and generate and step through a C++ style program.</p>

<p><object width="480" height="295"><param name="movie" value="http://www.youtube.com/v/fGrqQn2qyfw&hl=en&fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/fGrqQn2qyfw&hl=en&fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>

<p>See also:</p>

<p><a href="http://www.firstobject.com/xml-splitter-script-video.htm"><img border=0 src="http://www.firstobject.com/play.gif"></a> <a href="http://www.firstobject.com/xml-splitter-script-video.htm">video of XML splitter script</a><br>
<a href="http://www.firstobject.com/format-xml-indent-align-beautify-xml.htm">Format XML, indent align beautify clean up XML</a><br>
<a href="http://www.firstobject.com/tree-customization-in-xml-editor.htm">Tree customization in the firstobject XML editor</a><br>
<a href="http://www.firstobject.com/dn_foal.htm">firstobject Access Language</a><br>
<a href="http://www.firstobject.com/counting-xml-tag-names-and-values.htm">Counting XML tag names and values with foal</a><br>
<a href="http://www.firstobject.com/convert-ansi-file-to-unicode.htm">Convert ANSI file to Unicode</a></p> 


]]></description>
</item>
<item>
<title>XSLT in the firstobject XML editor</title>
<link>http://www.firstobject.com/xslt-in-xml-editor.htm</link>
<guid isPermaLink="false">xslt-in-xml-editor.htm</guid>
<pubDate>Thu, 10 Sep 2009 11:05:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>Invoke <b>Transform F9</b> in an XML or XSL document to perform the transformation (this uses MSXML XSLT). If the stylesheet is specified in the XML file you can just press F9 and it will begin immediately. Specify the XSL file as follows:</p>

<PRE lang=xml><FONT color=#0000ff>&lt;?</FONT><FONT color=#004080>xml-stylesheet href</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>file.xsl</FONT><FONT color=#0000ff>"</FONT><FONT color=#004080> type</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>text/xsl</FONT><FONT color=#0000ff>"?&gt;</FONT></PRE>

<p>If the XSL file is not specified, you must have both the XML and XSL documents open in the editor. When you invoke the Transform, you will be prompted with a list of open documents to select the other document. So if you press F9 from the XML document it will prompt you for the stylesheet, and if you press F9 from the stylesheet, it will prompt you for the XML to transform.</p>

<p>Like MSXML Validate Alt+F7, the Transform function uses the Windows MSXML component on your machine.</p>


]]></description>
</item>
<item>
<title>firstobject XML Editor 2.4 Release Notes</title>
<link>http://www.firstobject.com/xml-editor-2.4-release-notes.htm</link>
<guid isPermaLink="false">xml-editor-2.4-release-notes.htm</guid>
<pubDate>Thu, 10 Sep 2009 10:11:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>Release 2.4 of foxe (<a href="http://www.firstobject.com/dn_editor.htm">free XML editor download</a>) adds CMarkup 11.2 and MSXML-based XSLT support, and includes several fixes to the intermediate release 2.3.5.</p>

<ul>
<li>Transform F9 (uses MSXML XSLT) (see <a href="http://www.firstobject.com/xslt-in-xml-editor.htm">XSLT in the firstobject XML editor</a>)</li>
<li>Right-click FOAL script debug option for Continue/Run F10</li>
<li>CMarkup <a href="http://www.firstobject.com/cmarkup-11.2-release-notes.htm">release 11.2</a> fixes, especially for file mode</li>
<li>Added <a class="codelink" href="http://www.firstobject.com/dn_markGetDocElemCount.htm">GetDocElemCount</a> to FOAL scripting</li>
<li>fix: mouse horizontal scroll bug</li>
<li>fix: 2.3.2 File Open encoding drop-down was being ignored</li>
<li>fix: 2.3.5 Undo Redo bug (select word and type new word)</li>
</ul>

<p>Those are only the changes since intermediate release 2.3.5. A lot has happened since the last full release 2.3. Here are the interim releases:</p>

<ul>
<li><A href="http://www.firstobject.com/dn_editcomments.htm#20090617213000">editor 2.3.5 help search, file mode subdocs</A></li>
<li><A href="http://www.firstobject.com/dn_editcomments.htm#20090511104500">editor 2.3.4 better far east char performance</A></li>
<li><A href="http://www.firstobject.com/dn_editcomments.htm#20090417070000">editor 2.3.3 tree hot keys</A></li>
<li><A href="http://www.firstobject.com/dn_editcomments.htm#20081229223000">editor 2.3.2 encoding, msxml 6.0, tree</A></li>
<li><A href="http://www.firstobject.com/dn_editcomments.htm#20081021131100">editor Beta 2.3.1 removed excess registry setting</A></li>
</ul>

<h4>MFC component improvements</h4>

<TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=f3fce2 width=30>
<P><A href="http://www.firstobject.com/dn_markadvanced.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdevadv.gif" alt="Advanced CMarkup Developer License"></A></P></TD><TD bgcolor=f3fce2>
<P>The complete MFC source code for the firstobject XML editor comes with <A href="http://www.firstobject.com/dn_markadvanced.htm">Advanced CMarkup Developer</A> (ADL)</P></TD></TR></TABLE>

<p>Release 2.4 has many improvements since 2.3 in these source code classes:</p>

<ul>
<li>fixes for compiling in Visual Studio 2003+ <i>*thanks Davide Zaccanti and Ghanshyam Rathi</i></li>
<li><b><code>CDataEdit</code></b> (Unicode UTF-16 or UTF-8 gigabyte text edit control): draw text caching makes Far Eastern text rendering much faster, plus text find/search improvements. See <a href="http://www.firstobject.com/dn_dataedit.htm">CDataEdit Class</a></li>
<li><b><code>CFoalProgram</code></b> (self-contained pcode compiler, run-time and debugger for C++ syntax based scripting): more string functions and CMarkup 11.2 additions</li>
<li><b><code>CMarkupTreeCtrl</code></b> (virtual tree control that navigates any CMarkup document): hot keys (keyboard shortcuts), paste, multilevel customization, plus scroll-bar fixes</li>
</ul>

<p>See also:</p>

<p><a href="http://www.firstobject.com/dn_editrel.htm">Archived firstobject XML Editor 2.3 Release Notes</a><br>
<a href="http://www.firstobject.com/dn_editrel.htm">Archived firstobject XML Editor Release Notes</a></p>


]]></description>
</item>
<item>
<title>CMarkup 11.2 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.2-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.2-release-notes.htm</guid>
<pubDate>Thu, 10 Sep 2009 10:09:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Release 11.2 Date: September 3, 2009, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<p>It became clear soon after <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">CMarkup release 11.1</A> that another release would be needed to resolve a compiler issue affecting Visual C++, the <a href="http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm">_fseeki64 and _ftelli64</a> issue. From now on, Visual C++ developers using file mode on files over 2GB will need to define <code>MARKUP_HUGEFILE</code>.</p>

<p>Here's the list of 11.2 enhancements:</p>

<ul>
<li>Cleaned up Visual C++ compiler <a href="http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm">_fseeki64 and _ftelli64</a> issue</li>
<li>CMarkupMSXML: Added <code>Transform</code> method, just supply an XSL document "style sheet" (see <a href="http://www.firstobject.com/dn_markmsxml.htm">MSXML Wrapper CMarkupMSXML</a>)</li>
<li>CMarkupMSXML: Implemented UTF-8 conversion for when <code>MBCS</code> (ANSI) strings are NOT being used</li>
</ul>

<TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=fafae2 width=30>
<P><A href="http://www.firstobject.com/dn_markdev.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></A></P></TD><TD bgcolor=fafae2>
<P>The following are only in <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a> and the <a href="http://www.firstobject.com/dn_editor.htm">free XML editor </a>&nbsp;<a href="http://www.firstobject.com/dn_foal.htm">FOAL C++ scripting</a></P>
</TD></TR></TABLE>

<p>All the 11.2 developer version enhancements only affect file mode (see <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">write mode</a> and <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">read mode</a>).</p> 

<ul>
<li><A class="codelink" href="http://www.firstobject.com/dn_markGetSubDoc.htm">GetSubDoc</A> in file read mode now uses smarter reallocations when extracting very large subdocuments. As CMarkup concatenates pieces of a subdocument spanning n read blocks, it no longer reallocates n times; the number of reallocations is closer to log(n)</li>
<li>Huge file support (over 2GB) is no longer automatic in Visual Studio. For huge files in Visual C++ define <code>MARKUP_HUGEFILE</code> in your Project Settings Preprocessor definitions. To test whether you have huge file support make sure <code><FONT color=blue>sizeof</FONT>(MCD_INTFILEOFFSET) == 8</code> (see <a href="http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm">_fseeki64 and _ftelli64</a>)</li>
<li>fix: <a class="codelink" href="http://www.firstobject.com/dn_markGetElemPath.htm">GetElemPath</a> and <a class="codelink" href="http://www.firstobject.com/dn_markGetParentElemPath.htm">GetParentElemPath</a> in file mode</li>
<li>fix: <a class="codelink" href="http://www.firstobject.com/dn_markSetData.htm">SetData</a> in file write mode</li>
<li>fix: <a class="codelink" href="http://www.firstobject.com/dn_markOutOfElem.htm">OutOfElem</a> in file write mode</li>
</ul>
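
<p>As a rough illustration of why the number of reallocations can drop from n to about log(n), here is a minimal C++ sketch of a buffer that doubles its capacity whenever it runs out of room. It is not CMarkup's code; <code>GrowingBuffer</code> and its members are hypothetical names used only for this example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical illustration only, not taken from CMarkup.
// A byte buffer that at least doubles its capacity each time it is full,
// and counts how often it had to reallocate.
struct GrowingBuffer
{
    char* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
    int reallocs = 0;

    GrowingBuffer() = default;
    GrowingBuffer(const GrowingBuffer&) = delete;
    GrowingBuffer& operator=(const GrowingBuffer&) = delete;
    ~GrowingBuffer() { std::free(data); }

    // Appends len bytes from piece; returns false if memory runs out.
    bool append(const char* piece, std::size_t len)
    {
        assert(piece != nullptr || len == 0);
        if (size + len > capacity)
        {
            // Grow to twice the old capacity, or to the needed size if larger.
            std::size_t grown = capacity * 2;
            if (grown < size + len)
                grown = size + len;
            char* p = static_cast<char*>(std::realloc(data, grown));
            if (p == nullptr)
                return false;
            data = p;
            capacity = grown;
            ++reallocs;
        }
        if (len > 0)
            std::memcpy(data + size, piece, len);
        size += len;
        return true;
    }
};
```

<p>In this sketch, appending 1024 blocks of 4096 bytes to an empty <code>GrowingBuffer</code> causes 11 reallocations rather than 1024.</p>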

<p>See also previous CMarkup release notes: <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">11.1</A>, <A href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">11.0</A>, <A href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">10.1</A>, <A href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">10.0</A>, <A href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</A></p>


]]></description>
</item>
<item>
<title>display wchar_t string and wstring in gdb</title>
<link>http://www.firstobject.com/wchar_t-gdb.htm</link>
<guid isPermaLink="false">wchar_t-gdb.htm</guid>
<pubDate>Thu, 10 Sep 2009 10:09:00 GMT</pubDate>
<category>C++ Articles</category>
<description><![CDATA[

<p>When you print a <code>wchar_t</code> string or <code>std::wstring</code> while debugging C or C++ in gdb you don't get to view the content of the string. At first I painstakingly cast the pointers at each offset to check the values, but then I decided there had to be a better way. I didn't find anything as simple as turning on an option, but you can script it in gdb as follows:</p>

<pre>define wc_print
echo "
set $c = (wchar_t*)$arg0
while ( *$c )
  if ( *$c > 0x7f )
    printf "[%x]", *$c
  else
    printf "%c", *$c
  end
  set $c++
end
echo "\n
end</pre>

<p>Then you can just type "wc &lt;wide_string_variable_name&gt;" (the <code>wc</code> short form of <code>wc_print</code> should work as long as there are no other gdb commands starting with wc). The non-ASCII characters will be displayed as hex in square brackets.</p>

<p>This works for a <code>wchar_t</code> pointer, and even an STL <code>wstring</code>. I am guessing you don't need to call the <code>c_str()</code> member of the <code>wstring</code> because the first (and only? see below) data member of the <code>wstring</code> (<code>._M_dataplus._M_p</code>) is the <code>wchar_t</code> pointer.</p>

<h2>printf %ls not reliable</h2>

<p>At first I was using <code>call printf</code> with <code>%ls</code> but it sometimes exhibited a strange failure where nothing at all was output. It is also unhelpful for non-ASCII in my experience and will have undetermined behavior when the underlying <code>wcstombs</code> conversion fails. But FWIW, this is what I used (I had to cast the <code>printf</code> to <code>(void)</code> or gdb would complain "no return type information available"):</p>

<pre>define wc_print
call (void)printf("\"%ls\"\n",$arg0)
end</pre>

<p>Some googling uncovered a developer who described similar symptoms of a <a href="http://www.cygwin.com/ml/cygwin/2009-05/msg00466.html">bug in printf %ls</a> on cygwin (I was using OS X 10.5.7 with gdb 6.3.50). Maybe there are platforms where this is reliable, but it wasn't for me.</p>

<h2>What happens with normal gdb print</h2>

<p>When you normally print a <code>std::wstring</code> or <code>wchar_t</code> string you get output like this:</p>

<pre>$3 = {
  static npos = 4294967295,
  _M_dataplus = {
    &lt;std::allocator&lt;wchar_t&gt;&gt; = {
      &lt;__gnu_cxx::new_allocator&lt;wchar_t&gt;&gt; = {&lt;No data fields&gt;}, &lt;No data fields&gt; },
    members of std::basic_string&lt;wchar_t,std::char_traits&lt;wchar_t&gt;,std::allocator&lt;wchar_t&gt; &gt;::_Alloc_hider:
    _M_p = 0x808e0c
  }
}</pre>

<pre>$4 = (const wchar_t *) 0x44920</pre>

<p>It doesn't show the text content the way it does for a <code><font color=blue>char</font>*</code>:</p>

<pre>$2 = 0x3e090 "Hello"</pre>

<p>So instead of "p &lt;wide_string_variable_name&gt;" you can now just use "wc &lt;wide_string_variable_name&gt;".</p>

<h2>Defining a gdb command</h2>

<p>By defining the command, it is very convenient to display any wide string. You can put the define in a file and source the file when you need it (using the gdb source command) or put it in your .gdbinit file. If your .gdbinit file is in your work folder, you will have the command every time you start gdb there.</p>

<p>You can also add the following after your command definition to document it:</p>

<pre>document wc_print
wc_print &lt;wide_string_variable_name&gt;
Display &lt;wide_string_variable_name&gt; which is a wchar_t* or wstring.
end</pre>


]]></description>
</item>
<item>
<title>Huge file _fseeki64 _ftelli64 in Visual C++</title>
<link>http://www.firstobject.com/fseeki64-ftelli64-in-vc++.htm</link>
<guid isPermaLink="false">fseeki64-ftelli64-in-vc++.htm</guid>
<pubDate>Thu, 10 Sep 2009 10:08:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Writing cross-platform <i>huge</i> file I/O code is tricky because the 64-bit offset versions of <code>fseek</code> and <code>ftell</code> are not standard across compilers and platforms. This article documents the CMarkup experience with this issue and related discoveries made along the way; it may be useful to anyone dealing with the same problem.</p>

<h2>32-bit and 64-bit file offsets</h2>

<p>The limit of signed 32-bit integer offsets in <code>ftell</code> and <code>fseek</code> is 2^31-1 = 2147483647 which is around 2GB. For example, a common cross-platform way of getting file size is <code>fseek</code> with <code>SEEK_END</code> and then <code>ftell</code>, but <code>ftell</code> cannot return a number higher than the limit of its return type.</p>
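
<p>A minimal sketch of the file size technique just described, using only the standard <code>fseek</code> and <code>ftell</code>. The function name <code>file_size_portable</code> is made up for this example. Where <code>long</code> is 32 bits, as in Visual C++, the result cannot exceed 2147483647.</p>

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper for illustration.
// Returns the size of the file in bytes, or -1 if the file cannot be opened
// or the size cannot be determined. The result is limited to the range of
// long because that is the return type of ftell.
long file_size_portable(const char* path)
{
    assert(path != nullptr);
    std::FILE* fp = std::fopen(path, "rb");
    if (fp == nullptr)
        return -1L;
    long size = -1L;
    if (std::fseek(fp, 0L, SEEK_END) == 0)
        size = std::ftell(fp); // ftell itself returns -1L on failure
    std::fclose(fp);
    return size;
}
```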

<p>The 64-bit versions of these functions deal in offsets of billions of gigabytes (8 exabytes) which should be enough for single file sizes in the next decade or two ;).</p>

<p>There are no 64-bit offset versions of <code>fread</code> and <code>fwrite</code> functions. This is because they do not deal explicitly with file offsets, though they do operate relative to the current file pointer. The underlying file pointer can generally handle the huge file, so in theory you can read multiple 2GB blocks using <code>fread</code> even if you don't have a 64-bit <code>ftell</code> function available to query the current offset.</p>
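
<p>To illustrate, the following sketch reads a stream block by block with <code>fread</code> and keeps its own 64-bit running total instead of asking <code>ftell</code> for the offset. The name <code>count_bytes</code> and the 64KB block size are arbitrary choices for this example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical helper for illustration.
// Reads from the current position to the end of the stream in fixed-size
// blocks and returns the number of bytes read. The total is kept in a
// 64-bit counter, so it does not depend on the width of long or on ftell.
std::uint64_t count_bytes(std::FILE* fp)
{
    assert(fp != nullptr);
    char block[65536];
    std::uint64_t total = 0;
    std::size_t n;
    while ((n = std::fread(block, 1, sizeof block, fp)) > 0)
        total += n;
    return total;
}
```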

<p>So even as huge file support was added to the file I/O functions in C/C++ libraries by the early 90s, there was no graceful way to promote the integer types of offset variables used in existing programs. The <code>fread</code> and <code>fwrite</code> functions were upgraded under the covers, while new versions of <code>ftell</code> and <code>fseek</code> were added because they dealt explicitly with the offset.</p>

<p>I've seen <code><font color=blue>#ifdef</font> WIN64</code> used in conjunction with <code>_fseeki64</code>. I am not sure why they were doing that, but don't be confused. A 64-bit operating system is <b>not</b> required for 64-bit file offsets; you can usually use 64-bit offsets in a 32-bit operating system.</p>

<h2>Huge files and CMarkup</h2>

<p>The 64-bit offset versions of <code>fseek</code> and <code>ftell</code> are used to support huge files with the CMarkup <A href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">release 11.0</A> file mode methods. The file mode methods give read and write access to files without loading the entire document into memory. File mode does not require 64-bit offsets, but 64-bit offsets are needed if you are dealing with files over 2GB.</p>

<h2>off_t</h2>

<p>Most UNIX flavor compilers like gcc are standardized on <code>ftello</code> and <code>fseeko</code>, which use the <code>off_t</code> integer type whose size depends on compiler setup. The functions resolve to the old <code>fseek</code>/<code>ftell</code> or the huge <code>fseeko64</code>/<code>ftello64</code> in concert with the <code>_FILE_OFFSET_BITS</code> and <code>_LARGEFILE_SOURCE</code> macro defines.</p>
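
<p>Here is a small sketch of the <code>off_t</code> approach on a POSIX system, assuming the macro is defined before any system header is included. The helper names <code>seek_and_tell</code> and <code>off_t_is_64_bit</code> are invented for this example.</p>

```cpp
// Request a 64-bit off_t; this must come before any system header.
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

// Hypothetical helper for illustration.
// Seeks to offset bytes from the start of the file and returns the
// resulting position as reported by ftello, or -1 if the seek fails.
off_t seek_and_tell(FILE* fp, off_t offset)
{
    assert(fp != NULL);
    if (fseeko(fp, offset, SEEK_SET) != 0)
        return -1;
    return ftello(fp);
}

// True when off_t is wide enough for offsets beyond 2GB.
bool off_t_is_64_bit()
{
    return sizeof(off_t) == 8;
}
```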

<p>Unfortunately VC++ doesn't use this system.</p>

<h2>CMarkup in Visual Studio</h2>

<p>Visual C++ provides its huge versions, <code>_fseeki64</code> and <code>_ftelli64</code>, inconsistently. There is no clean way to always know whether the 64-bit versions are available and to automatically compile accordingly.</p>

<p>In CMarkup release 11.0 the 64-bit functions are used by default in Visual Studio 2005. This is not correct for target platforms where the 64-bit functions are deliberately excluded, such as Windows CE.</p>

<p>Since the 64-bit functions are covertly distributed with Visual Studio 6.0 (see below), in CMarkup <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">release 11.1</A> I declared prototypes for the 64-bit functions by default in Visual Studio 6.0. This caused more problems because the functions are not available in some build configurations of VC++ 6.0.</p>

<p>In CMarkup <A href="http://www.firstobject.com/cmarkup-11.2-release-notes.htm">release 11.2</A> with any version of Visual Studio you will need to define <code>MARKUP_HUGEFILE</code> in the project definitions if you want huge file access. This eliminates the compile and linker errors reported in the comments below for first-time users, while putting the burden on those who need huge file access to specify the define.</p>
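
<p>The opt-in arrangement can be pictured with a sketch like the one below. It is not the text of Markup.h: apart from <code>MARKUP_HUGEFILE</code>, <code>_fseeki64</code> and <code>_ftelli64</code>, every name here (<code>demo_offset_t</code>, <code>DEMO_FSEEK</code>, <code>DEMO_FTELL</code> and the two helper functions) is hypothetical and only shows the general shape of selecting the functions with a define.</p>

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical names throughout; only MARKUP_HUGEFILE comes from CMarkup.
#if defined(MARKUP_HUGEFILE) && defined(_MSC_VER)
  // Opt-in: 64-bit offsets through the Visual C++ specific functions.
  typedef __int64 demo_offset_t;
  #define DEMO_FSEEK _fseeki64
  #define DEMO_FTELL _ftelli64
#else
  // Default: standard functions, offsets limited to the range of long.
  typedef long demo_offset_t;
  #define DEMO_FSEEK std::fseek
  #define DEMO_FTELL std::ftell
#endif

// True when the selected offset type is 64 bits wide.
inline bool demo_has_huge_file_support()
{
    return sizeof(demo_offset_t) == 8;
}

// Returns the size of an open file using whichever functions were selected,
// or -1 if the seek fails.
inline demo_offset_t demo_file_size(std::FILE* fp)
{
    assert(fp != nullptr);
    if (DEMO_FSEEK(fp, 0, SEEK_END) != 0)
        return -1;
    return DEMO_FTELL(fp);
}
```

<p>In this sketch a Visual C++ project that does not define <code>MARKUP_HUGEFILE</code> never references <code>_fseeki64</code> or <code>_ftelli64</code>, so it compiles and links in any configuration, at the cost of the 2GB limit.</p>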

<p>One alternative would be to use the Win32 File APIs and the <code>SetFilePointerEx</code> available on Win2K+, but this would greatly increase the code differences between the different platforms.</p>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> 11.1 Issue: linking fseeki64 and ftelli64</p></div>
<div class=commentposted><p>David 17-Jun-2009</p></div>
<div class=commentcontent>
<p>I got an error in my application which is a windows mobile program, developing in VS 2008. I created the application in the following steps:</p>
<p>a) Create a smartdevice project: New project -> smart device ->Win32 Smart Device project<br>
b) Add markup files: copy the two files into the project directory, then add them into the project, and select the "cmarkup.cpp", set it properties "Not Using Precompiled Headers"<br>
c) Add some code in the testmarkup.cpp: <code><FONT color=blue>#include</FONT> "Markup.h"</code> ... <code>CMarkup xml;	xml.Load(_T("UserInfo.xml"));</code><br>
d) Compile the project, then I got the following error [configuration Debug Windows Mobile 5.0 Pocket PC SDK (ARMV4I)]:</p>
<pre>...
WINVER not defined. Defaulting to 0x0400,
  which is appropriate for all supported Windows CE versions
...
.\Markup.cpp(1498) : error C3861: '_fseeki64': identifier not found
.\Markup.cpp(1499) : error C3861: '_ftelli64': identifier not found
...</pre>
</div></div>

<p>CMarkup started using fseeki64 and ftelli64 in release 11, but they are optional. In the next release I will not use them unless you set a define. To turn them off do something like this:</p>

<p>Old line 223:</p>
<pre><FONT color=blue>#elif</FONT> _MSC_VER >= 1000 <font color=green>// VC++</font></pre>
<p>New line 223:</p>
<pre><FONT color=blue>#elif</FONT> _MSC_VER > 4000 <font color=green>// never</font></pre>


<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> 11.1 Issue: linking fseeki64 and ftelli64</p></div>
<div class=commentposted><p>Robin Hilliard 12-Jun-2009</p></div>
<div class=commentcontent>
<p>I'm still on VC++ 6.0 (don't ask) and the functions <code>_fseeki64</code> and <code>_ftelli64</code> don't exist. To fix this, you just need to bump the compiler
version directive as follows:</p>
<p>Old line 223:<br>
<code><FONT color=blue>#elif</FONT> _MSC_VER >= 1000 <font color=green>// VC++</font></code></p>
<p>New line 223:<br>
<code><FONT color=blue>#elif</FONT> _MSC_VER > 1200 <font color=green>// > VC++ 6.0?</font></code></p>
</div></div>

<p>In <a href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">CMarkup release 11.1</a> I implemented a workaround to offer huge file (>2GB) access by default in Visual C++ 6.0 and Visual Studio .NET versions before VC++ 2005. Huge file access is only useful for the <a href="http://www.firstobject.com/dn_markdev.htm">developer version</a> file mode (see <A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A>) methods.</p>

<p>However, I did not realize that it depended on a project setting for these functions to be available in the VC++ libraries you link with. If in your project settings you have the default Microsoft Foundation Classes setting "Use MFC in a Shared DLL" you will get the following problem when linking CMarkup release 11.1:</p>

<pre>Linking...
Markup.obj : error LNK2001: unresolved external symbol __ftelli64
Markup.obj : error LNK2001: unresolved external symbol __fseeki64</pre>

<p>If you are able to change your Microsoft Foundation Classes project setting to "Use MFC in a Static Library" that is one way to alleviate the linker error and keep huge file access. Otherwise, you can change the compiler version directive as shown above, but then you cannot use file mode with files over 2GB.</p>

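<p>The 64-bit functions are covertly distributed with Visual Studio 6.0. Their source files are here:</p>

<pre>Microsoft Visual Studio\VC98\CRT\SRC\FSEEKI64.C
Microsoft Visual Studio\VC98\CRT\SRC\FTELLI64.C</pre>

<p>They are also in this static run-time library:</p>

<pre>Microsoft Visual Studio\VC98\Lib\LIBCMT.LIB</pre>

<p>The following command does not show them:</p>

<pre>dumpbin /exports libcmt.lib</pre>

<p>but this one does:</p>

<pre>dumpbin /all libcmt.lib</pre>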


<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> Linking Problem of CMarkup V11.1</p></div>
<div class=commentposted><p>Kang Yiqi 27-Jul-2009</p></div>
<div class=commentcontent>
<p>I'm using CMarkup (V11.1) class in my laboratory project (Windows XP & Visual C++ 6.0) for parsing xml documents. I've download Markup.h &amp; Markup.cpp from www.firstobject.com and added them into my VC++ project.</p>
<p>There's a problem in Markup.h, line 223: "<code><font color=blue>#elif</font> _MSC_VER >= 1000</code>". As the version of VC++ 6.0 is 1200, <code>MCD_FSEEK</code> and <code>MCD_FTELL</code> are defined as <code>_fseeki64</code> and <code>_ftelli64</code>. This causes 2 linking errors, e.g. "unresolved external symbol __ftelli64"</p>
<p>However, if I choose to "Use MFC in a Static Library", just the same as the sample project does, it's okay. But the size of execution file seems to be too big. So I changed the line 223 like this: "<code><font color=blue>#elif</font> _MSC_VER > 1200</code>", and the problem is solved. I also try version 11.0 of CMarkup and there's no such problem, because it define <code>MCD_FSEEK</code> as <code>_fseeki64</code> only when <code>_MSC_VER</code> is bigger than 1400 (VC++ 2005).</p>
<p>I would suggest that for VC++ 6.0, it's better to define <code>MCD_FSEEK</code> and <code>MCD_FTELL</code> as <code>fseek</code> and <code>ftell</code>, rather than <code>_fseeki64</code> and <code>_ftelli64</code></p>
</div></div>

<p>Thank you for the excellent feedback and research. <A href="http://www.firstobject.com/cmarkup-11.2-release-notes.htm">Release 11.2</A> is fixed according to your suggestion.</p>


]]></description>
</item>
<item>
<title>Split XML with XML editor script</title>
<link>http://www.firstobject.com/split-xml.htm</link>
<guid isPermaLink="false">split-xml.htm</guid>
<pubDate>Thu, 02 Jul 2009 02:00:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>On another forum a user asked how to "Split XML and output to different files" and did not get an answer, just several confusing responses about versions of XSLT. This is how simple the answer is in the free firstobject XML editor:</p>

<pre>split_XML()
{
  CMarkup input;
  input.Load( "input.xml" );
  while ( input.FindElem("//npc") )
    WriteTextFile( "npc"+input.GetAttrib("id")+".xml", input.GetSubDoc() );
}</pre>

<p>This script will read input.xml and create an npc[ID].xml file for each npc subdocument. This was the question:</p>

<table><tr><th align=left>Starting with</th><th>... </th><th align=left>Desired result</th></tr><tr><td valign=top>

<p>"input.xml"</p>

<PRE lang=xml><FONT color=#0000ff>&lt;root&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;npc</FONT><FONT color=#be3232> id</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>1</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>1</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>2</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>3</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/npc&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;npc</FONT><FONT color=#be3232> id</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>2</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>3</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>4</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>5</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/npc&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;npc</FONT><FONT color=#be3232> id</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>3</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>4</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>5</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>6</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/npc&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/root&gt;</FONT></PRE>

</td><td>&nbsp;</td><td valign=top>

<p>"npc1.xml"</p>

<PRE lang=xml><FONT color=#0000ff>&lt;npc</FONT><FONT color=#be3232> id</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>1</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>1</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>2</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>3</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/npc&gt;</FONT></PRE>

<p>"npc2.xml"</p>

<PRE lang=xml><FONT color=#0000ff>&lt;npc</FONT><FONT color=#be3232> id</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>2</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>3</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>4</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>5</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/npc&gt;</FONT></PRE>

<p>"npc3.xml"</p>

<PRE lang=xml><FONT color=#0000ff>&lt;npc</FONT><FONT color=#be3232> id</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>3</FONT><FONT color=#0000ff>"&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>4</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>5</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;p</FONT><FONT color=#be3232> pid</FONT><FONT color=#0000ff>="</FONT><FONT style='color:black;font-weight:bold;'>6</FONT><FONT color=#0000ff>"/&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/npc&gt;</FONT></PRE>

</td></tr></table>

<p>The simplicity of a short bit of CMarkup code is yet another reason to <a href="http://www.firstobject.com/dn_whynotxslt.htm">avoid XSLT</a>. Here is a similar question that a reader posted:</p>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> xml splitter</p></div>
<div class=commentposted><p>Dita Ciulacu 01-Jul-2009</p></div>
<div class=commentcontent>
<p>I am searching for a xml splitter to generate the file name using values from a child field. I can't make [<a href="http://www.firstobject.com/xml-splitter-script-video.htm">your script</a>] work for my specific file name. Is there a way to have the file name this way: <code>xmlOutput.Open( "test" + "_" + [Child value from REFERRAL_ID]  + "_"+ nFileCount + ".xml", MDF_WRITEFILE );</code> My XML is:</p>
<PRE lang=xml><FONT color=#0000ff>&lt;REFERRAL_DISCHARGE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;FILE_VERSION&gt;</FONT><FONT style='color:black;font-weight:bold;'>1.0</FONT><FONT color=#0000ff>&lt;/FILE_VERSION&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;REFERRAL_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>9999</FONT><FONT color=#0000ff>&lt;/REFERRAL_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;ORGANISATION_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>A12345-6</FONT><FONT color=#0000ff>&lt;/ORGANISATION_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;ORGANISATION_TYPE&gt;</FONT><FONT style='color:black;font-weight:bold;'>100</FONT><FONT color=#0000ff>&lt;/ORGANISATION_TYPE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;EXTRACT_FROM_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>2009-06-01T00:00:00</FONT><FONT color=#0000ff>&lt;/EXTRACT_FROM_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;EXTRACTED_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>2009-06-30T00:30:00</FONT><FONT color=#0000ff>&lt;/EXTRACTED_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;TEAM_CODE&gt;</FONT><FONT style='color:black;font-weight:bold;'>1111</FONT><FONT color=#0000ff>&lt;/TEAM_CODE&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;EVENT_HCU_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>AAA1234</FONT><FONT color=#0000ff>&lt;/EVENT_HCU_ID&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;SEX&gt;</FONT><FONT style='color:black;font-weight:bold;'>M</FONT><FONT color=#0000ff>&lt;/SEX&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;DATE_OF_BIRTH&gt;</FONT><FONT style='color:black;font-weight:bold;'>1900-05-05</FONT><FONT color=#0000ff>&lt;/DATE_OF_BIRTH&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;REFERRAL_FROM&gt;</FONT><FONT style='color:black;font-weight:bold;'>UN</FONT><FONT color=#0000ff>&lt;/REFERRAL_FROM&gt;</FONT><FONT style='color:black;font-weight:bold;'>
  </FONT><FONT color=#0000ff>&lt;START_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>2008-12-24T00:00:00</FONT><FONT color=#0000ff>&lt;/START_DATE_TIME&gt;</FONT><FONT style='color:black;font-weight:bold;'>
</FONT><FONT color=#0000ff>&lt;/REFERRAL_DISCHARGE&gt;</FONT><FONT style='color:black;font-weight:bold;'></FONT></PRE>
<p>The parent is REFERRAL_DISCHARGE. I need the file name exactly how you have it plus the individual value from REFERRAL_ID to make it easy to link to the data included. We are a not-for-profit organization and we have to report to the Ministry of Health and our data is to be packed as individual XML files. We are not dealing with huge files (this one is only 316kb).</p>
</div></div>

<p>Since the input file is under 10MB it makes sense to <A class="codelink" href="http://www.firstobject.com/dn_markLoad.htm">Load</A> it all at once rather than using the <A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A> method for read mode. Also, it is easier to pick out the data for naming the output file using an in-memory XML document than being restricted by forward-only file read mode. Here's the script to divide the XML into files with one REFERRAL_DISCHARGE each:</p>

<pre>split()
{
  CMarkup xmlInput;
  xmlInput.Load( "split.xml" );
  <FONT color=blue>int</FONT> nFileCount = 0;
  <FONT color=blue>while</FONT> ( xmlInput.FindElem("//REFERRAL_DISCHARGE") )
  {
    ++nFileCount;
    xmlInput.FindChildElem( "REFERRAL_ID" );
    <FONT color=blue>str</FONT> sID = xmlInput.GetChildData();
    <FONT color=blue>str</FONT> sFilename = "test" + "_" + sID + "_"+ nFileCount + ".xml";
    WriteTextFile( sFilename, xmlInput.GetSubDoc() );
  }
  <FONT color=blue>return</FONT> nFileCount;
}</pre>

<p>In this simple REFERRAL_DISCHARGE subdocument the REFERRAL_ID is a child element and we can find it using the <A class="codelink" href="http://www.firstobject.com/dn_markFindChildElem.htm">FindChildElem</A> method while keeping the main position at the REFERRAL_DISCHARGE element. This allows us to still retrieve the whole REFERRAL_DISCHARGE subdocument with <A class="codelink" href="http://www.firstobject.com/dn_markGetSubDoc.htm">GetSubDoc</A> after building the filename.</p>
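<p>If you port this script to C++, keep in mind that the script appends the integer file count directly to a string, which plain C++ string concatenation does not do. Here is a minimal sketch of the same naming scheme, assuming <code>std::string</code> and a hypothetical helper named <code>MakeSplitFilename</code> (it is not part of CMarkup):</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a CMarkup function): builds "test_<id>_<count>.xml"
// from the REFERRAL_ID value and the running output file count.
std::string MakeSplitFilename(const std::string& sID, int nFileCount)
{
    assert(nFileCount > 0); // the script increments the count before first use
    // std::to_string converts the count so it can be concatenated
    return "test_" + sID + "_" + std::to_string(nFileCount) + ".xml";
}
```

<p>With the sample data above, the first REFERRAL_DISCHARGE (REFERRAL_ID 9999) would be named <code>test_9999_1.xml</code>.</p>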

<p>If the source is a large XML file, and loading it all into memory is not feasible, it is still pretty simple:</p>

<pre>split()
{
  CMarkup xmlInput, xmlReferralDischarge;
  xmlInput.Open( "big_split.xml", MDF_READFILE );
  <FONT color=blue>int</FONT> nFileCount = 0;
  <FONT color=blue>while</FONT> ( xmlInput.FindElem("//REFERRAL_DISCHARGE") )
  {
    ++nFileCount;
    xmlReferralDischarge.SetDoc( xmlInput.GetSubDoc() );
    xmlReferralDischarge.FindChildElem( "REFERRAL_ID" );
    <FONT color=blue>str</FONT> sID = xmlReferralDischarge.GetChildData();
    <FONT color=blue>str</FONT> sFilename = "test" + "_" + sID + "_"+ nFileCount + ".xml";
    xmlReferralDischarge.Save( sFilename );
  }
  xmlInput.Close();
  <FONT color=blue>return</FONT> nFileCount;
}</pre>

<p>Since each individual REFERRAL_DISCHARGE subdocument is small, we populate <code>xmlReferralDischarge</code> in-memory using <A class="codelink" href="http://www.firstobject.com/dn_markSetDoc.htm">SetDoc</A>, grab the ID out of it, and then use the <A class="codelink" href="http://www.firstobject.com/dn_markSave.htm">Save</A> method to write the file.</p> 

<p>And it is not too hard to put multiple subdocuments into each output file and to split XML into very large files from an extremely large file. In <a href="http://www.firstobject.com/split-xml-file-into-smaller-pieces.htm">Split XML file into smaller pieces</a> and the <a href="http://www.firstobject.com/xml-splitter-script-video.htm">video of an XML splitter script</a> I put multiple subdocuments into each output file and use "<a href="http://www.firstobject.com/dn_markOpen.htm">file mode</a>" to keep a low footprint.</p>


]]></description>
</item>
<item>
<title>Video of XML splitter script for splitting XML files</title>
<link>http://www.firstobject.com/xml-splitter-script-video.htm</link>
<guid isPermaLink="false">xml-splitter-script-video.htm</guid>
<pubDate>Thu, 18 Jun 2009 22:15:00 GMT</pubDate>
<category>XML Editor Articles</category>
<description><![CDATA[

<p>This screencast video demonstrates splitting a large XML file with the <a href="http://www.firstobject.com/dn_editor.htm">free firstobject XML editor</a> (be sure to get <a href="http://www.firstobject.com/dn_editcomments.htm#20090617213000">release 2.3.5</a> or later), much the same as I explained how to do with CMarkup in the article <a href="http://www.firstobject.com/split-xml-file-into-smaller-pieces.htm">Split XML file into smaller pieces</a>. Below the video you will find a copy of the script used in the video.</p>

<p><object width="480" height="295"><param name="movie" value="http://www.youtube.com/v/9ANBa9i5LhM&hl=en&fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/9ANBa9i5LhM&hl=en&fs=1" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>

<p>Here is the script shown in the video. I've highlighted the following things you will need to change to fit your own purposes:</p>

<p><li><code><FONT style='background:#ffff00;'>50MB.xml</FONT></code> - open your own input filename</li>
<li><code><FONT style='background:#ffff00;'>//ACT</FONT></code> - loop through the elements in the input file that you need to transfer</li>
<li><code><FONT style='background:#ffff00;'>piece</FONT></code> - name your output files</li>
<li><FONT style='background:#ffff00;'>root</FONT> - name your root element and any container elements needed in the output file</li>
<li><FONT style='background:#ffff00;'>5</FONT> - set the maximum of those subdocuments per output file</li></p>

<pre>split()
{
  <FONT color=blue>CMarkup</FONT> xmlInput, xmlOutput;
  xmlInput.Open( "<FONT style='background:#ffff00;'>50MB.xml</FONT>", MDF_READFILE );
  <FONT color=blue>int</FONT> nObjectCount = 0, nFileCount = 0;
  <FONT color=blue>while</FONT> ( xmlInput.FindElem("<FONT style='background:#ffff00;'>//ACT</FONT>") )
  {
    <FONT color=blue>if</FONT> ( nObjectCount == 0 )
    {
      ++nFileCount;
      xmlOutput.Open( "<FONT style='background:#ffff00;'>piece</FONT>" + nFileCount + ".xml", MDF_WRITEFILE );
      xmlOutput.AddElem( "<FONT style='background:#ffff00;'>root</FONT>" );
      xmlOutput.IntoElem();
    }
    xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
    ++nObjectCount;
    <FONT color=blue>if</FONT> ( nObjectCount == <FONT style='background:#ffff00;'>5</FONT> )
    {
      xmlOutput.Close();
      nObjectCount = 0;
    }
  }
  <FONT color=blue>if</FONT> ( nObjectCount )
    xmlOutput.Close();
  xmlInput.Close();
  <FONT color=blue>return</FONT> nFileCount;
}</pre>

<p>There is also another article about <a href="http://www.firstobject.com/split-xml.htm">how to split XML with the editor</a> and naming the output files based on information contained in the XML.</p>

<p>See also:</p>

<p><a href="http://www.firstobject.com/format-xml-indent-align-beautify-xml.htm">Format XML, indent align beautify clean up XML</a><br>
<a href="http://www.firstobject.com/tree-customization-in-xml-editor.htm">Tree customization in the firstobject XML editor</a><br>
<a href="http://www.firstobject.com/dn_foal.htm">firstobject Access Language</a><br>
<a href="http://www.firstobject.com/counting-xml-tag-names-and-values.htm">Counting XML tag names and values with foal</a><br>
<a href="http://www.firstobject.com/convert-ansi-file-to-unicode.htm">Convert ANSI file to Unicode</a></p> 


]]></description>
</item>
<item>
<title>Split XML file into smaller pieces</title>
<link>http://www.firstobject.com/split-xml-file-into-smaller-pieces.htm</link>
<guid isPermaLink="false">split-xml-file-into-smaller-pieces.htm</guid>
<pubDate>Sun, 07 Jun 2009 08:08:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>To split an XML file into smaller pieces you read through the input file, creating output files and transferring subdocuments as you go. Whether in C++ or scripting in FOAL, CMarkup makes it simple. For large XML files, use CMarkup file read mode shown below to read the large XML file with very little memory while extracting subdocuments. See this <a href="http://www.firstobject.com/xml-splitter-script-video.htm">video of an XML splitter script</a> to watch the process in action. Also check out this other article with even <a href="http://www.firstobject.com/split-xml.htm">simpler techniques to split XML</a>.</p>

<p>The question when splitting XML is where do you want to split it? There could be a logical place to divide the XML, such as at the subdocuments immediately under the root. Or you might simply have a size limit and want to divide your large XML file with ten million objects into files with one million each.</p>

<p><img src="http://www.firstobject.com/split.gif" alt="split XML file with ten million objects into 10 XML files with one million objects each"></p>

<p>Below is C++ XML splitter code to split an XML file containing N million objects into N files containing 1 million objects each. Here is the idea:</p>

<p><li>Use two CMarkup objects, one for the input file to be split, and one for the output files</li>
<li>Open the big input file to begin looping through all the objects in it</li>
<li>Open an output file using the output file count to form the filename</li>
<li>Transfer object subdocuments from input file to output file until object count maximum</li>
<li>Close the output file, reset the object count, increment the output file count</li>
<li>If not at the end of the input file, open a new output file as above and continue</li>
<li>At the end of the input file, exit loop, close output file (if left open), close input file</li></p>

<pre><font color=green>// Split XML</font>
CMarkup xmlInput, xmlOutput;
xmlInput.Open( "please_split.xml", MDF_READFILE );
<FONT color=blue>int</FONT> nObjectCount = 0, nFileCount = 0;
<FONT color=blue>while</FONT> ( xmlInput.FindElem("//object") )
{
  <FONT color=blue>if</FONT> ( nObjectCount == 0 )
  {
    ++nFileCount;
    xmlOutput.Open( "piece" + StrFromInt(nFileCount) + ".xml", MDF_WRITEFILE );
    xmlOutput.AddElem( "root" );
    xmlOutput.IntoElem();
  }
  xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
  ++nObjectCount;
  <FONT color=blue>if</FONT> ( nObjectCount == 1000000 )
  {
    xmlOutput.Close();
    nObjectCount = 0;
  }
}
<FONT color=blue>if</FONT> ( nObjectCount )
  xmlOutput.Close();
xmlInput.Close();</pre>
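<p>To experiment with the open/transfer/close bookkeeping without a large input file, the same loop can be sketched with standard containers only. This is an illustration under assumptions, not CMarkup code: the subdocuments are assumed to be already extracted into strings, each output "file" is just a string, and <code>SplitByCount</code> is a hypothetical name.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the splitter loop: each returned string plays the role of one
// output file, and each input string plays the role of one object subdocument.
std::vector<std::string> SplitByCount(const std::vector<std::string>& objects,
                                      int nMaxPerFile)
{
    assert(nMaxPerFile > 0);
    std::vector<std::string> files;
    int nObjectCount = 0;
    for (const std::string& sObject : objects)
    {
        if (nObjectCount == 0)
            files.push_back("<root>\n");   // "open" a new output file
        files.back() += sObject + "\n";    // transfer one subdocument
        ++nObjectCount;
        if (nObjectCount == nMaxPerFile)
        {
            files.back() += "</root>\n";   // "close" the full output file
            nObjectCount = 0;
        }
    }
    if (nObjectCount)
        files.back() += "</root>\n";       // close the last, partly filled file
    return files;
}
```

<p>Five objects with a maximum of two per file give three outputs, the last holding a single object, which is the case the final <code>if ( nObjectCount )</code> check in the splitter handles.</p>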

<p>You could also use size rather than object count as the basis of where to split the XML document. To do this, keep a tally of the subdocument sizes until a threshold is reached. The subdocument transfer shown above occurs in <code>xmlOutput.AddSubDoc( xmlInput.GetSubDoc() )</code>. You can instead do it in two steps and track the size like this:</p>

<pre>MCD_STR sObject = xmlInput.GetSubDoc();
nOutputLength += MCD_STRLENGTH(sObject);
xmlOutput.AddSubDoc( sObject );</pre>
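<p>Here is one way the size tally might decide where each output file ends, again sketched with standard containers rather than CMarkup. The name <code>GroupBySize</code> and the rule of closing a file as soon as the tally reaches the threshold are assumptions for illustration; you may prefer to close the file before adding the subdocument that would exceed the limit.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Groups subdocuments into output files, starting a new file once the
// running length of the current one has reached nThreshold.
std::vector<std::vector<std::string>> GroupBySize(
    const std::vector<std::string>& objects, std::size_t nThreshold)
{
    assert(nThreshold > 0);
    std::vector<std::vector<std::string>> groups;
    std::size_t nOutputLength = 0;
    bool bNeedNewFile = true;
    for (const std::string& sObject : objects)
    {
        if (bNeedNewFile)
        {
            groups.emplace_back();          // "open" the next output file
            nOutputLength = 0;
            bNeedNewFile = false;
        }
        groups.back().push_back(sObject);   // transfer the subdocument
        nOutputLength += sObject.size();    // tally its length
        if (nOutputLength >= nThreshold)
            bNeedNewFile = true;            // "close" once the threshold is hit
    }
    return groups;
}
```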

<p>Note though that the <code>nOutputLength</code> is not the same as the output file byte size if your in-memory encoding is different from your file encoding or your encoding is 2-byte based UTF-16.</p>
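<p>To see how far a string length can drift from the byte size on disk, here is a small sketch that computes how many bytes a UTF-16 string would occupy when written as UTF-8. It assumes <code>std::u16string</code> as the in-memory form, and <code>Utf8ByteLength</code> is a hypothetical helper, not a CMarkup function.</p>

```cpp
#include <cstddef>
#include <string>

// Number of bytes the UTF-16 text would occupy when encoded as UTF-8.
std::size_t Utf8ByteLength(const std::u16string& s)
{
    std::size_t nBytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        const char16_t c = s[i];
        if (c < 0x80)
            nBytes += 1;                    // ASCII
        else if (c < 0x800)
            nBytes += 2;                    // e.g. accented Latin letters
        else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
                 && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
        {
            nBytes += 4;                    // surrogate pair is one code point
            ++i;                            // skip the low surrogate
        }
        else
            nBytes += 3;                    // rest of the Basic Multilingual Plane
    }
    return nBytes;
}
```

<p>For plain ASCII markup the two numbers agree, but a euro sign is one UTF-16 unit in memory and three bytes in a UTF-8 file, so a length-based threshold is only an approximation of the output file size.</p>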


]]></description>
</item>
<item>
<title>Archived CMarkup 11.1 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.1-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.1-release-notes.htm</guid>
<pubDate>Sun, 07 Jun 2009 08:00:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Release 11.1 Date: June 7, 2009, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<p>After "file mode" was introduced in release 11.0 (developer version) it became clear we needed a convenient way to dynamically handle subdocuments or pieces of XML in memory. This release fills out those essential features with new support for file mode in <code>AddSubDoc</code> and <code>GetSubDoc</code>. To learn more about file mode in CMarkup see the <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer</a> and <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader</a>.</p>

<p>Another great developer version feature is the new attribute support in <code>FindGetData</code> and <code>FindSetData</code> extending those powerful functions to set and get attributes in one step. See <a href="http://www.firstobject.com/dn_markstruct.htm">Dynamic Structure Documents</a>.</p>

<p>Here's the list of 11.1 enhancements:</p>

<ul>
<li>To deflate a CMarkup object's memory usage, <code>SetDoc(NULL)</code> is now different from <code>SetDoc("")</code> and <code>SetDoc(str)</code>; passing <code>NULL</code> frees up most of the index memory and ensures minimum allocation in the string object. See <A class="codelink" href="http://www.firstobject.com/dn_markSetDoc.htm">SetDoc</A></li>
</ul>

<TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=fafae2 width=30>
<P><A href="http://www.firstobject.com/dn_markdev.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></A></P></TD><TD bgcolor=fafae2>
<P>The following are only in <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a> and the <a href="http://www.firstobject.com/dn_editor.htm">free XML editor </a>&nbsp;<a href="http://www.firstobject.com/dn_foal.htm">FOAL C++ scripting</a></P>
</TD></TR></TABLE>

<ul>
<li>fix: <a href="http://www.firstobject.com/dn_markknown.htm#20090507120000">11.0 Bug: file read and write modes</a> <i>*thanks Dave</i></li>
<li>HUGE file support >2GB is now automatic in Visual C++ before Visual C++ 2005. To test whether you have huge file support make sure <code><FONT color=blue>sizeof</FONT>(MCD_INTFILEOFFSET) == 8</code></li>
<li><A class="codelink" href="http://www.firstobject.com/dn_markGetSubDoc.htm">GetSubDoc</A> is now supported in read file mode and <A class="codelink" href="http://www.firstobject.com/dn_markAddSubDoc.htm">AddSubDoc</A> is now supported in write file mode; these support important in-memory subdocument manipulation</li>
<li><A class="codelink" href="http://www.firstobject.com/dn_markFindGetData.htm">FindGetData</A> and <A class="codelink" href="http://www.firstobject.com/dn_markFindSetData.htm">FindSetData</A> now support getting and setting attributes and creating any elements necessary (specified by simple absolute path) to set an attribute</li>
<li><code>GetEncodingCodePage</code>, <code>GetOpenFileSize</code>, and <code>GetOpenFileOffset</code> accessors added</li>
<li>fix: Linux __w64 Markup.cpp line 752 pointer arithmetic warning fixed <i>*thanks Mohan</i></li>
</ul>
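<p>As background to the huge file item above, this sketch shows why an 8-byte file offset type matters. It uses the standard fixed-width integer types as stand-ins, since <code>MCD_INTFILEOFFSET</code> itself is only defined by the CMarkup headers; <code>MaxFileOffset</code> is a hypothetical helper for illustration.</p>

```cpp
#include <cstdint>
#include <limits>

// Largest byte offset a signed offset type can address.
template <typename OffsetT>
constexpr std::uint64_t MaxFileOffset()
{
    return static_cast<std::uint64_t>(std::numeric_limits<OffsetT>::max());
}

// A 4-byte offset stops one byte short of 2GB (2^31 - 1).
static_assert(MaxFileOffset<std::int32_t>() == 2147483647ULL, "32-bit limit");
// An 8-byte offset reaches 2^63 - 1, far beyond any practical file size.
static_assert(MaxFileOffset<std::int64_t>() == 9223372036854775807ULL, "64-bit limit");
```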

<p>See also:</p>

<P><A href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">CMarkup 11.0 Release Notes</A><BR>
<A href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">CMarkup 10.1 Release Notes</A><BR>
<A href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">CMarkup 10.0 Release Notes</A><BR>
<A href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</A></P>


]]></description>
</item>
<item>
<title>Parse huge XML file in C++</title>
<link>http://www.firstobject.com/parse-huge-xml-file-in-c++.htm</link>
<guid isPermaLink="false">parse-huge-xml-file-in-c++.htm</guid>
<pubDate>Fri, 29 May 2009 04:20:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Dave of <a href="http://neocurve.com/">NeoCurve</a> wrote in and asked to evaluate the developer version of CMarkup, and he quickly got up and running with a 460MB subset of his XML data and then with a 1.8GB XML file. He later said the "warp-speed pull-parser design allowed us to easily manage our huge data files with very little overhead."</p>

<p>As one of its <a href="http://www.firstobject.com/large-xml-file-in-c++.htm">features for a large XML file in C++</a>, CMarkup's <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">XML pull</a> functionality was designed to provide the low overhead benefit of an XMLReader without the complexity of a <a href="http://www.firstobject.com/xml-reader-sax-vs-xml-pull-parser.htm">SAX event based parser</a>. An XML pull parser lets you pull from a large XML file a little bit at a time, forward-only and read-only, processing the data you want while keeping only a small block of the file in memory at a time. The overhead is very small and parsing is almost as fast as the I/O read operation.</p>

<P><TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=fafae2 width=30>
<P><A href="http://www.firstobject.com/dn_markdev.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></A></P></TD><TD bgcolor=fafae2>
<P><a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">XML pull parser</a> functionality (file read mode) is in the <A href="http://www.firstobject.com/dn_markdev.htm">developer version</A> of CMarkup.</P>
</TD></TR></TABLE></P>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> pull from a huge XML file a little bit at a time</p></div>
<div class=commentposted><p>Dave Terracino 05-May-2009</p></div>
<div class=commentcontent>
<p>What we have is a rather large XML file [1.82GB] that we need to pull all the data from a little bit at a time. This is a file that was serialized to an XML file from a .NET application and needs to be read back in by our Visual C++ 6.0 application. We need this ASAP, as we've been trying other paths to no avail.</p>
</div></div>

<p>You can quickly write an application to process a file a little bit at a time starting with the <A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A> method. The following code loops through all of the object elements to process the properties of each object.</p>

<pre>CMarkup xmlpullparser;
xmlpullparser.Open( "hugexmlfile.xml", CMarkup::MDF_READFILE );
<FONT color=blue>while</FONT> ( xmlpullparser.FindElem("//object") )
{
  <font color=green>// process object properties...</font></pre>

<p class=commentplace>&nbsp;</p>
<div class=commentbox>
<div class=commenttitle><p><img border=0 src="http://www.firstobject.com/letter.gif" alt="comment posted"/> missing tag in huge XML file</p></div>
<div class=commentposted><p>Dave Terracino 08-May-2009</p></div>
<div class=commentcontent>
<p>So here is where we get into trouble... if a property is NULL, the .NET serializer doesn't write out the tag at all. So what happens when we use <code>FindElem("property1")</code> is that it scans to the end, and we can't read any more of the data. Do you have any suggestions? Is there some way to record where we started, and roll the file pointer back to that location so we can avoid the problem of a missing tag in the XML?</p>
</div></div>

<p>If you were loading the entire file into memory, you could use methods like <A class="codelink" href="http://www.firstobject.com/dn_markResetMainPos.htm">ResetMainPos</A>, <A class="codelink" href="http://www.firstobject.com/dn_markRestorePos.htm">RestorePos</A> and <A class="codelink" href="http://www.firstobject.com/dn_markGotoElemIndex.htm">GotoElemIndex</A> to go back and scan for each property from the beginning of the object. But in file read mode you are forward-only. So don't do this:</p>

<pre>xmlpullparser.IntoElem();
xmlpullparser.FindElem( "property1" );
str sProp1 = xmlpullparser.GetData();
xmlpullparser.FindElem( "property2" );
str sProp2 = xmlpullparser.GetData();
xmlpullparser.OutOfElem();</pre>

<p>If any of those properties is not found, the parser will scan to the end of the object element, bypassing the remaining properties. Or if any of those properties is not in the expected order, again you could bypass some of the properties. Instead, you must handle each property in the order of occurrence like this:</p>

<pre>xmlpullparser.IntoElem();
<FONT color=blue>while</FONT> ( xmlpullparser.FindElem() )
{
  str sPropName = xmlpullparser.GetTagName();
  str sPropValue = xmlpullparser.GetData();
  <FONT color=blue>if</FONT> ( sPropName == "property1" )
    ; <font color=green>// do something with property 1</font>
  <FONT color=blue>else</FONT> <FONT color=blue>if</FONT> ( sPropName == "property2" )
    ; <font color=green>// do something with property 2</font>
  <font color=green>// etc</font>
}
xmlpullparser.OutOfElem();</pre>
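<p>The same single-pass idea can be tried out without the parser. In this sketch the child elements of one object are assumed to be already available as name/value pairs in document order; <code>ObjectProps</code> and <code>ReadObjectProps</code> are hypothetical names. Each property is picked up wherever it occurs, and a property the serializer left out simply stays flagged as missing.</p>

```cpp
#include <string>
#include <utility>
#include <vector>

// Values collected for one object; the flags record which tags were present.
struct ObjectProps
{
    std::string sProp1, sProp2;
    bool bHasProp1 = false, bHasProp2 = false;
};

// One forward pass over the child elements in their order of occurrence.
ObjectProps ReadObjectProps(
    const std::vector<std::pair<std::string, std::string>>& children)
{
    ObjectProps props;
    for (const auto& child : children)
    {
        if (child.first == "property1")
        {
            props.sProp1 = child.second;
            props.bHasProp1 = true;
        }
        else if (child.first == "property2")
        {
            props.sProp2 = child.second;
            props.bHasProp2 = true;
        }
        // any other tag name is skipped without disturbing the pass
    }
    return props;
}
```

<p>Because nothing is searched for by name, a missing or reordered tag never causes the rest of the object to be skipped.</p>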


]]></description>
</item>
<item>
<title>Large XML file in C++</title>
<link>http://www.firstobject.com/large-xml-file-in-c++.htm</link>
<guid isPermaLink="false">large-xml-file-in-c++.htm</guid>
<pubDate>Tue, 05 May 2009 01:35:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>CMarkup is a simple cross-platform C++ XML API that allows you to parse large XML files, generate large XML files, and even append to large XML files, using a read or write file mode that has very low memory requirements. CMarkup also supports loading and generating large XML documents rapidly in memory with a light footprint. See what you can do with big XML:</p>

<h2>Parse large XML file</h2>

<p>A large XML file doesn't have to be loaded entirely into memory at once to get some or all of the information out of it. Extract information from any file regardless of size (even huge XML files >4GB) in a forward-only read-only way. The memory usage is only a small in-memory buffer of about 16KB. See the <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader</a> available in <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a>.</p>

<pre>CMarkup xml;
xml.<A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A>( "7GB.xml", MDF_READFILE );
<FONT color=blue>while</FONT> ( xml.FindElem( "//record[@type='A']" ) )
{
  ...</pre>

<h2>Generate large XML file</h2>

<p>Write information to a file of any size (even huge >4GB) in a forward-only write-only way. The memory usage is only a small in-memory buffer of about 16KB. See the <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer</a> available in <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a>.</p>

<pre>CMarkup xml;
xml.Open( "huge.xml", MDF_WRITEFILE );
<FONT color=blue>while</FONT> ( recordset.FetchNext() )
{
  xml.AddElem( "record" );
  ...</pre>

<h2>Append to large XML file</h2>

<p>Write information to the end of a file of any size in a forward-only write-only way. The memory usage is only a small in-memory buffer of about 16KB. Append can be achieved by keeping the file open in write mode and calling <A class="codelink" href="http://www.firstobject.com/dn_markFlush.htm">Flush</A> after writing each set of records, or by opening with the <code>MDF_APPENDFILE</code> flag and closing each time.  See the <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer</a>.</p>

<h2>Large XML in-memory</h2>

<p>You might be surprised that you can have even a 100MB+ document all in memory and manipulate it for extracting information, modifying and creating in a few seconds (sub-second speeds for 10MB+). Load, parse XML file, and save to file. The C++ XML API is designed to use the same simple methods whether going to file or keeping in memory. The memory footprint is typically 1.5 to 2.5 times the size of the document with additional temporary needs periodically when the document grows in size significantly.</p>

<pre>CMarkup xml;
xml.<A class="codelink" href="http://www.firstobject.com/dn_markLoad.htm">Load</A>( "100MB.xml" );
<FONT color=blue>while</FONT> ( xml.FindElem( "//record[@type='A']" ) )
{
  ...</pre>

<h2>Edit a large XML file</h2>

<p>Edit 100MB+ large XML files in the free firstobject XML editor for Windows (<a href="http://www.firstobject.com/dn_editor.htm">foxe</a>). A slow machine will load and display a 3MB UTF-8 file in a tenth of a second, along with the instantaneous tree view. The file size is limited by available system memory to hold the document in memory and to convert it if it is not UTF-8. Editing a file size over 500MB is possible on a machine with 4GB of RAM. You can also script CMarkup huge XML file access as shown above with <a href="http://www.firstobject.com/dn_foal.htm">FOAL</a>.</p>

<p>The <a href="http://www.firstobject.com/dn_markadvanced.htm">Advanced CMarkup Developer License</a> comes with the Visual C++ source code of the firstobject XML Editor, which means you get the <A class="codelink" href="http://www.firstobject.com/dn_dataedit.htm">CDataEdit</A> class to edit very large text documents, the instantaneous <code>CMarkupTreeCtrl</code> class, and the <code>CFoalProgram</code> compiler/virtual machine/debugger class.</p>

<p>See also:</p>

<p>
<a href="http://www.firstobject.com/cmarkup-11.0-release-notes.htm">File mode in CMarkup 11.0</a><br>
<a href="http://www.firstobject.com/parse-huge-xml-file-in-c++.htm">Parse huge XML file in C++</a><br>
<a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader parses a very large XML file</a><br>
<a href="http://www.firstobject.com/xml-reader-sax-vs-xml-pull-parser.htm">XML reader models: SAX versus XML pull parser</a><br>
<a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer creates a very large XML file</a><br>
<a href="http://www.firstobject.com/dn_markOpen.htm">CMarkup Open Method</a><br>
<a href="http://www.firstobject.com/dn_markClose.htm">CMarkup Close Method</a><br>
</p>


]]></description>
</item>
<item>
<title>C++ XML reader parses a very large XML file</title>
<link>http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm</link>
<guid isPermaLink="false">c++-xml-reader-parses-large-xml-file.htm</guid>
<pubDate>Tue, 24 Mar 2009 07:00:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Got a very large XML file to parse in C++? Need to process a huge XML file to calculate statistics, import into a database, or generate a report? Just want to look at the top of your big XML file to determine if you want to continue, never reading the whole document into memory? The CMarkup <b>file read mode</b> provides a simple high performance C++ XML reader to do these kinds of tasks.</p>

<P><TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=fafae2 width=30>
<P><A href="http://www.firstobject.com/dn_markdev.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></A></P></TD><TD bgcolor=fafae2>
<P>File read and write modes (see <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer</a> too) are in the developer version of CMarkup.</P>
</TD></TR></TABLE></P>

<p>Even though CMarkup is surprisingly light and quick to load and modify multi-megabyte files in memory, this XML reader mode is for cases where that is not good enough. Read mode provides read-only forward-only pull parser access, even in huge multi-gigabyte XML files.</p>

<h2>How to use CMarkup file read mode</h2>

<p>Here's some code that scans the elements in a large XML file fast, without trying to load the entire file at once. The main difference from regular CMarkup usage is that rather than calling <A class="codelink" href="http://www.firstobject.com/dn_markLoad.htm">Load</A>, you <A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A> the file for read. Notice the <code>Open</code> and <A class="codelink" href="http://www.firstobject.com/dn_markClose.htm">Close</A> calls:</p>

<pre>CMarkup xmlreader;
<b>xmlreader.Open( "largeXMLfile.xml", MDF_READFILE );</b>
xmlreader.FindElem(); <font color=green>// root</font>
MCD_STR sType = xmlreader.GetAttrib( "infotype" );
xmlreader.IntoElem();
<FONT color=blue>while</FONT> ( xmlreader.FindElem() )
{
  MCD_STR sID = xmlreader.GetAttrib( "id" );
  xmlreader.IntoElem();
  MCD_STR sName = xmlreader.FindGetData( "name" );
  MCD_STR sRef = xmlreader.FindGetData( "ref" );
  xmlreader.OutOfElem();
}
<b>xmlreader.Close();</b></pre>

<h2>Pull parser design</h2>

<p>The CMarkup C++ XML reader requires no callbacks, no events, and no setup. Just open the file and pull what you want from it using the same CMarkup methods you would use if you were navigating it in memory. Rather than introducing an entirely new interface for the XML reader (and <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">XML writer</a>) functionality of CMarkup, file mode keeps much of the interface the same; the difference is that you open the file instead of accessing the document in memory. See <a href="http://www.firstobject.com/xml-reader-sax-vs-xml-pull-parser.htm">XML reader models: SAX versus XML pull parser</a> for a discussion of the major XML reader design options.</p>
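
<p>To make the pull idea concrete without depending on CMarkup itself, here is a minimal sketch in standard C++. It is not CMarkup's implementation; the function name <code>NextStartTag</code> is hypothetical, and the tag scanning is deliberately naive. The caller asks for the next start tag when it wants one, and the stream only ever moves forward.</p>

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical pull function (not part of CMarkup): advances the stream to
// the next start tag and stores its name. Returns false when the input is
// exhausted.
// Simplifications: end tags, comments, processing instructions and
// declarations are skipped naively; CDATA sections and '>' characters inside
// attribute values or comments are not handled.
bool NextStartTag( std::istream& in, std::string& name )
{
  char c;
  while ( in.get( c ) )
  {
    if ( c != '<' )
      continue;                 // skip text between tags
    if ( ! in.get( c ) )
      return false;             // input ended right after '<'
    if ( c == '/' || c == '?' || c == '!' )
      continue;                 // not a start tag: keep scanning forward
    name.assign( 1, c );
    while ( in.get( c ) && c != '>' && c != '/'
            && ! std::isspace( static_cast<unsigned char>( c ) ) )
      name += c;                // collect the tag name up to its delimiter
    return true;
  }
  return false;
}
```

<p>The caller drives the loop, for example <code>while ( NextStartTag( in, name ) ) { ... }</code>, and can stop at any point without reading the rest of the input. There is no way to go back, which is the same trade-off file read mode makes.</p>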

<p>Here is an example of a query lookup based on an id. Open the file, query the information, and close it. This example will do a sequential read through the file until it finds the matching information, or until it reaches the end of the file.</p>

<pre>CMarkup xmlreader;
xmlreader.Open( "largeXMLfile.xml", MDF_READFILE );
<FONT color=blue>if</FONT> ( xmlreader.FindElem("//data[@id='5632av']") )
{
  xmlreader.IntoElem();
  MCD_STR sName = xmlreader.FindGetData( "name" );
}
xmlreader.Close();</pre>

<h2>C++ XML reader methods</h2>

<p>CMarkup's file read mode limits the methods you can use and the ways you can use those methods. The key thing to remember is that it is forward-only pull parsing from file so you can only navigate forward in the document you are reading once-through. And since you can only read in a single position, you cannot use child element methods.</p>

<p>Here are the CMarkup methods that can be used, and a brief explanation of how they work in file read mode:</p>

<TABLE class=methodstable width=100%>
<TR bgcolor=fafae2><TD width=150><A href="http://www.firstobject.com/dn_markOpen.htm">Open</A></TD><TD>
With flag <code>MDF_READFILE</code>, opens file for read</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markClose.htm">Close</A></TD><TD>
Closes file and ends file mode. Automatically invoked by destructor</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markFindElem.htm">FindElem</A></TD><TD>
In file read mode, locates <I>next</I> sibling element, optionally matching tag name or path; however, unlike regular mode, if an element is not found then the current position will be at the end tag of the parent element or at the end of the document if it was not within a parent element</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetData.htm">GetData</A></TD><TD>
In file read mode, returns the string value of the current element or node</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markFindGetData.htm">FindGetData</A></TD><TD>
In file read mode, locates the <I>next</I> element matching the specified path and returns the string value; however, unlike regular mode, if an element is not found then the current position will be at the end tag of the parent element or at the end of the document if it was not within a parent element</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</A></TD><TD>
In file read mode, returns the string value of the specified attribute of the current element (or processing instruction)</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markHasAttrib.htm">HasAttrib</A></TD><TD>
In file read mode, returns true if the specified attribute of the current element (or processing instruction) exists</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetAttribName.htm">GetAttribName</A></TD><TD>
In file read mode, returns the name of attribute specified by number for the current element</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetNodeType.htm">GetNodeType</A></TD><TD>
In file read mode, returns the node type of the current node</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetTagName.htm">GetTagName</A></TD><TD>
In file read mode, returns the tag name of the current element (or processing instruction)</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markFindNode.htm">FindNode</A></TD><TD>
In file read mode, locates <I>next</I> sibling node, optionally matching node type(s); however, unlike regular mode, if a node is not found then the current position will be at the end tag of the parent element or at the end of the document if it was not within a parent element.</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markIntoElem.htm">IntoElem</A></TD><TD>
In file read mode, goes "into" current element to find elements and nodes between its start and end tags</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markOutOfElem.htm">OutOfElem</A></TD><TD>
In file read mode, goes "out of" current element to find elements and nodes after its end tag</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetElemPath.htm">GetElemPath</A></TD><TD>
In file read mode, returns a string representing the absolute path of the main position element, allowing for a maximum of 255 uniquely named sibling elements</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetDoc.htm">GetDoc</A></TD><TD>
In file read mode, returns the partial document markup string which is the most recently retrieved from the file</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetSubDoc.htm">GetSubDoc</A></TD><TD>
<b>Update June 7, 2009:</b> In <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">Release 11.1</A> file read mode, returns the markup string of the subdocument rooted in the current position element. If the element has no child elements, the element remains the current position, otherwise the current position is after the end of the subdocument.</TD></TR>
</TABLE>

<p>You can also use any CMarkup static utility function because these do not involve the CMarkup object state or data members.</p>

<h4>A window into the document</h4>

<p>File read mode reads the file one "read block" at a time, sized according to <code>MARKUP_FILEBLOCKSIZE</code>, and converts each block on the fly to the in-memory charset. The <code>m_strDoc</code> document string member is used as a partial document buffer, so you can view the current block of the document in the debugger variables the same way you do when not in file mode. This gives good visibility into the document and the behind-the-scenes positioning in the actual document text.</p>
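
<p>As a rough illustration of the read-block idea (again, not CMarkup's actual code), the following standard C++ sketch reads a stream one fixed-size block at a time and keeps only the current block in memory. The <code>BlockWindow</code> class and its <code>blockSize</code> parameter are hypothetical stand-ins for the <code>m_strDoc</code> buffer and the <code>MARKUP_FILEBLOCKSIZE</code>-based size; charset conversion is omitted.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper (not part of CMarkup): exposes a forward-only
// "window" onto a stream, one block at a time.
class BlockWindow
{
public:
  BlockWindow( std::istream& in, std::size_t blockSize )
    : m_in( in ), m_blockSize( blockSize ), m_offset( 0 )
  {
    assert( blockSize > 0 );
  }

  // Replaces the window with the next block; returns false at end of input.
  bool NextBlock()
  {
    m_offset += m_block.size();  // bytes consumed before the new block
    m_block.assign( m_blockSize, '\0' );
    m_in.read( &m_block[0], static_cast<std::streamsize>( m_blockSize ) );
    m_block.resize( static_cast<std::size_t>( m_in.gcount() ) );
    return ! m_block.empty();
  }

  const std::string& Block() const { return m_block; }  // current window
  std::size_t Offset() const { return m_offset; }       // stream offset of window start

private:
  std::istream& m_in;
  std::size_t m_blockSize;
  std::size_t m_offset;
  std::string m_block;
};

// Counts '<' characters in a stream while holding at most one block in memory.
std::size_t CountTagOpeners( std::istream& in, std::size_t blockSize )
{
  BlockWindow window( in, blockSize );
  std::size_t count = 0;
  while ( window.NextBlock() )
    for ( char c : window.Block() )
      if ( c == '<' )
        ++count;
  return count;
}
```

<p>Memory use is bounded by the block size regardless of how large the input is. A real reader also has to cope with tags that straddle a block boundary, which this sketch avoids by only counting single characters.</p>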

<h4>Copying a CMarkup object in file mode</h4>

<p>The copy constructor and assignment operator <code>=</code> do not work when copying a CMarkup object in either read or write file mode. This is because the CMarkup object encapsulates an open file pointer, a system handle which can only be managed by one CMarkup object at a time.</p>

<h4>Poorly formed markup containment hierarchy</h4>

<p>This note is only for developers dealing with HTML or loosely formed markup. CMarkup is designed to deal with ill-formed XML and things such as non-ended <code lang=xml><FONT color=#0000ff>&lt;br&gt;</FONT></code> line break tags in HTML. See <a href="http://www.firstobject.com/dn_markgeneric.htm">Generic Markup In CMarkup</a>.</p>

<p>So in that same spirit, file read mode is able to keep going despite non-hierarchically formed markup. However, the recovery algorithm works differently in read mode because CMarkup has not parsed the whole file and does not know what is to come in the rest of the file.</p>

<p>In the case of a non-ended element tag, file read mode has to assume the end tag will be found later in the file until it encounters the end tag of the enclosing element. This is different than the in-memory policy for dealing with non-hierarchical markup described in the CMarkup <a href="http://www.firstobject.com/dn_markhierarchy.htm">Containment Hierarchy</a>.</p>


]]></description>
</item>
<item>
<title>C++ XML writer creates a very large XML file</title>
<link>http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm</link>
<guid isPermaLink="false">c++-xml-writer-creates-large-xml-file.htm</guid>
<pubDate>Tue, 24 Mar 2009 07:00:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Need to create a very large XML file from C++ without holding your entire XML document in memory? Exporting gigabytes of records from a database? Logging or archiving huge amounts of XML data to file? CMarkup <b>file write mode</b> provides a simple high performance C++ XML writer. You open the file and then just add data with the same CMarkup methods you would use to create the document in memory.</p>

<P><TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=fafae2 width=30>
<P><A href="http://www.firstobject.com/dn_markdev.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></A></P></TD><TD bgcolor=fafae2>
<P>File read and write modes (see <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader</a> too) are in the developer version of CMarkup.</P>
</TD></TR></TABLE></P>

<p>Even though CMarkup can create multi-megabyte documents in memory with little overhead and then write them to disk, file write mode lets you write to disk as you create the document to conserve memory. File write mode provides low footprint write-only forward-only push to file to create huge XML files.</p>

<h2>How to use CMarkup file write mode</h2>

<p>Instead of immediately adding elements to the newly instantiated CMarkup object, just use the <A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A> method to open a file in write mode before adding elements. Here's some C++ source code to show you how it works; the only new methods you need are <code>Open</code> and <A class="codelink" href="http://www.firstobject.com/dn_markClose.htm">Close</A>:</p>

<pre>CMarkup xmlwriter;
<b>xmlwriter.Open( "inventorydata.xml", MDF_WRITEFILE );</b>
xmlwriter.AddElem( "root" );
xmlwriter.SetAttrib( "infotype", "inventorydata" );
xmlwriter.IntoElem();
std::string strID, strName, strRef;
<FONT color=blue>while</FONT> ( GetInventoryData(strID,strName,strRef) )
{
  xmlwriter.AddElem( "data" );
  xmlwriter.SetAttrib( "id", strID );
  xmlwriter.IntoElem();
  xmlwriter.AddElem( "name", strName );
  xmlwriter.AddElem( "ref", strRef );
  xmlwriter.OutOfElem();
}
<b>xmlwriter.Close();</b></pre>

<h2>C++ XML writer methods</h2>

<p>CMarkup's file write mode limits the methods you can use and the ways you can use those methods. The key thing to remember is that it is forward-only push to file so you cannot navigate in the document you are creating; you can only add elements and nodes, and set attributes. And since you can only write in a single position, you cannot use child element methods.</p>

<p>Here are the CMarkup methods that can be used, and a brief explanation of how they work in file write mode:</p>

<TABLE class=methodstable width=100%>
<TR bgcolor=fafae2><TD width=150><A href="http://www.firstobject.com/dn_markOpen.htm">Open</A></TD><TD>
With flag <code>MDF_WRITEFILE</code>, exclusively opens file for write (erasing contents if file exists)</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markClose.htm">Close</A></TD><TD>
Closes file and ends file mode. Automatically invoked by destructor</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markFlush.htm">Flush</A></TD><TD>
Only to be used in file write mode, this flushes any partial document in memory (up to the closing tags) and the file stream itself</TD></TR>
<TR bgcolor=fafae2><TD width=150><A href="http://www.firstobject.com/dn_markAddElem.htm">AddElem</A></TD><TD>
In file write mode, adds an element after the current node or element. The added element becomes the current element. You cannot specify a data value if you want to call <code>IntoElem</code> and add elements and nodes inside it</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markSetAttrib.htm">SetAttrib</A></TD><TD>
In file write mode, sets the value of the specified attribute of the current element (or processing instruction)</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markSetData.htm">SetData</A></TD><TD>
In file write mode, sets the value of the element or node. For an element, this only works if the element was added without a value</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markIntoElem.htm">IntoElem</A></TD><TD>
In file write mode, goes "into" current element (which must be empty, see <code>AddElem</code> explanation above) to add elements and nodes between its start and end tags</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markOutOfElem.htm">OutOfElem</A></TD><TD>
In file write mode, goes "out of" element to add elements and nodes after its end tag, and unlike regular CMarkup usage it sets current position after end tag</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markAddNode.htm">AddNode</A></TD><TD>
In file write mode, adds a node after the current node or element. The added node becomes the current node</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetElemPath.htm">GetElemPath</A></TD><TD>
In file write mode, returns a string representing the absolute path of the main position element, allowing for a maximum of 255 uniquely named sibling elements</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markGetDoc.htm">GetDoc</A></TD><TD>
In file write mode, returns the partial document markup string which has not yet been written to file</TD></TR>
<TR bgcolor=fafae2><TD><A href="http://www.firstobject.com/dn_markAddSubDoc.htm">AddSubDoc</A></TD><TD>
<b>Update June 7, 2009:</b> In <A href="http://www.firstobject.com/cmarkup-11.1-release-notes.htm">Release 11.1</A> file write mode, adds the specified markup string after the current position. If the added subdocument is an element with no child elements, the added element becomes the current position (and you can set attributes), otherwise the current position is after the end of the added subdocument.</TD></TR>
</TABLE>

<p>You can also use any CMarkup static utility function because these do not involve the CMarkup object state or data members.</p>

<h4>Appending</h4>

<p>If you are generating a log file without a root element, you can open the file with the <code>MDF_APPENDFILE</code> flag to add data starting at the end of the file if it exists. The <A class="codelink" href="http://www.firstobject.com/dn_markFlush.htm">Flush</A> method can be used to ensure data has been written to disk when you want to reduce chances of failure in the middle of a logical section of the data; however it will reduce performance if used too often.</p>
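
<p>The trade-off between flushing often and flushing rarely can be sketched with the standard library alone. The code below does not use CMarkup: <code>std::ios::app</code> plays the role of <code>MDF_APPENDFILE</code>, <code>std::ofstream::flush</code> plays the role of <code>Flush</code>, and the <code>AppendRecords</code> function, its <code>flushEvery</code> parameter, and the <code>entry</code> element name are hypothetical. Records are assumed to be already escaped for XML.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of CMarkup): appends rootless log records to
// a file, flushing after every flushEvery records to bound how much a crash
// can lose. Returns the number of flushes performed, or -1 if the file
// cannot be opened.
int AppendRecords( const std::string& path,
                   const std::vector<std::string>& records,
                   std::size_t flushEvery )
{
  assert( flushEvery > 0 );
  std::ofstream out( path, std::ios::out | std::ios::app );  // keep existing content
  if ( ! out )
    return -1;

  int flushes = 0;
  std::size_t pending = 0;
  for ( const std::string& rec : records )
  {
    out << "<entry>" << rec << "</entry>\n";
    if ( ++pending == flushEvery )
    {
      out.flush();  // hand buffered data to the OS at a logical boundary
      pending = 0;
      ++flushes;
    }
  }
  if ( pending > 0 )
  {
    out.flush();    // flush the final partial group
    ++flushes;
  }
  return flushes;
}
```

<p>A small <code>flushEvery</code> narrows the window of data that can be lost, at the cost of more write calls; a large one does the opposite. Note that a stream flush hands the data to the operating system, which may still cache it before it reaches the disk.</p>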

<h4>A window into the document</h4>

<p>In file write mode, the <code>m_strDoc</code> document string member is used as a partial document buffer. It lets you view the current segment or "write block" of the document in the debugger variables the same way you do when not in file mode, giving good visibility into the document and the behind-the-scenes positioning in the actual document text. The partial document text holds the markup you create until a <code>MARKUP_FILEBLOCKSIZE</code>-based size is reached, at which point it is converted to the file charset and written to file.</p>
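
<p>The write-block behavior can be approximated in a few lines of standard C++. This is only a sketch of the buffering idea, not CMarkup's code: the <code>BlockWriter</code> class and its members are hypothetical, charset conversion is omitted, and unlike CMarkup it makes no attempt to hold back a partially written element.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of CMarkup): accumulates markup in memory
// and writes it out whenever the pending text reaches blockSize bytes.
class BlockWriter
{
public:
  BlockWriter( std::ostream& out, std::size_t blockSize )
    : m_out( out ), m_blockSize( blockSize ), m_blocksWritten( 0 )
  {
    assert( blockSize > 0 );
  }

  ~BlockWriter() { Flush(); }  // write whatever is still pending

  BlockWriter( const BlockWriter& ) = delete;             // one writer per stream
  BlockWriter& operator=( const BlockWriter& ) = delete;

  void Add( const std::string& markup )
  {
    m_pending += markup;
    if ( m_pending.size() >= m_blockSize )
      WritePending();
  }

  void Flush()
  {
    if ( ! m_pending.empty() )
      WritePending();
    m_out.flush();
  }

  const std::string& Pending() const { return m_pending; }  // not yet written
  std::size_t BlocksWritten() const { return m_blocksWritten; }

private:
  void WritePending()
  {
    m_out << m_pending;
    m_pending.clear();
    ++m_blocksWritten;
  }

  std::ostream& m_out;
  std::size_t m_blockSize;
  std::size_t m_blocksWritten;
  std::string m_pending;
};

// Builds a tiny document through a BlockWriter and returns the final text.
std::string WriteSample( std::size_t blockSize )
{
  std::ostringstream out;
  {
    BlockWriter writer( out, blockSize );
    writer.Add( "<root>" );
    writer.Add( "<data id=\"1\"/>" );
    writer.Add( "</root>" );
  }  // destructor flushes the remainder
  return out.str();
}
```

<p>The output is the same whatever the block size; only the number of writes and the amount of text held in memory change. <code>Pending()</code> corresponds loosely to what you would see in the partial document buffer while debugging.</p>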

<h4>Copying a CMarkup object in file mode</h4>

<p>The copy constructor and assignment operator <code>=</code> do not work when copying a CMarkup object in either read or write file mode. This is because the CMarkup object encapsulates an open file pointer, a system handle which can only be managed by one CMarkup object at a time.</p>


]]></description>
</item>
<item>
<title>Archived CMarkup 11.0 Release Notes</title>
<link>http://www.firstobject.com/cmarkup-11.0-release-notes.htm</link>
<guid isPermaLink="false">cmarkup-11.0-release-notes.htm</guid>
<pubDate>Tue, 24 Mar 2009 07:00:00 GMT</pubDate>
<category>CMarkup Articles</category>
<description><![CDATA[

<p>Release 11.0 Date: March 24, 2009, <a href="http://www.firstobject.com/dn_markup.htm">download</a></p>

<img class=insetimage align=right src="http://www.firstobject.com/galaxy.gif" alt="courtesy NASA/JPL-Caltech"/>

<p>A developer at NASA Jet Propulsion Laboratory contacted me years ago evaluating the use of XML to store metadata for 6 to 12 GB of raw telemetry data collected each observation night. The metadata file would not be as large as the raw data file, but it needed to be one file that would grow through the night. He didn't want to re-save the whole file with each update; he needed to append (to write and flush) only the additional records to get them to disk as quickly as possible to minimize data loss in the case of a computer crash.</p>

<p>The new "file mode" in CMarkup essentially lets you use your familiar CMarkup methods directly on files, rather than having the entire document in memory. This opens the way for more efficient large XML file-based solutions, and even really huge XML files. To learn more about file mode in CMarkup see the <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer</a> and <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader</a>.</p>

<p>Here's the list of 11.0 enhancements:</p>

<ul>
<li>New <A class="codelink" href="http://www.firstobject.com/dn_markGetResult.htm">GetResult</A> method returns detailed result markup which should be used instead of <A class="codelink" href="http://www.firstobject.com/dn_markGetError.htm">GetError</A> for structured result information. All result and error reporting has been overhauled in this release, and the "empty document" message has been removed</li>
<li>Debugging information improved to show current node and current parent in a single <code>m_pCurrent</code> pointer. See <a href="http://www.firstobject.com/dn_markdebug.htm">Debugging with CMarkup</a></li>
<li>Updated <a href="http://www.firstobject.com/character-set-name-alias-code-page.htm">charset names, aliases and code pages</a> in CMarkup and added support for GB18030 code page 54936</li>
<li>Refactored CMarkup to remove all nested structs and greatly simplify the header file while better encapsulating struct functionality in the cpp file, making the code cleaner, more readable, easier to export from a library, and compatible with more compilers</li>
</ul>

<TABLE width=100% cellspacing=0 cellpadding=5><TR><TD valign=top bgcolor=fafae2 width=30>
<P><A href="http://www.firstobject.com/dn_markdev.htm"><IMG border=0 src="http://www.firstobject.com/cmarkupdev.gif" alt="CMarkup Developer License"></A></P></TD><TD bgcolor=fafae2>
<P>The following are only in <a href="http://www.firstobject.com/dn_markdev.htm">CMarkup Developer</a> and the <a href="http://www.firstobject.com/dn_editor.htm">free XML editor </a>&nbsp;<a href="http://www.firstobject.com/dn_foal.htm">FOAL C++ scripting</a></P>
</TD></TR></TABLE>

<ul>
<li>New <a href="http://www.firstobject.com/c++-xml-writer-creates-large-xml-file.htm">C++ XML writer</a> and <a href="http://www.firstobject.com/c++-xml-reader-parses-large-xml-file.htm">C++ XML reader</a> handle huge/large XML file-based requirements. The new methods are <A class="codelink" href="http://www.firstobject.com/dn_markOpen.htm">Open</A>, <A class="codelink" href="http://www.firstobject.com/dn_markClose.htm">Close</A>, and <A class="codelink" href="http://www.firstobject.com/dn_markFlush.htm">Flush</A></li>
<li>New <A class="codelink" href="http://www.firstobject.com/dn_markHasAttrib.htm">HasAttrib</A> method to complement <A class="codelink" href="http://www.firstobject.com/dn_markGetAttrib.htm">GetAttrib</A> in distinguishing between an empty attribute value and not having the attribute at all. Previously, you needed to iterate with <A class="codelink" href="http://www.firstobject.com/dn_markGetAttribName.htm">GetAttribName</A> to determine this. New also is the corresponding <A class="codelink" href="http://www.firstobject.com/dn_markHasChildAttrib.htm">HasChildAttrib</A> method</li>
</ul>

<p>See also:</p>

<P><A href="http://www.firstobject.com/cmarkup-10.1-release-notes.htm">CMarkup 10.1 Release Notes</A><BR>
<A href="http://www.firstobject.com/cmarkup-10.0-release-notes.htm">CMarkup 10.0 Release Notes</A><BR>
<A href="http://www.firstobject.com/dn_markrel.htm">Archived CMarkup Release Notes</A></P>


]]></description>
</item>
</channel>
</rss>
