Archived CMarkup Known Issues

Bugs found in previous releases of CMarkup are already fixed, but the notices and fixes remain here for users who are still working with old releases. The great thing about a source code product is that you can fix the bug right in your program and be done with it: no temporary work-arounds, no waiting for the next release.

 

11.3 Bug: trim whitespace removes escaped value

March 12, 2011

(fixed in 11.5)

This is another bug in MDF_TRIMWHITESPACE and MDF_COLLAPSEWHITESPACE. With either of these flags turned on in SetDocFlags, GetData returns "a" instead of "a <" (losing the encoded less-than sign) for the following XML:

<t> a &lt; </t>

Fix for 11.4: in Markup.cpp:3076 insert code to set nCharWhitespace = 0 as follows:

  if ( pSource[nChar] == '&' )
  {
    if ( bAlterWhitespace )
      nCharWhitespace = 0;

    // Get corresponding unicode code point
    int nUnicode = 0;
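The effect of the missing reset can be reproduced in a small standalone sketch. This is not CMarkup's decoding code: TrimAndDecode and its bResetOnEntity parameter are hypothetical, only &lt;, &gt; and &amp; are decoded, and nTrail stands in for what nCharWhitespace is assumed to track (where a pending run of trailing whitespace starts in the output).

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not CMarkup's decoder: trims leading and trailing
// whitespace and decodes only &lt; &gt; and &amp;. nTrail is the offset in
// the output where a pending run of whitespace began (npos if none), the
// role nCharWhitespace is assumed to play in Markup.cpp.
std::string TrimAndDecode( const std::string& strSource, bool bResetOnEntity = true )
{
  std::string strOut;
  size_t nTrail = std::string::npos;
  for ( size_t i = 0; i < strSource.size(); ++i )
  {
    char c = strSource[i];
    if ( c == ' ' || c == '\t' || c == '\r' || c == '\n' )
    {
      if ( strOut.empty() )
        continue; // leading whitespace is dropped
      if ( nTrail == std::string::npos )
        nTrail = strOut.size(); // start of a possible trailing run
      strOut += c;
    }
    else if ( c == '&' )
    {
      // The fix: a decoded entity is content, so whitespace before it is
      // no longer trailing. Skipping this reset mimics the bug.
      if ( bResetOnEntity )
        nTrail = std::string::npos;
      if ( strSource.compare( i, 4, "&lt;" ) == 0 ) { strOut += '<'; i += 3; }
      else if ( strSource.compare( i, 4, "&gt;" ) == 0 ) { strOut += '>'; i += 3; }
      else if ( strSource.compare( i, 5, "&amp;" ) == 0 ) { strOut += '&'; i += 4; }
      else strOut += '&'; // unknown reference: the text is kept as-is
    }
    else
    {
      nTrail = std::string::npos;
      strOut += c;
    }
  }
  if ( nTrail != std::string::npos )
    strOut.erase( nTrail ); // trim the trailing whitespace run
  return strOut;
}
```

With bResetOnEntity set to false, TrimAndDecode( " a &lt; " ) returns "a", which matches the reported symptom; with the reset it returns "a <".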

 

11.3 Bug: crash reading value

Joon-Hong Jo 16-Dec-2010

CMarkup xml;
xml.SetDocFlags(CMarkup::MDF_COLLAPSEWHITESPACE);
...
if (xml.GetChildData().length() <= 0)

<?xml version="1.0" encoding="utf-8"?>
<Properties>
  <MdlData>
    <Path>\DIR\PATH\</Path>
    <Name>0123_4567</Name>
    <Type>ABC</Type>
  </MdlData>
  <Parameters>
    <Parameter>
      <Name>XYZ_PART_NO</Name>
      <Value>
      </Value>
    </Parameter>
    <Parameter>

When reading [the Value element data], my code crashes.

(fixed in 11.4)

The MDF_TRIMWHITESPACE and MDF_COLLAPSEWHITESPACE modes introduced in release 11.3 crash on values that are only whitespace.

Fix for 11.3: in Markup.cpp:3191 add > 0 as follows:

if ( bAlterWhitespace && nCharWhitespace > 0 )
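The > 0 guard suggests that the old code acted on the whitespace position even when it was 0, which is what happens when the entire value is whitespace. The following hypothetical CollapseWhitespace function is only a sketch of the expected behavior, not CMarkup's implementation: it drops leading and trailing whitespace, turns each inner run into a single space, and returns an empty string for a whitespace-only value such as the <Value> element above.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch (not CMarkup's code) of collapse-whitespace handling:
// leading and trailing whitespace is dropped and each inner run of
// whitespace becomes a single space.
std::string CollapseWhitespace( const std::string& strSource )
{
  std::string strOut;
  bool bPendingSpace = false;
  for ( char c : strSource )
  {
    bool bSpace = ( c == ' ' || c == '\t' || c == '\n' || c == '\r' );
    if ( bSpace )
    {
      // Only remember a run when there is already content before it,
      // so a whitespace-only value never produces any output.
      if ( ! strOut.empty() )
        bPendingSpace = true;
    }
    else
    {
      if ( bPendingSpace )
        strOut += ' ';
      bPendingSpace = false;
      strOut += c;
    }
  }
  return strOut; // a trailing run is never emitted
}
```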

 

11.0 Bug: file read and write modes

07-May-2009

(fixed in 11.1)

If you are using either read or write file mode (the Open method), this is an important fix. Please replace the ElemStack Unslot method at line 1274 of Markup.cpp as follows:

void Unslot( TagPos& lp ) { int n=lp.iSlotNext,p=lp.iSlotPrev; if (n) pL[n].iSlotPrev=p; if (p) pL[p].iSlotNext=n; else anTable[lp.nSlot]=n; };

*Thanks Dave Terracino
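Unslot removes an entry from a doubly linked list of entries that share a hash slot. The else branch covers the case where the entry is the first in its slot: there is no previous entry to update, so the head stored in anTable must move to the next entry. The sketch below assumes a simplified layout; the SlotTable struct and its Slot method are hypothetical, and only the Unslot logic mirrors the code above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical layout modeled on the names in Unslot: entries in the same
// slot form a doubly linked list whose head index is stored in anTable.
// Index 0 means "none".
struct TagPos { int nSlot = 0; int iSlotNext = 0; int iSlotPrev = 0; };

struct SlotTable
{
  std::vector<TagPos> pL;   // entries, index 0 unused
  std::vector<int> anTable; // head entry per slot

  SlotTable( int nEntries, int nSlots ) : pL( nEntries + 1 ), anTable( nSlots, 0 ) {}

  // Push entry i onto the front of slot nSlot.
  void Slot( int i, int nSlot )
  {
    pL[i].nSlot = nSlot;
    pL[i].iSlotPrev = 0;
    pL[i].iSlotNext = anTable[nSlot];
    if ( anTable[nSlot] )
      pL[anTable[nSlot]].iSlotPrev = i;
    anTable[nSlot] = i;
  }

  // Same logic as the corrected Unslot: an entry with no previous entry
  // is the head of its slot, so the table itself must be updated.
  void Unslot( TagPos& lp )
  {
    int n = lp.iSlotNext, p = lp.iSlotPrev;
    if ( n ) pL[n].iSlotPrev = p;
    if ( p ) pL[p].iSlotNext = n;
    else anTable[lp.nSlot] = n;
  }
};
```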

 

10.0 Issue: performance comparison

Michael 10-Oct-2008

(fixed in 10.1)

When I upgraded to CMarkup 10, I ran a small benchmark program that does basic XML writing and reading. Unfortunately CMarkup 10 is slower than revision 9 when writing XML files. There also seems to be a scaling problem:

CMarkup 9  10000 Runs 381ms,       100000 Runs 2814ms
CMarkup 10 10000 Runs 711ms (!),   100000 Runs 38635ms (> 10 times slower!)

[Sorry, this is a bug only when generating documents with MFC CString (not STL string). It becomes noticeable when generating documents over 100k, and you can fix it by removing +n/100 in 2 places in Markup.h (fixed in 10.1).]

This is a performance bug only in the creation of an XML document, such as with the AddElem method. Thank you for discovering this; it is due to a mixup during testing just before release (the bug is not in foxe 2.3). It becomes noticeable as you build documents over 100k, and building gets disproportionately slower as the document grows due to reallocations (the same issue described in Speed of CMarkup). Timing the following code for nEntries values such as 10000 and 100000 illustrates the problem.

CMarkup xml;
for ( int iElem = 0; iElem < nEntries; ++iElem )
  xml.AddElem( _T("elem"), _T("data") );

Fix for 10.0: The changes are only in Markup.h, on the two lines with MCD_GETBUFFER defines. Remove the +n/100 so that lines 127 and 145 appear as follows:

#define MCD_GETBUFFER(s,n) new MCD_CHAR[n+1]; s.reserve(n) // STL string version
#define MCD_GETBUFFER(s,n) s.GetBuffer(n) // MFC CString version

 

7.0-9.0 Issue: Memory checkers report uninitialized variable

April 22, 2008

(fixed in 10.0)

IBM Rational Purify and BoundsChecker may complain that nTagLengths is not initialized before it is used in void SetStartTagLen( int n ) { nTagLengths = (nTagLengths & ~EP_STMASK) + n; };. This is caused by the way tag lengths have been packed into the nTagLengths member of struct ElemPos since Release 7.0. The code does not actually use uninitialized bits, but the memory checkers cannot tell that. Bit fields avoid this confusion and will be used in the next release; in the meantime you can fix it yourself as shown below. In addition, EP_STMASK is 0x2fffff instead of 0x3fffff, which causes a problem with start tag lengths over a megabyte (start tags up to 4MB, meaning unusually large attributes, should be supported). The bit field implementation is more self-evident and less prone to bugs.

Fix for 7.0 to 9.0: The changes are only in Markup.h, in struct ElemPos. First, replace the existing tag length methods with the following:

int StartTagLen() const { return nStartTagLen; };
void SetStartTagLen( int n ) { nStartTagLen = n; };
void AdjustStartTagLen( int n ) { nStartTagLen += n; };
int EndTagLen() const { return nEndTagLen; };
void SetEndTagLen( int n ) { nEndTagLen = n; };

Then a few lines below, replace the int nTagLengths declaration with these two:

unsigned int nStartTagLen : 22; // 4MB limit for start tag
unsigned int nEndTagLen : 10; // 1K limit for end tag
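The arithmetic behind the mask problem can be checked directly. In this sketch TagLengths is a hypothetical struct holding only the two bit fields above, and MaskStartLen is a hypothetical helper; 0x180000 (1.5MB) is just an example length with bit 20 set, the bit that 0x2fffff clears.

```cpp
#include <cassert>

// Hypothetical stand-in holding only the two bit fields from ElemPos.
struct TagLengths
{
  unsigned int nStartTagLen : 22; // 0 .. 0x3fffff (just under 4MB)
  unsigned int nEndTagLen : 10;   // 0 .. 0x3ff (just under 1K)
};

// What survives of a start tag length after masking with either the
// mistaken 0x2fffff or the intended 0x3fffff.
unsigned int MaskStartLen( unsigned int nLen, unsigned int nMask )
{
  return nLen & nMask;
}
```

22 bits allow start tags up to 0x3fffff bytes and 10 bits allow end tags up to 0x3ff bytes, which is where the 4MB and 1K limits in the comments come from. How the compiler packs the two fields into memory is implementation-defined.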

 

8.2 Bug: SavePos/RestorePos with non-ASCII names

September 5, 2006

(fixed in 8.3)

A report of a crash caused by using the firstobject XML editor tree customization feature with Unicode text turned up an underlying bug in releases 7.0 to 8.2 of CMarkup (MFC) and CMarkupSTL (STL). A non-ASCII character name in SavePos and RestorePos will cause a GPF.

Fix: In struct SavedPosMap in Markup.h or MarkupSTL.h, change the Hash function to read as follows (changes shown in bold):

int Hash( LPCTSTR szName ) { unsigned int n=0;
  while (*szName) n += (unsigned int)(*szName++); return n % SPM_SIZE; };
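The likely reason the fix works is that, with a signed character type, bytes of 0x80 and above are negative, so a plain int sum can be negative and n % SPM_SIZE then yields a negative table index. This sketch compares the two forms using signed char explicitly; SPM_SIZE = 7 and the HashSigned function are hypothetical (the pre-fix Hash code is not shown in this notice), while HashUnsigned follows the corrected code above.

```cpp
#include <cassert>

const int SPM_SIZE = 7; // hypothetical table size for illustration

// Hypothetical signed variant: non-ASCII bytes are negative, so the sum
// and the remainder can both be negative.
int HashSigned( const signed char* szName )
{
  int n = 0;
  while ( *szName ) n += *szName++;
  return n % SPM_SIZE;
}

// Corrected form: the sum is unsigned, so the result is always in the
// range 0 .. SPM_SIZE-1.
int HashUnsigned( const signed char* szName )
{
  unsigned int n = 0;
  while ( *szName ) n += (unsigned int)(*szName++);
  return n % SPM_SIZE;
}
```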

 

apostrophe problem

Bill Brannan 9-Oct-2004

My XML document is suddenly not being accepted in release 7.2 due to an attribute containing an apostrophe.

(fixed in 7.3)

In XML you can use a single quote inside a double-quoted attribute value and vice versa. CMarkup encodes quotes in the attribute values it writes, so you will generally not encounter this, but you may come across an existing document with unencoded quotes in attribute values, such as:

<H d="d'd" s='s"s'/>
Release 7.2 introduced a change in the way attribute values are parsed that caused it to reject these.

Fix: An incorrect fix was posted here from Oct 6 to Oct 9; please update if you used code posted during those days. In x_ParseNode in Markup.cpp or MarkupSTL.cpp, where it says else if ( nNodeType == MNT_ELEMENT ), change it to read as follows (changes are shown in bold):

if ( *pDoc == _T('\"') && ! (nParseFlags&PD_INQUOTE_S) )
	nParseFlags ^= PD_INQUOTE_D;
else if ( *pDoc == _T('\'') && ! (nParseFlags&PD_INQUOTE_D) )
	nParseFlags ^= PD_INQUOTE_S;
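The corrected test treats a quote character as a delimiter only when the other kind of quote is not already open. A minimal sketch of that rule, assuming hypothetical values for PD_INQUOTE_S and PD_INQUOTE_D and a hypothetical FindTagEnd function that scans for the '>' closing a start tag:

```cpp
#include <cassert>

// Hypothetical flag values; the real ones are defined in the CMarkup source.
enum { PD_INQUOTE_S = 1, PD_INQUOTE_D = 2 };

// Returns the offset of the '>' that ends the tag at szTag, or -1.
// A quote only toggles its flag when the other kind of quote is not
// open, so "d'd" and 's"s' are each read as one complete value.
int FindTagEnd( const char* szTag )
{
  int nParseFlags = 0;
  for ( int i = 0; szTag[i]; ++i )
  {
    char c = szTag[i];
    if ( c == '\"' && ! (nParseFlags & PD_INQUOTE_S) )
      nParseFlags ^= PD_INQUOTE_D;
    else if ( c == '\'' && ! (nParseFlags & PD_INQUOTE_D) )
      nParseFlags ^= PD_INQUOTE_S;
    else if ( c == '>' && ! nParseFlags )
      return i;
  }
  return -1;
}
```

FindTagEnd returns 19 for the <H d="d'd" s='s"s'/> example above, the offset of its final '>'.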

 

heap corruption

Soren Madsen 30-Aug-2004

Whenever a CMarkup object is assigned the value of an empty CMarkup document there is heap corruption (which can be detected in MFC with AfxCheckMemory() after calling AddElem). There is no problem if the CMarkup object on the right hand side contains any elements, so this is a somewhat rare situation. The case that was reported was using xml = CMarkup(); to empty out the xml CMarkup object, but xml.SetDoc(NULL) is a more direct way to do that.

(fixed in 7.2)

The following three cases involve the assignment operator and lead to heap corruption (where xmlEmpty is either a newly instantiated CMarkup object or one on which SetDoc(NULL) has been called):

  • xmlTest = xmlEmpty;
  • CMarkup xmlTest( xmlEmpty ); (copy constructor calls assignment operator)
  • xmlTest = CMarkup(); (a temporary empty CMarkup is instantiated and assigned)

Fix: In operator= in Markup.cpp or MarkupSTL.cpp, where it says "Copy used part of the index array," add a two line if statement so it reads as follows (the two added lines are the if statement that sets a minimum nSize of 8):

    // Copy used part of the index array
    m_aPos.RemoveAll();
    m_aPos.nSize = m_iPosFree;
    if ( m_aPos.nSize < 8 )
      m_aPos.nSize = 8;
    

     

    previous link bug

    Bill Brannan 9-Aug-2004

    CMarkup seems to lose track of elements after RemoveElem since upgrade to release 7.0.

    (fixed in 7.1)

    In CMarkup release 7.0, the link to the previous element is not set correctly when an element or subdocument is inserted between sibling elements. This causes a problem in FindPrevElem and, in rare cases after RemoveElem, affects the links used by FindElem. This can introduce a bug in tested code when upgrading from a previous release of CMarkup, so this is an important fix. A bug fix release of CMarkup will be made available soon.

    Fix: In x_LinkElem in Markup.cpp or MarkupSTL.cpp, where it says "Link in after iPosBefore," add an else clause to the if statement so it reads as follows (the else clause is the change):

    // Link in after iPosBefore
    pElem->nFlags = 0;
    pElem->iElemNext = m_aPos[iPosBefore].iElemNext;
    if ( ! pElem->iElemNext )
    	m_aPos[m_aPos[iPosParent].iElemChild].iElemPrev = iPos;
    else
    	m_aPos[pElem->iElemNext].iElemPrev = iPos;
    m_aPos[iPosBefore].iElemNext = iPos;
    pElem->iElemPrev = iPosBefore;
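The if statement above implies that the first child's iElemPrev points to the last sibling, so there is always a prev link to update: either the first child's (when appending at the end) or the next sibling's (the else clause). A sketch of the same linking rule on a plain array, with a hypothetical Node struct and LinkAfter function, where index 0 means "none":

```cpp
#include <cassert>
#include <vector>

// Hypothetical node: index 0 means "none".
struct Node { int iPrev = 0; int iNext = 0; };

// Links iPos in after iBefore among the siblings whose first child is
// iFirst. The first child's iPrev is assumed to point to the last sibling.
void LinkAfter( std::vector<Node>& a, int iFirst, int iBefore, int iPos )
{
  a[iPos].iNext = a[iBefore].iNext;
  if ( ! a[iPos].iNext )
    a[iFirst].iPrev = iPos;        // iPos becomes the last sibling
  else
    a[a[iPos].iNext].iPrev = iPos; // the else clause added by the fix
  a[iBefore].iNext = iPos;
  a[iPos].iPrev = iBefore;
}
```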
    

     

    empty subdocument bug

    Bill Brannan 4-Aug-2004

    When upgrading to CMarkup 7.0, after AddChildSubDoc with the following element as a subdocument:

    <analysis_name>Analysis</analysis_name>
    Debugging shows that GetChildTagName == "analysis_name" so GetChildData SHOULD == "Analysis" but it is returning "" instead. It thinks it is an empty element due to a bug in AddChildSubDoc.

    (fixed in 7.1)

    Fix: In function x_AddSubDoc in Markup.cpp or MarkupSTL.cpp where it links in parent and siblings, add one line and change one line as follows (changes shown in bold):

    // Link in parent and siblings
    bool bEmpty = m_aPos[iPos].nFlags & MNF_EMPTY;
    x_LinkElem( iPosParent, iPosBefore, iPos );
    m_aPos[iPosTempParent].iElemNext = m_iPosDeleted;
    m_iPosDeleted = iPosTempParent;
    if ( bEmpty )
    	m_aPos[iPos].nFlags |= MNF_EMPTY;

     

    parser rejects certain tag names

    Stefan Herber 28-Jul-2004

    Tagnames like "_Example" produce an error on parsing xml files.

    (fixed in 7.1)

    The Release 7.0 parser in both CMarkup and CMarkupSTL erroneously rejects tag names starting with an underscore or a colon. The error text is, for example: "Incorrect tag name character at offset 402."

    Fix: In function x_ParseNode in Markup.cpp or MarkupSTL.cpp, I changed it from:

    if ( *pDoc > 0x60 || ( *pDoc > 0x40 && *pDoc < 0x5b ) )
    to the following (0x5f is underscore, 0x3a is colon):
    if ( *pDoc > 0x60 || ( *pDoc > 0x40 && *pDoc < 0x5b )
         || *pDoc == 0x5f || *pDoc == 0x3a )
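Pulled out as a function, the corrected first-character test looks like this. IsTagNameStart is a hypothetical name; like the original expression, it is a quick check and not the full XML NameStartChar rule (for example, it accepts any character above 0x60, not just letters).

```cpp
#include <cassert>

// Same condition as the corrected x_ParseNode test: anything above 0x60
// (including lowercase letters), uppercase letters, underscore (0x5f)
// and colon (0x3a).
bool IsTagNameStart( int c )
{
  return c > 0x60 || ( c > 0x40 && c < 0x5b )
         || c == 0x5f || c == 0x3a;
}
```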

     

    6.6 Bug: Comma in Error String

    May 26, 2004

    (fixed in 7.0)

    Release 6.6 introduced an extra comma in the error string. After calling SetDoc( strXML ) with ill-formed XML, the result of GetError() has a comma at the beginning. This bug only exists in Release 6.6.

    Fix: The change is in the x_ParseDoc() function, where there is a 4 line if else clause after the comment that says "Combine preserved result with parse error." Wrap the whole if else clause in the following if statement and curly brackets. In the MFC version (Markup.cpp) use:

    if ( ! csResult.IsEmpty() )
    {
       if ... else ...
    }

    In the STL version MarkupSTL.cpp use:

    if ( strResult.size() )
    {
       if ... else ...
    }
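The general pattern is to add a separator only when there is already text to separate from. The actual separator and messages used in x_ParseDoc are not shown in this notice, so the ", " separator and the CombineResult function below are assumptions for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins a preserved result and a parse error,
// adding ", " only when both parts are non-empty.
std::string CombineResult( const std::string& strResult, const std::string& strError )
{
  if ( strResult.empty() )
    return strError;
  if ( strError.empty() )
    return strResult;
  return strResult + ", " + strError;
}
```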

     

    CMarkupMSXML SetDoc Wastes Memory

    via Dharmesh Shah 28-Aug-2003

    The leak in SetDoc() and x_AddSubDoc() is still in 6.5. ...in our services applications it ended up losing a pointer to a copy of a BSTR the size of a whole XML document we were loading from a string.

    (fixed in 6.6)

    Fix: In the SetDoc() function in MarkupMSXML.cpp the following line:

    _bstr_t bstrDoc(A2BSTR(szDoc));

    should read:

    _bstr_t bstrDoc(A2BSTR(szDoc),false);

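    Passing false makes the _bstr_t take ownership of the BSTR returned by A2BSTR instead of copying it, so the BSTR is freed when the _bstr_t is destroyed. The same problem appears in x_AddSubDoc() in MarkupMSXML.cpp, where the following line: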

    _bstr_t bstrSubDoc(A2BSTR(szSubDoc));

    should read:

    _bstr_t bstrSubDoc(A2BSTR(szSubDoc), false);

     

    MBCS Builds, Double Byte Chars

    knight_zhuge 29-Jan-2003

    The internal x_TextToDoc function fails to support double-byte characters. This failure occurs when MBCS is defined for the build and you add double-byte characters to your document (i.e. it does not occur with regular ASCII or UTF-8). For example, if the parameter szText is 3 GB2312 Chinese characters (hex D6 D0 B9 FA C8 CB), or 6 bytes, after the loop csText is only 3 bytes (hex D6 B9 C8).

    (fixed in 6.5)

    Fix: In the x_TextToDoc method in Markup.cpp, change:

    ++nLen;

    to:

    nLen +=_tclen( pSource );
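The reported bytes are consistent with the source pointer advancing by a whole character while only one byte is counted per character. This sketch reproduces that symptom under that assumption; CharLen is a hypothetical stand-in for _tclen that treats a byte of 0x81 or above as the lead byte of a two-byte GB2312/GBK character, and CopyText is not the real x_TextToDoc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for _tclen in a GB2312/GBK MBCS build: a byte of
// 0x81 or above followed by another byte is one 2-byte character.
size_t CharLen( const char* p )
{
  return ( (unsigned char)p[0] >= 0x81 && p[1] ) ? 2 : 1;
}

// Copies szText one character at a time. bFixed selects the corrected
// count (nLen += CharLen) or the buggy one (++nLen); the source pointer
// always advances by a whole character.
std::string CopyText( const char* szText, bool bFixed = true )
{
  std::string strOut;
  const char* pSource = szText;
  while ( *pSource )
  {
    size_t nCharLen = CharLen( pSource );
    size_t nLen = 0;
    if ( bFixed )
      nLen += nCharLen;
    else
      ++nLen;
    strOut.append( pSource, nLen );
    pSource += nCharLen;
  }
  return strOut;
}
```

CopyText with bFixed set to false turns D6 D0 B9 FA C8 CB into D6 B9 C8, the same 3 bytes as in the report.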

     

    Add or Insert SubDoc

    Tony Nancarrow 18-Oct-2002

    I have discovered what appears to be a bug in x_AddSubDoc (called from AddSubDoc, InsertSubDoc, AddChildSubDoc and InsertChildSubDoc). If the document to be added or inserted to the parent document contains a processing instruction such as: <?xml version="1.0"?> then the software gets stuck inside an infinite loop within x_AddSubDoc. The problem code appears to be the loop:

    // Skip version tag or DTD at start of subdocument
    TokenPos token( szSubDoc );
    int nNodeType = x_ParseNode( token );
    while ( nNodeType && nNodeType != MNT_ELEMENT )
    {
      token.szDoc = &szSubDoc[token.nNext];
      token.nNext = 0;
      nNodeType = x_ParseNode( token );
    }

    (fixed in 6.5)

    Fix: In the x_AddSubDoc method in either Markup.cpp or MarkupSTL.cpp, change:

    token.szDoc = &szSubDoc[token.nNext];

    to:

    token.szDoc = &token.szDoc[token.nNext];
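Judging from the fix, token.nNext is an offset relative to token.szDoc rather than to the start of the subdocument, so once token.szDoc has moved, indexing szSubDoc with it points back into text that was already skipped. A much-simplified sketch of the corrected skip loop, with a hypothetical Token struct and ParseNonElement function standing in for x_ParseNode:

```cpp
#include <cassert>
#include <cstring>

// Much-simplified, hypothetical token: nNext is an offset RELATIVE to szDoc.
struct Token { const char* szDoc; int nNext; };

// If szDoc begins (after whitespace) with a "<?...?>" or "<!...>" node,
// sets nNext just past it and returns true; otherwise returns false.
// Comments or DTDs that contain '>' are not handled by this sketch.
bool ParseNonElement( Token& token )
{
  const char* p = token.szDoc;
  while ( *p == ' ' || *p == '\t' || *p == '\r' || *p == '\n' )
    ++p;
  const char* pEnd = nullptr;
  if ( std::strncmp( p, "<?", 2 ) == 0 )
  {
    pEnd = std::strstr( p, "?>" );
    if ( pEnd ) pEnd += 2;
  }
  else if ( std::strncmp( p, "<!", 2 ) == 0 )
  {
    pEnd = std::strchr( p, '>' );
    if ( pEnd ) pEnd += 1;
  }
  if ( ! pEnd )
    return false;
  token.nNext = (int)( pEnd - token.szDoc );
  return true;
}

// Returns a pointer to the text after any leading PIs or declarations.
const char* SkipToElement( const char* szSubDoc )
{
  Token token = { szSubDoc, 0 };
  while ( ParseNonElement( token ) )
  {
    // nNext is relative to token.szDoc, so advance from there (the fix);
    // indexing szSubDoc instead would jump back toward the start.
    token.szDoc = &token.szDoc[token.nNext];
    token.nNext = 0;
  }
  return token.szDoc;
}
```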

     

    UNICODE DecodeBase64

    Eric Mathieu 24-May-2002

    DecodeBase64 is not working in the Windows CE (UNICODE build) of CMarkup.

    (fixed in 6.4)


    (this feature is only in CMarkup Developer and the free XML editor's FOAL C++ scripting)

    Fix: In the Markup.cpp DecodeBase64 function, change:

    const BYTE* pBase64 = (const BYTE*)(LPCTSTR)csBase64;

    to:

    LPCTSTR pBase64 = (LPCTSTR)csBase64;
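In a UNICODE build, casting the string to const BYTE* makes the decoder read the UTF-16 text one byte at a time, so the zero high bytes of each character are mixed in with the base64 characters. Keeping the pointer at the character type avoids that. Below is a sketch of a decoder that walks characters rather than bytes; this DecodeBase64 is a hypothetical template, not CMarkup Developer's function, and it simply skips '=' padding and any other non-base64 characters.

```cpp
#include <cassert>
#include <vector>

// Value of one base64 character, or -1 if it is not in the alphabet.
template <typename CharT>
int Base64Value( CharT c )
{
  if ( c >= 'A' && c <= 'Z' ) return c - 'A';
  if ( c >= 'a' && c <= 'z' ) return c - 'a' + 26;
  if ( c >= '0' && c <= '9' ) return c - '0' + 52;
  if ( c == '+' ) return 62;
  if ( c == '/' ) return 63;
  return -1;
}

// Decodes base64 text one CHARACTER at a time, so the same code works
// for char and wchar_t strings.
template <typename CharT>
std::vector<unsigned char> DecodeBase64( const CharT* pBase64 )
{
  std::vector<unsigned char> out;
  unsigned int nBits = 0; // accumulated bits (only the low bits matter)
  int nBitCount = 0;      // number of not-yet-emitted bits in nBits
  for ( ; *pBase64; ++pBase64 )
  {
    int v = Base64Value( *pBase64 );
    if ( v < 0 )
      continue; // skip padding, whitespace and other characters
    nBits = ( nBits << 6 ) | (unsigned int)v;
    nBitCount += 6;
    if ( nBitCount >= 8 )
    {
      nBitCount -= 8;
      out.push_back( (unsigned char)( ( nBits >> nBitCount ) & 0xFF ) );
    }
  }
  return out;
}
```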

     

    FindChildElem and the level tracker

    Jonnie White 14-Mar-2002

    If I look for a child element using FindChildElem("/root/list/thing"), I find the right element. The main position is moved to list, and the child position to thing. The level counter doesn't change though:

    m_Doc.ResetPos();
    m_Doc.FindChildElem("/root/list/thing");
    // main pos is list, child pos is thing, level is 0!
    // if I then try to navigate out
    m_Doc.OutOfElem();
    // I can't, because we are already at level 0

    (fixed in 6.3)


    Paths In CMarkup (this feature is only in CMarkup Developer and the free XML editor's FOAL C++ scripting)

    Fix: The level is not tracked in release 6.3. To take care of it by hand in previous releases, the if statement in OutOfElem should be:

    if ( m_iPos && m_aPos[m_iPos].iElemParent )

    instead of

    if ( m_iPos && m_nLevel > 0 )

     

    Load/Save Exception Memory Leak

    Nikolay Sokratov 20-Feb-2002

    Bug: everywhere you catch a CFileException*, don't forget to delete it. That will fix the memory leak caused by the exception not being deleted.

    catch (CFileException*e)
    {
      e->Delete();
      return FALSE;
    }
    

    (fixed in 6.3)

    Fix: Exceptions are no longer used, for compatibility with Windows CE. The CFile constructor, which throws on failure, is replaced with the Open method, which returns FALSE instead. But if you need to implement the fix by hand in releases before 6.3, do as directed above.

     

    CMarkup on Windows CE

    Reto Bucher 15-Jan-2002

    First, exceptions are not supported under WinCE (file handling), and second, two macros do not exist under CE (_tclen, _tccpy).

    (fixed in 6.3)

    Fix: If you need to implement the fixes by hand in releases 6.1 and 6.2 do the following:

    1. Add the following two defines near the top of Markup.cpp because the double-byte character function defines seem not to be available:

    #define _tclen(p) 1
    #define _tccpy(p1,p2) *(p1)=*(p2)
    

    2. Remove exception handling from the Load method by using CFile::Open rather than CFile constructor:

    bool CMarkup::Load( LPCTSTR szFileName )
    {
      CString csDoc;
      CFile file;
      if (!file.Open(szFileName, CFile::modeRead))
        return false;
     
      int nLength = (int)file.GetLength(); // GetLength returns ULONGLONG in newer MFC versions
     
    #if defined(UNICODE)
      // Allocate Buffer for UTF-8 file data
      unsigned char* pBuffer = new unsigned char[nLength + 1];
      nLength = file.Read( pBuffer, nLength );
      pBuffer[nLength] = '\0';
     
      // Convert file from UTF-8 to Windows UNICODE (AKA UCS-2)
      int nWideLength = MultiByteToWideChar(CP_UTF8,0,
        (const char*)pBuffer,nLength,NULL,0);
      nLength = MultiByteToWideChar(CP_UTF8,0,
        (const char*)pBuffer,nLength,
        csDoc.GetBuffer(nWideLength),nWideLength);
      ASSERT( nLength == nWideLength );
      delete [] pBuffer;
    #else
      nLength = file.Read( csDoc.GetBuffer(nLength), nLength );
    #endif
      csDoc.ReleaseBuffer(nLength);
      file.Close();
    
      return SetDoc( csDoc );
    }