XML Serialization in C++ with CMarkup

You can XML serialize C++ classes using CMarkup. XML is an excellent format for C++ class serialization because the XML file remains compatible across different builds of your program (MBCS or UNICODE, 32-bit or 64-bit) and even across different platforms.

Example of XML serialization in C++

Here is an example C++ class for this tutorial.

class CXyz
{
public:
  string Serialize();
  void Deserialize( string strSubDoc );
  int m_nNotificationCode;
  string m_strLocationIdentifier;
};

Here is the XML format to contain the state of the CXyz class (i.e. "persist" it) when we perform C++ serialization.

<Object classtype="CXyz">
  <NotificationCode>2</NotificationCode>
  <LocationIdentifier>ATL</LocationIdentifier>
</Object>

We could have used the tag name Xyz for the container element and no classtype attribute, but sometimes it is nice to be able to quickly spot the class elements that will be handled as subdocuments (see below) by naming them "Object".

XML Serialize method

The XML serializer generates a document containing the values of the object's members in the XML format shown above. Here are the steps for creating your Serialize method:

  • instantiate an XML document
  • add the root container element
  • set the classtype attribute
  • go into the container element
  • for each member, add element with tag name derived from member name
  • return the document

string CXyz::Serialize()
{
  CMarkup xml;                            // new, empty XML document
  xml.AddElem( "Object" );                // root container element
  xml.SetAttrib( "classtype", "CXyz" );
  xml.IntoElem();                         // members become children of Object
  xml.AddElem( "NotificationCode", m_nNotificationCode );
  xml.AddElem( "LocationIdentifier", m_strLocationIdentifier );
  return xml.GetDoc();
}

The Serialize method returns an XML document string that represents the current state of the object. If needed, that string can be added as a subdocument into a master CMarkup document containing other objects (see below) and/or written to a file.
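
To make the target format concrete without depending on CMarkup, here is a minimal standard-library sketch that writes the same document by hand. SerializeXyz and EscapeXml are hypothetical names used only for illustration, and the two-space indentation is an assumption. Writing XML by hand like this means you must escape characters such as & and < yourself.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: escape characters that are special in XML text.
std::string EscapeXml(const std::string& s)
{
    std::string out;
    for (char c : s)
    {
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c; break;
        }
    }
    return out;
}

// Hypothetical hand-written equivalent of the CXyz document shown above.
// The indentation is an assumption, not CMarkup's output format.
std::string SerializeXyz(int notificationCode, const std::string& locationIdentifier)
{
    std::ostringstream os;
    os.imbue(std::locale::classic()); // no digit grouping from the global locale
    os << "<Object classtype=\"CXyz\">\n"
       << "  <NotificationCode>" << notificationCode << "</NotificationCode>\n"
       << "  <LocationIdentifier>" << EscapeXml(locationIdentifier)
       << "</LocationIdentifier>\n"
       << "</Object>\n";
    return os.str();
}
```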

XML Deserialize method

The deserializer receives a document in the same format produced by the Serialize method and loads the values into the object's member variables. Here are the steps for creating your Deserialize method:

  • begin with the XML document passed to the Deserialize method
  • find the container element
  • go into the container element
  • for each member, set value if corresponding element found

void CXyz::Deserialize( string strSubDoc )
{
  CMarkup xml( strSubDoc );
  xml.FindElem(); // Object
  xml.IntoElem();
  if ( xml.FindElem("NotificationCode") )
    m_nNotificationCode = MCD_STRTOINT(xml.GetData());
  else
    m_nNotificationCode = 0;  // default when the element is missing
  if ( xml.FindElem("LocationIdentifier") )
    m_strLocationIdentifier = xml.GetData();
  else
    m_strLocationIdentifier = "";
}

To build version flexibility into a Deserialize method, locate each element by name with FindElem as shown here rather than assuming that the first element is NotificationCode and the second one is LocationIdentifier. In this example, if an old version of the class did not have a notification code, this function would still work and would simply set the notification code to 0. The program could be written to interpret that value as either a default notification code or an unavailable notification code. See XML Versioning.
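
The same tolerance for missing elements can be sketched without CMarkup. FindChildData below is a hypothetical and deliberately naive helper (it is not an XML parser: it ignores attributes on the child, comments, CDATA and entity unescaping), and Xyz is a stand-in for CXyz. CMarkup's FindElem moves forward through sibling elements, whereas this helper searches the whole string, so it does not depend on element order either.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical, naive lookup of <tag>text</tag> in a flat document.
// Returns false (leaving data untouched) if the element is not present.
bool FindChildData(const std::string& doc, const std::string& tag, std::string& data)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    std::string::size_type start = doc.find(open);
    if (start == std::string::npos)
        return false;
    start += open.size();
    std::string::size_type end = doc.find(close, start);
    if (end == std::string::npos)
        return false;
    data = doc.substr(start, end - start);
    return true;
}

// Stand-in for CXyz with the same two members.
struct Xyz
{
    int m_nNotificationCode = 0;
    std::string m_strLocationIdentifier;

    // Each member is looked up by name; a missing element gets a default.
    void Deserialize(const std::string& doc)
    {
        std::string data;
        if (FindChildData(doc, "NotificationCode", data))
            m_nNotificationCode = std::atoi(data.c_str());
        else
            m_nNotificationCode = 0;
        if (FindChildData(doc, "LocationIdentifier", data))
            m_strLocationIdentifier = data;
        else
            m_strLocationIdentifier = "";
    }
};
```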

Combined XML serialization document

For class serialization and C++ persistence you need to be able to serialize objects that are members of other objects. To combine the serialized C++ objects you treat them as subdocuments. Say you have two CXyz members in a container class called CState:

class CState
{
public:
  string Serialize();
  void Deserialize( string strSubDoc );
private:
  CXyz m_xyz1, m_xyz2;
  string m_strMode;
};

Here is the XML serialization format, which has a root element called State containing two Object subdocuments and a Mode element.

<State>
  <Object classtype="CXyz" id="xyz1">
    <NotificationCode>2</NotificationCode>
    <LocationIdentifier>ATL</LocationIdentifier>
  </Object>
  <Object classtype="CXyz" id="xyz2">
    <NotificationCode>43</NotificationCode>
    <LocationIdentifier>NYC</LocationIdentifier>
  </Object>
  <Mode>A</Mode>
</State>

Instead of State, we could have used the tag name Object for this container element as well, as we did with the subobjects, but since it is the root element of the master document it is nice to give it a more descriptive name.

XML serialize multiple objects

Here is some code that XML serializes both member objects into one master XML serialization document. Use the same steps as for the Serialize method above, except that for each object member you:

  • add the subobject's Serialize XML document as a subdocument
  • set the id attribute

string CState::Serialize()
{
  CMarkup xml;
  xml.AddElem( "State" );
  xml.IntoElem();
  xml.AddSubDoc( m_xyz1.Serialize() );
  xml.SetAttrib( "id", "xyz1" );  // id goes on the added subdocument's Object element
  xml.AddSubDoc( m_xyz2.Serialize() );
  xml.SetAttrib( "id", "xyz2" );
  xml.AddElem( "Mode", m_strMode );
  return xml.GetDoc();
}
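
Structurally, AddSubDoc followed by SetAttrib nests each member's document inside State and tags its root element with an id. The sketch below reproduces that result with plain string operations rather than CMarkup. WithIdAttribute and SerializeState are hypothetical names, and the helper is naive: it assumes the subdocument has no XML declaration, that no attribute value before the first '>' contains a '>', and that the id and Mode values need no escaping.

```cpp
#include <string>

// Hypothetical helper: insert id="..." into the start tag of a subdocument's
// root element, just before its closing '>' (or before '/>' for an empty
// element). Assumes no XML declaration precedes the root element.
std::string WithIdAttribute(const std::string& subDoc, const std::string& id)
{
    std::string::size_type pos = subDoc.find('>');
    if (pos == std::string::npos)
        return subDoc;                 // no start tag found; leave unchanged
    if (pos > 0 && subDoc[pos - 1] == '/')
        --pos;                         // keep "/>" together
    return subDoc.substr(0, pos) + " id=\"" + id + "\"" + subDoc.substr(pos);
}

// Hypothetical hand-built State document: both member subdocuments are
// nested inside State, followed by the Mode element.
std::string SerializeState(const std::string& xyz1Doc,
                           const std::string& xyz2Doc,
                           const std::string& mode)
{
    return "<State>" + WithIdAttribute(xyz1Doc, "xyz1")
         + WithIdAttribute(xyz2Doc, "xyz2")
         + "<Mode>" + mode + "</Mode></State>";
}
```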

XML deserialize multiple objects

In deserialization, use the id attribute of the objects to ensure that you match the correct data with the correct member. This Deserialize function demonstrates a different algorithm from the one above: it loops through all of the elements, checking id attributes and tag names. Use the same steps as in the Deserialize method above, but instead of extracting each member, do the following for each element:

  • if the id attribute matches an object member, deserialize that member using the subdocument
  • if the tag name matches a data member, set the member's value from the element data

void CState::Deserialize( string strSubDoc )
{
  CMarkup xml( strSubDoc );
  xml.FindElem(); // State
  xml.IntoElem();
  while ( xml.FindElem() )
  {
    if ( xml.GetAttrib("id") == "xyz1" )
      m_xyz1.Deserialize( xml.GetSubDoc() );
    else if ( xml.GetAttrib("id") == "xyz2" )
      m_xyz2.Deserialize( xml.GetSubDoc() );
    else if ( xml.GetTagName() == "Mode" )
      m_strMode = xml.GetData();
  }
}

This loop-style deserialization also has a lot of built-in flexibility for evolving changes in the CState class and its serialization format: the elements can appear in any order, and elements it does not recognize are simply skipped.
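
The dispatch logic of that loop can be exercised on its own. In this sketch, which does not use CMarkup, the hypothetical ChildElem vector stands in for the sequence of elements that the FindElem loop visits, and the hypothetical StateSketch keeps each object's subdocument as a string instead of calling a member's Deserialize.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one child element visited by the FindElem loop.
struct ChildElem
{
    std::string tag;   // tag name, e.g. "Object" or "Mode"
    std::string id;    // id attribute value, empty if absent
    std::string data;  // element data, or the subdocument for an object
};

// Hypothetical stand-in for CState that records what it would deserialize.
struct StateSketch
{
    std::string xyz1Doc;  // subdocument that m_xyz1.Deserialize would receive
    std::string xyz2Doc;  // subdocument that m_xyz2.Deserialize would receive
    std::string mode;

    void Deserialize(const std::vector<ChildElem>& children)
    {
        for (const ChildElem& e : children)
        {
            if (e.id == "xyz1")
                xyz1Doc = e.data;
            else if (e.id == "xyz2")
                xyz2Doc = e.data;
            else if (e.tag == "Mode")
                mode = e.data;
            // Any other element (for example one written by a newer version
            // of the class) falls through and is ignored.
        }
    }
};
```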

Storing state in a file

Say you have a CState object named m_state. To store the state in a file, get the document and write it to a file:

CMarkup xmlState;
xmlState.SetDoc( m_state.Serialize() );
xmlState.Save( "state.xml" );

To restore the state from a file, load the document and deserialize it:

CMarkup xmlState;
xmlState.Load( "state.xml" );
m_state.Deserialize( xmlState.GetDoc() );
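
If you want to see the file step on its own, here is a standard-library sketch of writing a document string to a file and reading it back unchanged. SaveDocToFile and LoadDocFromFile are hypothetical names; this is not how CMarkup's Save and Load are implemented, and it copies bytes without any character-encoding handling.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Write the document string to a file. Binary mode keeps the bytes exactly
// as given (no newline translation on Windows).
bool SaveDocToFile(const std::string& path, const std::string& doc)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    out << doc;
    out.close();         // flush; a failed write or close sets failbit
    return !out.fail();
}

// Read the whole file back into a string.
bool LoadDocFromFile(const std::string& path, std::string& doc)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    std::ostringstream buf;
    buf << in.rdbuf();
    doc = buf.str();
    return true;
}
```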
    

Other types, dates and decimal points

CMarkup overloads some methods to accept integer data values, and provides the MCD_STRTOINT macro for converting a returned string to an integer. But for other types, such as floating point numbers, times and dates, you need to write your own code to convert the various data types to and from strings.

Avoid any conversions that are affected by the system locale.

If you convert a real number to a string with sprintf, it can use a comma for the decimal point in one system locale configuration and a period in another. Convert it so that the string always uses the same decimal separator (normally a period), regardless of locale.
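
One way to do this in standard C++ is to format and parse through streams imbued with the classic "C" locale, so that a change to the global locale cannot switch the separator. This is a sketch; DoubleToXml and XmlToDouble are hypothetical names. In C++17, std::to_chars and std::from_chars are another locale-independent option.

```cpp
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Convert a double to text with the classic "C" locale, so the decimal
// point is always '.'. max_digits10 significant digits make the text
// round-trip back to the identical double.
std::string DoubleToXml(double value)
{
    std::ostringstream os;
    os.imbue(std::locale::classic());
    os.precision(std::numeric_limits<double>::max_digits10);
    os << value;
    return os.str();
}

// Parse text written by DoubleToXml, again ignoring the current locale.
// Returns false unless the whole string (apart from trailing whitespace)
// is a number; value is only changed on success.
bool XmlToDouble(const std::string& text, double& value)
{
    std::istringstream is(text);
    is.imbue(std::locale::classic());
    double parsed = 0.0;
    if (!(is >> parsed))
        return false;
    is >> std::ws;  // skip trailing whitespace
    if (!is.eof())
        return false;  // leftover characters such as ",5"
    value = parsed;
    return true;
}
```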

For dates, use a standard such as ISO 8601, e.g. "2005-08-15T15:52:01+00:00", and avoid locale-dependent formats.
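
A minimal sketch of formatting a time_t as an ISO 8601 UTC timestamp with the C library. ToIso8601Utc is a hypothetical name; it writes the UTC designator Z, which ISO 8601 allows in place of a numeric offset. The numeric strftime conversions used here are not affected by the locale.

```cpp
#include <ctime>
#include <string>

// Format a time_t as an ISO 8601 UTC timestamp, e.g. "2005-08-15T15:52:01Z".
// Returns an empty string if the time cannot be converted.
// std::gmtime returns a pointer to shared static storage, so it is not
// thread-safe; gmtime_r (POSIX) and gmtime_s (Windows) are alternatives.
std::string ToIso8601Utc(std::time_t t)
{
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr)
        return "";
    char buf[32];
    // %Y-%m-%d and %H:%M:%S are purely numeric, so no locale names appear.
    if (std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc) == 0)
        return "";
    return std::string(buf);
}
```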

How to XML serialize with less code


The FindSetData and FindGetData methods, which provide one-step set/get access for the dynamic structure functionality of CMarkup, are available only in CMarkup Developer and in the FOAL C++ scripting of the free XML editor.

The FindSetData and FindGetData methods, together with absolute paths, provide quick get/set functionality in a CMarkup object. See Dynamic Structure Documents. For example, one call sets the Mode value in the XML document, creating the State and Mode elements if they don't exist:

xml.FindSetData( "/State/Mode", m_strMode );

And one call gets the Mode data value:

m_strMode = xml.FindGetData( "/State/Mode" );
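
To illustrate the idea behind path-based access (not CMarkup's implementation or API), here is a hypothetical in-memory element tree in which PathSet creates any missing elements along an absolute path and PathGet returns an empty string when the path does not exist. Node, PathSet and PathGet are names invented for this sketch.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical element node: a tag name, text data and child elements.
struct Node
{
    std::string tag;
    std::string data;
    std::vector<std::unique_ptr<Node>> children;

    // Return the first child with the given tag; optionally create it.
    Node* Child(const std::string& name, bool create)
    {
        for (auto& c : children)
            if (c->tag == name)
                return c.get();
        if (!create)
            return nullptr;
        children.push_back(std::make_unique<Node>());
        children.back()->tag = name;
        return children.back().get();
    }
};

// Split "/State/Mode" into {"State", "Mode"}.
std::vector<std::string> SplitPath(const std::string& path)
{
    std::vector<std::string> parts;
    std::istringstream is(path);
    std::string part;
    while (std::getline(is, part, '/'))
        if (!part.empty())
            parts.push_back(part);
    return parts;
}

// Set the data at an absolute path, creating missing elements on the way.
void PathSet(Node& doc, const std::string& path, const std::string& value)
{
    Node* n = &doc;
    for (const std::string& name : SplitPath(path))
        n = n->Child(name, true);
    n->data = value;
}

// Get the data at an absolute path; a missing element reads as "".
std::string PathGet(Node& doc, const std::string& path)
{
    Node* n = &doc;
    for (const std::string& name : SplitPath(path))
    {
        n = n->Child(name, false);
        if (n == nullptr)
            return "";
    }
    return n->data;
}
```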

After writing all that code to get and set C++ class data members, you may find it advantageous to avoid it altogether. You can use CMarkup objects to carry all the serializable data of a class or structure instead of individual data members. This eliminates the translation between data members and XML, although your program still has to extract and set the values in the CMarkup object. Serialization is then just a matter of calling the Save or GetDoc method.

I often find it better to deal with a CMarkup object than with struct or class data members, especially when the data set involves arrays and lists, or portions of data that are only occasionally used. XML readily accommodates additional pieces of information anywhere in the structure where they make sense, such as on one element of an array.