XML Namespaces and CMarkup

XML Namespaces are an attempt by the creators of the XML standard to systematize the development of markup vocabularies of element and attribute names (see Namespaces in XML for the motivation behind XML namespaces and all the gory details). The bottom line is that XML Namespaces often become an obstacle to developers needing to deploy applications and parse existing documents, but CMarkup allows you to overcome these obstacles painlessly.

The Quickest XML Namespaces Intro Ever

XML Namespaces by Example, by Tim Bray on XML.com, is a great quick start, but I'll make it even quicker and throw in default namespaces to boot. In the root element of the document you will often find attributes that declare namespaces such as:

<h:html xmlns:h="http://www.w3.org/HTML/1998/html4"
        xmlns:xdc="http://www.xml.com/books">

This declares the h and xdc prefixes which will be used in the document. It is important to see the names "http://www.w3.org/HTML/1998/html4" and "http://www.xml.com/books" simply as unique identifiers; no attempt is made to access those URLs during manipulation of the XML document. In this example, the prefixes let you show that the title element is from the XDC vocabulary rather than the HTML one.

<xdc:title h:style="font-weight:bold;">

There is also a default namespace (declared with the xmlns attribute alone) which applies to all non-prefixed elements within and including the element where it is specified. It can be overridden by re-specifying the default namespace in a contained element (using an empty string to turn off the default). Note that in this example the second title element is from the "http://www.xml.com/books" namespace, not "http://www.w3.org/HTML/1998/html4".

<html xmlns="http://www.w3.org/HTML/1998/html4">
 <head><title>Book Review</title></head>
 <body>
  <bookreview xmlns="http://www.xml.com/books">
   <title>XML: A Primer</title>

That is basically all there is to it! Well, there are some complications, such as default namespaces not applying to attributes. The issue of an element from one vocabulary having an attribute from another was hinted at in the earlier example with the h:style attribute.
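The scoping rules above can be sketched in a few lines of standard C++. This is only an illustrative model: Declarations, Scopes, ResolvePrefix, ElementNamespace and AttributeNamespace are hypothetical names made up for this sketch, not part of CMarkup or any XML library, and it assumes each open element contributes one table of the prefixes it declares.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative model only; none of these names come from CMarkup.
// One map per open element, outermost first. The key is the declared prefix,
// with "" standing for the default namespace (a plain xmlns attribute).
using Declarations = std::map<std::string, std::string>;
using Scopes = std::vector<Declarations>;

// Search from the innermost element outward for a declaration of the prefix.
// Returns "" if nothing is declared or the default was turned off with xmlns="".
std::string ResolvePrefix(const Scopes& scopes, const std::string& prefix)
{
    for (auto it = scopes.rbegin(); it != scopes.rend(); ++it)
    {
        auto found = it->find(prefix);
        if (found != it->end())
            return found->second;
    }
    return std::string();
}

// Namespace of an element name such as "title" or "xdc:title".
std::string ElementNamespace(const Scopes& scopes, const std::string& name)
{
    std::string::size_type colon = name.find(':');
    std::string prefix =
        (colon == std::string::npos) ? std::string() : name.substr(0, colon);
    return ResolvePrefix(scopes, prefix);
}

// Namespace of an attribute name. An unprefixed attribute never picks up
// the default namespace, so it resolves to "" (no namespace).
std::string AttributeNamespace(const Scopes& scopes, const std::string& name)
{
    std::string::size_type colon = name.find(':');
    if (colon == std::string::npos)
        return std::string();
    return ResolvePrefix(scopes, name.substr(0, colon));
}
```

With the html element's declarations pushed first and the bookreview element's pushed second, ElementNamespace for "title" gives the books identifier, and removing the inner scope brings back the HTML one.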

Programming XML Namespaces

Well, as soon as you try to hide the complexity of colliding vocabularies from the developer, you are sure to cause twice the grief. Aside from making XML less human readable, the main problem with XML Namespaces really is in trying to gracefully support them in an API.

How do you do it with CMarkup? Don't do anything! Just treat all the tag names and attributes literally and you can access, create and modify any document format at all. CMarkup holds to the tenet that if you cannot greatly simplify it, don't even try.

In MSXML you cannot set the xmlns attribute like you set other attributes; it needs to be set as part of the createNode function.

 

comment posted The xmlns attribute

Steve Reilly 29-Oct-2004

This problem only occurs when you use MarkupMSXML, not standard CMarkup. I'm having a problem - the following code:

xml.SetDoc( NULL );
xml.AddElem( _T("mim_0003"));
xml.AddAttrib( _T("xmlns"),_T("http://www.mimosa.org/TechXMLV3-0"));
xml.IntoElem(); // inside mim_0003
xml.AddElem( _T("connect_req"));
xml.IntoElem(); // inside connect_req
xml.AddElem( _T("param"));
xml.AddAttrib( _T("connect_string"),cstrValue);
xml.AddAttrib( _T("language_code"),_T("en-US"));
xml.AddAttrib( _T("include_non_active_rows_def"),_T("0"));
xml.AddAttrib( _T("include_CRIS_ref_data_rows_def"),_T("0"));
xml.AddAttrib( _T("include_row_info_columns_def"),_T("0"));
xml.AddAttrib( _T("include_lc_info_columns_def"),_T("0"));
CString cstrXML = xml.GetDoc();
cstrXML = cstrHeader + cstrXML;
c_WorkDetailEdit.SetWindowText(cstrXML);

should produce the following [formatting added for clarity]

<mim_0003 xmlns="http://www.mimosa.org/TechXMLV3-0">
  <connect_req>
    <param
      connect_string="CMMS Test Connect"
      language_code="en-US"
      include_non_active_rows_def="0"
      include_CRIS_ref_data_rows_def="0"
      include_row_info_columns_def="0"
      include_lc_info_columns_def="0"
      />
  </connect_req>
</mim_0003>

instead, I get

<mim_0003 xmlns="http://www.mimosa.org/TechXMLV3-0">
  <connect_req xmlns="">
    <param
      connect_string="CMMS Test Connect"
      language_code="en-US"
      include_non_active_rows_def="0"
      include_CRIS_ref_data_rows_def="0"
      include_row_info_columns_def="0"
      include_lc_info_columns_def="0"
      />
  </connect_req>
</mim_0003>

The xmlns attribute on element mim_0003 is being duplicated on the connect_req element and there seems to be no way to get rid of it. What am I doing wrong?

I am able to reproduce what you are observing. Any element I add under mim_0003 gets an xmlns="" attribute. This is because MSXML requires the default namespace to be set as part of the createNode function, not with setAttribute. The default namespace needs to be specified for every element as it is created. When you retrieve the XML string with GetDoc or Save, MSXML generates xmlns attributes for each element by comparing its namespace to that of its parent. The solution in CMarkupMSXML is a function added in release 7.3 called SetDefaultNamespace. Here is how you would use it:

CMarkupMSXML xml;
xml.SetDefaultNamespace( _T("http://www.mimosa.org/TechXMLV3-0") );
xml.AddElem( _T("mim_0003") );
xml.AddChildElem( _T("connect_req") );

Essentially, what this does in MarkupMSXML.cpp is replace:

MSXMLNS::IXMLDOMElementPtr pNew =
  m_pDOMDoc->createElement( _bstr_t(szName) );

with:

MSXMLNS::IXMLDOMElementPtr pNew;
if ( m_strDefaultNamespace.IsEmpty() )
  pNew = m_pDOMDoc->createElement( _bstr_t(szName) );
else
  pNew = m_pDOMDoc->createNode(
    _variant_t((short)MSXMLNS::NODE_ELEMENT),
    _bstr_t(szName), _bstr_t(m_strDefaultNamespace) );
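The unwanted xmlns="" follows from the parent comparison described above. Here is a simplified model of that decision in standard C++. The function name is hypothetical and this is not MSXML's actual code, only a sketch of the rule as this reply describes it, with an empty string standing for "no namespace".

```cpp
#include <string>

// Simplified model, not MSXML's implementation: decide which default
// namespace declaration has to be written on a child element when the
// document is turned back into text. An empty string means "no namespace".
std::string DefaultNamespaceDeclaration(const std::string& parentNamespace,
                                        const std::string& childNamespace)
{
    if (childNamespace == parentNamespace)
        return std::string();                      // inherited, nothing to write
    return " xmlns=\"" + childNamespace + "\"";    // differs, must be declared
}
```

An element made with plain createElement has no namespace, so under a parent in the mimosa namespace the model yields xmlns="", which matches the output in the question. When the child is created in the same namespace as its parent, nothing extra is written.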

MSXML tracks namespaces to ensure correctness, but in doing so creates extra work for the developer. The SetDefaultNamespace function has been added to CMarkupMSXML to make it easier. With CMarkup you just set the attributes and tags literally with no intervention (or correctness checking) from the XML tool as shown in the example below.

 

comment posted XML namespaces

David Carroll 28-Jul-2006

While checking KDM import, we found that the new KDM SHA-256 message format includes external namespaces - "ETM" and "KDM". The XML parser is not expecting the added namespaces in the new format. These cannot be resolved in our current system configuration (no external access and no DNS). Can you advise how best to parse both formats with CMarkup? We currently cannot resolve external DTD references from the computer doing the parsing.

Old KDM (SHA-1):

<?xml version="1.0" encoding="utf-8"?>
<DCinemaSecurityMessage xmlns:ds="http://www.w3.org/2000/09/xmldsig#" 
xmlns:enc="http://www.w3.org/2001/04/xmlenc#" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="./KDM.xsd">
....

New KDM (SHA-256):

<?xml version="1.0" encoding="utf-8"?>
<etm:DCinemaSecurityMessage 
xmlns:etm="http://www.smpte-ra.org/schemas/430-3/2006/ETM" 
xmlns:kdm="http://www.smpte-ra.org/schemas/430-1/2006/KDM" 
xmlns:ds="http://www.w3.org/2000/09/xmldsig#" 
xmlns:xenc="http://www.w3.org/2001/04/xmlenc#">
....

You will have to adjust your tag name logic. For example, if the root element is etm:DCinemaSecurityMessage then other elements may be named like etm:UserText (just an example I found online in the SMPTE Standard for Digital Cinema). One way to do this is to test the root tag name at the start and then use the prefix there (if any) from then on. In MFC it might look like this:

xml.ResetPos();
xml.FindElem(); // move to the root element
CString csRootTag = xml.GetTagName();
CString csPre; // namespace prefix including the colon, empty if none
int nColon = csRootTag.Find(_T(':'));
if ( nColon > 0 )
  csPre = csRootTag.Left( nColon+1 );

Then later on you can build tag names like this:

xml.FindElem( csPre+_T("UserText") );
xml.FindElem( csPre+_T("AuthenticatedPublicType") );
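The same prefix logic can be written without MFC. This is a minimal sketch using std::string; TagPrefix and QualifiedName are hypothetical helper names, not CMarkup functions, and the result would still be passed to FindElem as a literal tag name.

```cpp
#include <string>

// Hypothetical helper (not part of CMarkup): returns the namespace prefix
// of a tag name including the trailing colon, or an empty string when the
// name has no prefix.
std::string TagPrefix(const std::string& tagName)
{
    std::string::size_type colon = tagName.find(':');
    if (colon == std::string::npos || colon == 0)
        return std::string();
    return tagName.substr(0, colon + 1);
}

// Hypothetical helper: builds the tag name to search for by reusing
// whatever prefix was found on the root element.
std::string QualifiedName(const std::string& rootTag, const std::string& localName)
{
    return TagPrefix(rootTag) + localName;
}
```

For the new KDM format, QualifiedName applied to the root tag etm:DCinemaSecurityMessage and UserText gives etm:UserText. For the old format, whose root has no prefix, it gives plain UserText, so the same lookup code handles both documents.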