RSS: Create and Read Feeds With CMarkup

An RSS feed is an XML format with a list of items used by News Readers and Aggregators. You can create your own feed by creating an XML file and putting it on your website. News Readers simply check your file periodically to see if there are any new items in it. CMarkup can easily create RSS documents and parse existing ones (but note that CMarkup does not provide Internet access to download feeds from the web or upload your feed to your website).

There are different versions of RSS, and there is also a format called Atom, but the principle remains the same: the XML document has a list of items, each with a title, a link, and a description.

Here is a sample feed that any News Reader can read. The root element tag name is rss and it has a version attribute with the value 2.0. Inside the root element is a channel element which contains elements with details about the feed and a variable number of item elements. This document has been formatted for ease of viewing with the contents of the channel in three groups. The first group is all of the feed details such as the title of the feed, and the other two are item elements.

<rss version="2.0">
  <channel>

    <title>News from firstobject.com</title>
    <link>http://www.firstobject.com/dn_news.xml</link>
    <description>occasional firstobject.com news</description>
    <language>en-us</language>
    <lastBuildDate>21 Dec 2005 19:43:23 -0500</lastBuildDate>
    <ttl>180</ttl>

    <item>
      <title>Subdocuments and Fragments of XML Documents</title>
      <link>http://www.firstobject.com/dn_markfragments.htm</link>
      <guid isPermaLink="false">firstobject.com/dn_markfragments.htm</guid>
      <pubDate>03 Dec 2005 09:00:00 -0500</pubDate>
      <description>This article discusses the terminology of document
        fragments and the details of CMarkup's support for them. A subdocument
        is an element with its attributes and all its content as a unit, even
        if the element contains a whole tree of elements</description>
    </item>

    <item>
      <title>CMudCtrl Class</title>
      <link>http://www.firstobject.com/dn_mudctrl.htm</link>
      <guid isPermaLink="false">firstobject.com/dn_mudctrl.htm</guid>
      <pubDate>18 Nov 2005 09:00:00 -0500</pubDate>
      <description>The CMudCtrl class is a standalone MFC control derived
        directly from CWnd for displaying markup enhanced UTF-8 text similar
        to an HTML control</description>
    </item>

  </channel>
</rss>

This format is mostly self-explanatory; you can substitute your own information without learning any more about it. I am going to give a couple of details so please bypass this paragraph if you just want to get on with it. The timestamps shown in lastBuildDate and pubDate should be in RFC 822 e-mail format as shown (you can include an optional weekday at the beginning like: "Sat, 03 Dec 2005 09:00:00 -0500"). The ttl "time to live" element is an optional minimum number of minutes for a News Reader to wait between refreshes. The item's guid is the globally unique identifier string (which may or may not be a URL) that helps a News Reader to determine that the item is new. See the Harvard Law RSS 2.0 Specification for details but note that alternative and additional elements are allowed such as the dc:date element with ISO 8601 format e.g. "2005-12-03T09:00:00-05:00". See Mark Pilgrim's History of RSS Date Formats to get a glimpse of the full anarchy that is XML on the web.
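If you need to produce a timestamp in the format just described, here is a minimal sketch in standard C++. It is not part of CMarkup. The function name FormatRfc822 and its parameters are made up for illustration, and it assumes that time_t counts seconds since 1 Jan 1970 UTC (true on the common platforms) and that you supply the zone offset yourself.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not a CMarkup function): formats a UTC time as an
// RFC 822 style timestamp such as "03 Dec 2005 09:00:00 -0500".
// nOffsetMinutes is the zone offset from UTC, e.g. -300 for -0500.
// Assumes time_t is a count of seconds since 1970-01-01 00:00:00 UTC.
std::string FormatRfc822( std::time_t tUtc, int nOffsetMinutes, bool bWeekday = false )
{
  static const char* aszDay[] = { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
  static const char* aszMonth[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

  // Shift to wall-clock time in the target zone, then break it down as if UTC
  std::time_t tLocal = tUtc + static_cast<std::time_t>( nOffsetMinutes ) * 60;
  std::tm* ptm = std::gmtime( &tLocal );
  if ( ! ptm )
    return std::string();
  std::tm tmLocal = *ptm; // copy, because gmtime returns a shared buffer

  int nAbs = nOffsetMinutes < 0 ? -nOffsetMinutes : nOffsetMinutes;
  char szBuf[64];
  std::snprintf( szBuf, sizeof(szBuf), "%02d %s %04d %02d:%02d:%02d %c%02d%02d",
    tmLocal.tm_mday, aszMonth[tmLocal.tm_mon], tmLocal.tm_year + 1900,
    tmLocal.tm_hour, tmLocal.tm_min, tmLocal.tm_sec,
    nOffsetMinutes < 0 ? '-' : '+', nAbs / 60, nAbs % 60 );

  std::string strResult = szBuf;
  if ( bWeekday ) // optional leading weekday, e.g. "Sat, "
    strResult = std::string( aszDay[tmLocal.tm_wday] ) + ", " + strResult;
  return strResult;
}
```

Under those assumptions, FormatRfc822( 1133618400, -300 ) gives "03 Dec 2005 09:00:00 -0500", which is the pubDate of the first item in the sample feed, and passing true as the third argument adds the "Sat, " prefix.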

You can get CMarkup code for creating the sample feed above (or any XML document) in the free firstobject XML Editor by right-clicking the root element and selecting Creation Code. Assuming you have an array of item structures in memory called aItems, and the current time already formatted in a string called strTimestampNow, you could use the following code to write your RSS document to a file called feed.xml:

CMarkup xml;
xml.AddElem( "rss" );
xml.SetAttrib( "version", "2.0" );
xml.IntoElem();
xml.AddElem( "channel" );
xml.IntoElem();
xml.AddElem( "title", "News from firstobject.com" );
xml.AddElem( "link", "http://www.firstobject.com/dn_news.xml" );
xml.AddElem( "description", "occasional firstobject.com news" );
xml.AddElem( "language", "en-us" );
xml.AddElem( "lastBuildDate", strTimestampNow );
xml.AddElem( "ttl", "180" );
for ( int nItem=0; nItem<aItems.GetSize(); ++nItem )
{
  xml.AddElem( "item" );
  xml.AddChildElem( "title", aItems[nItem].strTitle );
  xml.AddChildElem( "link", aItems[nItem].strURL );
  xml.AddChildElem( "guid", aItems[nItem].strGUID );
  xml.SetChildAttrib( "isPermaLink", "false" );
  xml.AddChildElem( "pubDate", aItems[nItem].strTimestamp );
  xml.AddChildElem( "description", aItems[nItem].strDesc );
}
xml.Save( "feed.xml" );

That's how easy it is! There are 4 types of feeds in common use: RSS 0.91, 1.0, 2.0 and Atom. I chose RSS 2.0 for the example above because it is simpler to look at than RSS 1.0 or Atom, and it is an easier version number to remember than RSS 0.91. The format you choose to generate isn't very important unless you have information requirements supported by a particular format.

If you are processing a feed, the first step is generally to identify what flavor of feed you are dealing with. Of course you can skip this step if you already know what feed format it is. Here is some sample code for loading and determining the type of a file called feed.xml:

enum FeedType
{
  FT_UNKNOWN,
  FT_RSS10,
  FT_RSS091,
  FT_RSS20,
  FT_ATOM
};
xml.Load( "feed.xml" );
xml.FindElem();
CString csTag = xml.GetTagName();
int nFeedType = FT_UNKNOWN;
if ( csTag == "rss" )
{
  if ( xml.GetAttrib("version") == "2.0" )
    nFeedType = FT_RSS20;
  else
    nFeedType = FT_RSS091;
}
else if ( csTag == "rdf:RDF" )
  nFeedType = FT_RSS10;
else if ( csTag == "feed" )
  nFeedType = FT_ATOM;

In practice, the variations between versions are dwarfed by the variations within each version! News Readers are much like Internet browsers in that they have learned to cope with numerous variations and even incorrect practices in feed formats. Still, knowing the version generally tells you where to find the items and what kinds of information to look for.

Format	Location of Feed Title	Location of Items
RSS 0.91	/rss/channel/title	/rss/channel/item
RSS 1.0	/rdf:RDF/channel/title	/rdf:RDF/item
RSS 2.0	/rss/channel/title	/rss/channel/item
Atom	/feed/title	/feed/entry

Notice that 0.91 and 2.0 are the same in this regard (RSS 2.0 is based on 0.9x). Also notice that in RSS 1.0 the title is inside the channel but the items are not. Having a channel element may seem to imply that there can be multiple channels in an RSS document, but this is not the case; it is just an artifact of the original concept behind RSS.
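If you branch on the feed type in several places, it may help to keep the table above in one lookup function. This is only a sketch: FeedPaths and GetFeedPaths are hypothetical names, the enum simply repeats the one from the detection code, and the paths are plain descriptive strings rather than something this code passes to CMarkup.

```cpp
#include <string>

enum FeedType
{
  FT_UNKNOWN,
  FT_RSS10,
  FT_RSS091,
  FT_RSS20,
  FT_ATOM
};

// Hypothetical holder for the two locations listed in the table
struct FeedPaths
{
  const char* szTitlePath; // where the feed title is found
  const char* szItemPath;  // where the repeating items are found
};

// Returns the locations for a feed type; empty strings if the type is unknown
FeedPaths GetFeedPaths( FeedType nFeedType )
{
  switch ( nFeedType )
  {
  case FT_RSS091: // 0.91 and 2.0 share the same layout
  case FT_RSS20:
    return { "/rss/channel/title", "/rss/channel/item" };
  case FT_RSS10:  // items are siblings of the channel, not children
    return { "/rdf:RDF/channel/title", "/rdf:RDF/item" };
  case FT_ATOM:
    return { "/feed/title", "/feed/entry" };
  default:
    return { "", "" };
  }
}
```

Keeping the differences in one place means the rest of the reading code only has to ask where the title and items are, rather than repeating the version checks.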

Here is some code to process an RSS 2.0 feed to grab the items into an array called aItems. To show how you can remain flexible, it demonstrates pulling the timestamp from either pubDate or dc:date.

xml.ResetPos();
xml.FindElem( "rss" );
xml.IntoElem();
xml.FindElem( "channel" );
xml.IntoElem();
xml.FindElem( "title" );
CString csFeedTitle = xml.GetData();
while ( xml.FindElem("item") )
{
  xml.FindChildElem( "title" );
  item.strTitle = xml.GetChildData();
  xml.ResetChildPos();
  xml.FindChildElem( "link" );
  item.strURL = xml.GetChildData();
  xml.ResetChildPos();
  xml.FindChildElem( "description" );
  item.strDesc = xml.GetChildData();
  xml.ResetChildPos();
  xml.FindChildElem( "guid" );
  item.strGUID = xml.GetChildData();
  xml.ResetChildPos();
  // Prefer pubDate (RFC 822); fall back to dc:date (ISO 8601)
  item.bIso8601 = false;
  if ( ! xml.FindChildElem( "pubDate" ) )
  {
    item.bIso8601 = true;
    xml.FindChildElem( "dc:date" );
  }
  item.strTimestamp = xml.GetChildData();
  aItems.Add( item );
}

Here is an example of an acceptable RSS 2.0 item with some differences from the items in the example at the top of this article, but which can still be processed by the same code.

    <item>
      <author>Ben Bryant</author>
      <dc:date>2005-12-03T09:00:00-05:00</dc:date>
      <title>Subdocuments and Fragments of XML Documents</title>
      <description><![CDATA[This article discusses the terminology of document
            fragments and the details of CMarkup's support for them. A subdocument
            is an element with its attributes and all its content as a unit, even
            if the element contains a whole tree of elements]]></description>
      <slash:comments>1</slash:comments>
      <link>http://www.firstobject.com/dn_markfragments.htm</link>
    </item>

The order of the elements is different, which is why we called ResetChildPos between FindChildElem calls. Some feeds provide the description in a CDATA Section, and GetChildData will still return just the text content. The number of comments on an article may be provided in the slash:comments element. The author element in an RSS 2.0 item is supposed to contain an e-mail address, such as <author>lawyer@example.com (Lawyer Boyer)</author>, but in practice you will see just the name when the publisher does not want to supply an address. In Atom the author has separate subelements for name and email (see the Atom Syndication Format, RFC 4287).
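Returning to the timestamps: the processing code records in bIso8601 which format it found, so a later step can convert dc:date values into a number you can compare or sort. Here is a minimal sketch in standard C++, independent of CMarkup. ParseIso8601 and DaysFromCivil are hypothetical names, and the parser only handles the form shown in this article (date, "T", time, then "Z" or a +hh:mm / -hh:mm offset), with no fractional seconds. A value with no zone at all is assumed to be UTC.

```cpp
#include <cstdio>
#include <string>

// Days since 1970-01-01 for a Gregorian date (standard civil-date arithmetic)
long long DaysFromCivil( int nY, int nM, int nD )
{
  nY -= ( nM <= 2 ); // treat Jan and Feb as the end of the previous year
  const long long nEra = ( nY >= 0 ? nY : nY - 399 ) / 400;
  const int nYoe = static_cast<int>( nY - nEra * 400 );
  const int nDoy = ( 153 * ( nM + ( nM > 2 ? -3 : 9 ) ) + 2 ) / 5 + nD - 1;
  const int nDoe = nYoe * 365 + nYoe / 4 - nYoe / 100 + nDoy;
  return nEra * 146097 + nDoe - 719468;
}

// Hypothetical helper: parses e.g. "2005-12-03T09:00:00-05:00" into seconds
// since 1970-01-01 00:00:00 UTC. Returns false if the text does not match.
bool ParseIso8601( const std::string& strStamp, long long& nUtcSeconds )
{
  int nY = 0, nMo = 0, nD = 0, nH = 0, nMi = 0, nS = 0, nOffH = 0, nOffM = 0;
  char cZone = 'Z';
  int nFields = std::sscanf( strStamp.c_str(), "%4d-%2d-%2dT%2d:%2d:%2d%c%2d:%2d",
    &nY, &nMo, &nD, &nH, &nMi, &nS, &cZone, &nOffH, &nOffM );
  if ( nFields < 6 || nMo < 1 || nMo > 12 || nD < 1 || nD > 31 )
    return false;
  if ( nFields >= 7 && cZone != 'Z' && cZone != '+' && cZone != '-' )
    return false;
  if ( ( cZone == '+' || cZone == '-' ) && nFields != 9 )
    return false; // a numeric offset needs both hours and minutes

  nUtcSeconds = DaysFromCivil( nY, nMo, nD ) * 86400LL + nH * 3600 + nMi * 60 + nS;
  long long nOffset = ( nOffH * 60 + nOffM ) * 60LL;
  if ( cZone == '+' )
    nUtcSeconds -= nOffset; // local time is ahead of UTC
  else if ( cZone == '-' )
    nUtcSeconds += nOffset; // local time is behind UTC
  return true;
}
```

With this sketch, the dc:date value in the item above, "2005-12-03T09:00:00-05:00", parses to 1133618400, the same instant as "2005-12-03T14:00:00Z".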

The source code of the firstobject News Reader, available to those who have purchased an Advanced CMarkup Developer License, contains code in the NewsDlg.cpp source file for processing all of the common feed formats and more. But the above tips are enough to read and write feeds with the Evaluation version of CMarkup.