HTML And CMarkup

Simply put, you can navigate the HTML and find all hyperlink or image elements, or whatever you are looking for. You can also add and remove elements, attributes and content. The following HTML document happens to be a nearly well-formed XML document except for the mismatched case of the P element.

<html>
<head>
<title>The Title</title>
</head>
<body>
<P>Hello World</p>
</body>
</html>
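
To see why the P element matters: XML compares start and end tag names exactly, so `<P>` is not closed by `</p>`. The following is a minimal sketch of that comparison, not CMarkup's actual implementation. `TagNamesMatch` is a hypothetical helper name introduced only for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of CMarkup: compares two tag names,
// optionally ignoring ASCII case.
bool TagNamesMatch(const std::string& a, const std::string& b, bool ignoreCase)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        unsigned char ca = static_cast<unsigned char>(a[i]);
        unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ignoreCase)
        {
            // Fold both characters to lowercase before comparing
            ca = static_cast<unsigned char>(std::tolower(ca));
            cb = static_cast<unsigned char>(std::tolower(cb));
        }
        if (ca != cb)
            return false;
    }
    return true;
}
```

With `ignoreCase` set to false, `TagNamesMatch("P", "p", false)` fails, which is the mismatch in the document above. With it set to true, the same pair matches.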
To use CMarkup with hand-generated HTML, set the MDF_IGNORECASE flag so that tag names match regardless of case:

CMarkup html;
html.SetDocFlags( CMarkup::MDF_IGNORECASE );
html.Load( "test.htm" );
html.FindElem( "html" );
html.IntoElem();
html.FindElem( "head" );
html.IntoElem();
html.FindElem( "title" );
CString csTitle = html.GetData();
html.SetData( csTitle + " Is Changed" );
html.Save( "test.htm" );

By default, all tag name and attribute name matching is case sensitive in CMarkup. To turn on MDF_IGNORECASE without clearing any other document flags that are already set, combine it with the current flags:

html.SetDocFlags( html.GetDocFlags() | CMarkup::MDF_IGNORECASE );

The SetElemContent method (in CMarkup release 8.0) is great for setting HTML directly into the content of an element such as a paragraph element p.

html.AddElem( "p" );
html.SetElemContent( "This small image <br><img src=a.jpg>" );

The resulting element is:

<p>This small image <br><img src=a.jpg></p>

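
The img tag in the content above carries an unquoted attribute value (src=a.jpg), which HTML tolerates but XML does not. As a rough illustration of what lenient attribute reading involves, here is a sketch that accepts quoted values, unquoted values, and names with no value at all. It is not how CMarkup is implemented; `ParseAttribsLeniently` and `AttribList` are hypothetical names, and the sketch assumes the `=` directly follows the attribute name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > AttribList;

// Hypothetical helper, not part of CMarkup: reads the attribute portion of a
// start tag, e.g. "src=a.jpg alt='x' ismap", into name/value pairs.
AttribList ParseAttribsLeniently(const std::string& s)
{
    AttribList attribs;
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        // Skip whitespace between attributes
        while (i < n && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;
        if (i >= n)
            break;

        // The name runs up to '=' or whitespace
        std::size_t start = i;
        while (i < n && s[i] != '=' && !std::isspace(static_cast<unsigned char>(s[i])))
            ++i;
        std::string name = s.substr(start, i - start);
        std::string value; // stays empty for an attribute without a value

        if (i < n && s[i] == '=')
        {
            ++i; // skip '='
            if (i < n && (s[i] == '"' || s[i] == '\''))
            {
                // Quoted value: read up to the matching quote
                const char quote = s[i++];
                start = i;
                while (i < n && s[i] != quote)
                    ++i;
                value = s.substr(start, i - start);
                if (i < n)
                    ++i; // skip closing quote
            }
            else
            {
                // Unquoted value: read up to the next whitespace
                start = i;
                while (i < n && !std::isspace(static_cast<unsigned char>(s[i])))
                    ++i;
                value = s.substr(start, i - start);
            }
        }
        attribs.push_back(std::make_pair(name, value));
    }
    return attribs;
}
```

Given the string `src=a.jpg alt="small image" ismap`, the sketch yields three pairs, the last of which has an empty value.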
There are also AddElem and SetData Flags for generating HTML idiosyncrasies explicitly. Attributes without quotes or without values (such as the HTML wrap attribute) are parsed properly with GetAttrib and GetAttribName, although SetAttrib always generates attributes with quotes.

There is no specific support for HTML in CMarkup; it is just part of Generic Markup In CMarkup. See also the navigation examples in Other Markup for more insight into navigating outside of well-formed XML.

CMarkup works best with properly nested HTML elements, because improperly nested elements can cause unpredictable results when navigating a document. Do not assume the HTML is nested properly just because it displays properly in a browser; browsers use workarounds. See Containment Hierarchy for more on this.

Using Paths In CMarkup (in CMarkup Developer only), you can get the title more quickly, without stepping into each parent element in turn. A path can also be used in a loop, for example to visit the A elements in a document:

html.Load( "test.htm" );
while ( html.FindElem("//A") )
{
    CString csHref = html.GetAttrib( "href" );
}
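
Because the advice above depends on the HTML being properly nested, it can help to check a document before navigating it. The following is a minimal sketch of such a check using a stack of open tag names. It is an assumption-laden illustration rather than anything CMarkup provides: `IsProperlyNested` is a hypothetical name, only br, img and hr are treated as elements with no end tag, and comments or attribute values containing `>` are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of CMarkup: returns true when every end tag
// closes the most recently opened element. Tag names are compared without
// regard to case; br, img and hr are never pushed onto the stack.
bool IsProperlyNested(const std::string& html)
{
    std::vector<std::string> open; // names of elements still awaiting an end tag
    std::size_t pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos)
    {
        const std::size_t end = html.find('>', pos);
        if (end == std::string::npos)
            return false; // unterminated tag

        std::size_t i = pos + 1;
        const bool isEndTag = (i < end && html[i] == '/');
        if (isEndTag)
            ++i;

        // Collect the tag name in lowercase
        std::string name;
        while (i < end && std::isalnum(static_cast<unsigned char>(html[i])))
        {
            name += static_cast<char>(std::tolower(static_cast<unsigned char>(html[i])));
            ++i;
        }
        pos = end + 1;

        if (name.empty())
            continue; // comment, doctype, or stray '<'
        if (name == "br" || name == "img" || name == "hr")
            continue; // no end tag expected

        if (!isEndTag)
        {
            open.push_back(name);
        }
        else
        {
            // An end tag must match the innermost open element
            if (open.empty() || open.back() != name)
                return false;
            open.pop_back();
        }
    }
    return open.empty();
}
```

Markup such as `<b><i>text</b></i>` displays in a browser but fails this check, since `</b>` arrives while `i` is still the innermost open element.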
Posted July 12, 2005. ©Copyright 2008 First Objective Software, Inc. All rights reserved.