Navigating and Getting Information From a Document

FindElem and FindChildElem only search forward. You have to call ResetMainPos or ResetChildPos if you don't know the order of the siblings or don't want to process them sequentially. For example, after finding child element CHILD3 you would call ResetChildPos before finding child element CHILD1. If you don't know the order, you can call ResetChildPos before every FindChildElem, or loop through all the child elements, get each child's tag name, and process it accordingly. Keep in mind that there are no internal lookup tables of tag names; FindChildElem just searches sequentially from the current child position, which CMarkup keeps track of. So whenever you know the order, you should find and process the children in that order (CHILD1 first), since that is fastest.
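To make the cost argument concrete, here is a small self-contained C++ model of a forward-only child search. `ChildCursor`, `FindChild`, and `Steps` are hypothetical names invented for this sketch (only the reset is named after CMarkup's ResetChildPos). It is not CMarkup's implementation, just a list of tag names searched from the current position onward, with a counter of how many children have been examined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of forward-only child search (not CMarkup's code):
// children are just tag names, and the cursor remembers where to resume.
class ChildCursor {
public:
    explicit ChildCursor(std::vector<std::string> tags) : tags_(std::move(tags)) {}

    // Search forward from just after the current child for the next child
    // with a matching tag; an empty name matches any tag.
    bool FindChild(const std::string& name = "") {
        for (std::size_t i = next_; i < tags_.size(); ++i) {
            ++steps_;  // count every child examined
            if (name.empty() || tags_[i] == name) {
                next_ = i + 1;  // the next search resumes after this child
                return true;
            }
        }
        return false;  // in this model the position is unchanged on failure
    }

    // Named after CMarkup's ResetChildPos: the next search starts at the first child.
    void ResetChildPos() { next_ = 0; }

    std::size_t Steps() const { return steps_; }

private:
    std::vector<std::string> tags_;
    std::size_t next_ = 0;   // index where the next forward search begins
    std::size_t steps_ = 0;  // total children examined so far
};
```

In this model, finding CHILD1, CHILD2, and CHILD3 in document order examines 3 children in total, while calling ResetChildPos before each of the same three searches examines 1 + 2 + 3 = 6. The gap grows with the number of children, which is why processing in known order is the fastest path.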

comment posted GetData to get children?

06-Dec-2002

Wouldn't it be nice if GetData & GetChildData return the children if the element does not have any char data?

The only confusing thing about GetData returning the child elements would be that SetData would not perform the complementary operation in this case: SetData treats the tags as special characters and converts them to ampersand codes (entity references).
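The asymmetry comes from escaping: text passed to SetData is stored as character data, so markup characters are replaced with entity references. A minimal sketch of that kind of conversion follows. `EscapeText` is a hypothetical name, and the exact set of characters CMarkup converts is an assumption here; this version handles `&`, `<`, and `>`.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of escaping element text: markup characters become
// entity references, so the result is plain character data, never child tags.
std::string EscapeText(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char ch : text) {
        switch (ch) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += ch;      break;
        }
    }
    return out;
}
```

Passing `<child/>` through this function yields `&lt;child/&gt;`, text that a parser will never read back as a child element, which is why GetData returning child markup could not be undone by SetData.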

Update July 12, 2005: What you want is the equivalent of .NET's InnerXml, which works for child elements and/or mixed content. CMarkup release 8.0 now has the GetElemContent method to do just this. It returns the element's content as XML, including any child elements.
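The idea behind InnerXml-style "element content" is everything between the end of the start tag and the beginning of the end tag. The function below is a naive, hypothetical illustration of that definition on a string holding a single element (`InnerXml` is an invented name). It is not how GetElemContent is implemented, and it ignores comments, CDATA sections, and `>` inside attribute values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive sketch: return the text between an element's start tag and end tag.
// Assumes `element` holds exactly one simple, well-formed element.
std::string InnerXml(const std::string& element) {
    std::size_t startTagEnd = element.find('>');
    if (startTagEnd == std::string::npos || startTagEnd == 0)
        return "";
    if (element[startTagEnd - 1] == '/')
        return "";  // empty-element form such as <elem/> has no content
    std::size_t endTag = element.rfind("</");
    if (endTag == std::string::npos || endTag <= startTagEnd)
        return "";
    return element.substr(startTagEnd + 1, endTag - startTagEnd - 1);
}
```

For mixed content such as `<a>text<b/>more</a>` this returns `text<b/>more`, which is the "children and/or mixed content" case that GetData alone does not cover.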

comment posted empty elements?

Jonnie White 01-Aug-2002

I've been using GetData() to find out whether an element is empty. Unfortunately, in the following code the second GetData() call returns data (a CRLF).

CMarkup doc;
CString data;
doc.AddElem("set");
data = doc.GetData();
ASSERT(data.IsEmpty());   // passes: the new element has no data
doc.AddChildElem("elem");
doc.RemoveChildElem();
data = doc.GetData();
ASSERT(data.IsEmpty());   // fails: data holds a leftover CRLF

Would it be possible to return empty elements to their <elem/> format?

Well, CMarkup doesn't necessarily collapse an element back into the empty element form <elem/>. Implementing this might be too tricky, and I don't think it's a good thing to rely on. The way to check whether there are no children is to call ResetChildPos and test the result of FindChildElem: if it returns false, the element has no child elements.
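The distinction drawn here, between an element's character data and its child elements, can be modelled in a few lines. `Element`, `HasChildElements`, and `IsBlank` are hypothetical. `HasChildElements` plays the role of ResetChildPos followed by FindChildElem() with no tag name, and `IsBlank` is one possible stricter test for when leftover whitespace, such as the CRLF above, should also count as empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: character data and child elements are stored separately.
struct Element {
    std::string data;                   // may hold formatting whitespace such as "\r\n"
    std::vector<std::string> children;  // tag names of child elements

    // Like ResetChildPos() then FindChildElem() with no name: is there any child?
    bool HasChildElements() const { return !children.empty(); }

    // Stricter "empty": no children and no character data other than whitespace.
    bool IsBlank() const {
        return children.empty() &&
               data.find_first_not_of(" \t\r\n") == std::string::npos;
    }
};
```

An element whose data is just `"\r\n"` fails an `IsEmpty()` test on its data, yet reports no child elements and is blank, which matches the situation in the question.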

comment posted Navigation question about CMarkup class

yhchen 02-Jan-2002

How can I find the "first" and "next" child element after I call IntoElem()? Is there any function other than FindChildElem("xxx")? If there are three child elements (e.g. a, b, and c):

<test>
    <a/>
    <b/>
    <c/>
</test>

First I call FindChildElem("b"), next I can call FindChildElem("c"), but now I can't find "a" by calling FindChildElem("a").

After you call IntoElem(), the first call to FindChildElem will find the first child, and subsequent calls to FindChildElem will find the next ones. To loop through the children without specifying the tag name, call FindChildElem() with no argument in a while loop and use GetChildTagName() to see which child was found.
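With CMarkup itself, the loop takes the shape `while ( xml.FindChildElem() )` with a call to `xml.GetChildTagName()` inside. The self-contained sketch below models that shape with a hypothetical `Children` class (its `GetChildTagName` is only named after CMarkup's method) and a hypothetical `CountByTag` function that dispatches on each tag, so every child is visited exactly once in document order.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parent element's children (not CMarkup's code).
class Children {
public:
    explicit Children(std::vector<std::string> tags) : tags_(std::move(tags)) {}

    // Like an untargeted FindChildElem(): step to the next child, whatever its tag.
    bool FindChild() {
        if (next_ >= tags_.size()) return false;
        cur_ = next_++;
        return true;
    }

    // Tag of the current child; only valid after FindChild() returned true.
    const std::string& GetChildTagName() const { return tags_[cur_]; }

private:
    std::vector<std::string> tags_;
    std::size_t next_ = 0;  // index of the next child to visit
    std::size_t cur_ = 0;   // index of the current child
};

// Visit every child once, in document order, and dispatch on its tag name.
std::map<std::string, int> CountByTag(Children& kids) {
    std::map<std::string, int> counts;
    while (kids.FindChild()) {
        const std::string& tag = kids.GetChildTagName();
        if (tag == "a" || tag == "b" || tag == "c")
            ++counts[tag];       // a known tag: process it
        else
            ++counts["other"];   // an unknown tag is still visited
    }
    return counts;
}
```

Because the loop never names a tag in the search itself, the order of a, b, and c does not matter and no reset is needed.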

If you find child c and then want to find a or b, call ResetChildPos(). If you always specify the tag name and do not know the order, then you should always call ResetChildPos() before FindChildElem("tagname"). When you specify the tag name in FindChildElem, it finds the first child with a matching tag name, then the next one with a matching tag name, and so on. As in other things, you have to judge what is the most efficient approach for your particular problem.

comment posted iterate thru the nodes in reverse order

Gene 07-Jan-2006

I just need to walk the nodes backwards, that's all. I can do

xml.SetDoc...
xml.FindElem();
xml.IntoElem();
while ( xml.FindElem("MsgItem") )
{
    // process each MsgItem
}

That walks the nodes from first to last; I just need to do it from last to first.

Just change FindElem to FindPrevElem. When there is no current main position element, FindPrevElem will start from the last MsgItem element and go to previous ones.
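The starting rule described here (with no current main position, search backward from the last sibling) can be sketched with a hypothetical `Siblings` class. `FindPrev` and `ReverseOrder` are invented for this sketch and only model the behavior described; whether FindPrevElem itself is available depends on the CMarkup release you have.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of backward sibling search (not CMarkup's code).
class Siblings {
public:
    explicit Siblings(std::vector<std::string> tags) : tags_(std::move(tags)) {}

    // Search backward for the previous sibling with a matching tag. With no
    // current position, the search starts from the last sibling.
    bool FindPrev(const std::string& name) {
        std::size_t i = hasPos_ ? pos_ : tags_.size();
        while (i > 0) {
            --i;
            if (tags_[i] == name) {
                pos_ = i;
                hasPos_ = true;
                return true;
            }
        }
        return false;
    }

    std::size_t Pos() const { return pos_; }  // valid after a successful find

private:
    std::vector<std::string> tags_;
    std::size_t pos_ = 0;
    bool hasPos_ = false;  // false means "no current position"
};

// Collect the indices of matching siblings from last to first.
std::vector<std::size_t> ReverseOrder(Siblings& s, const std::string& name) {
    std::vector<std::size_t> order;
    while (s.FindPrev(name))
        order.push_back(s.Pos());
    return order;
}
```

For the siblings MsgItem, Other, MsgItem, MsgItem this visits indices 3, 2, and 0, skipping the non-matching element, exactly mirroring the forward loop in the question.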

comment posted FindPrevElem not yet implemented?

25-Jun-2006

There does not appear to be a FindPrevElem( ) method in CMarkup (or at least in the evaluation version). This is version 8.2.

comment posted Use of const

Andrew Scheurer 31-Oct-2007

Question: Why is it that some of the [navigation] functions are const and some are not? For instance:

GetData is a const function but

FindElem is not a const function...

This is important because I have InputIterator and OutputIterator interfaces where CMarkup is either a const CMarkup& or in the case of OutputIterator CMarkup&. The document is either read (const &) or updated or created (&). Here are the declarations of these methods:

MCD_STR GetData() const { return x_GetData(m_iPos); };
bool FindElem( MCD_CSTR szName=NULL );

FindElem is not const and yet it interfaces with x_FindElem which is const. It also calls x_SetPos which is of course not a const function.

One generally wants to use FindElem() to determine whether such an element exists in the document and, if so, call GetData(). Mixing const and non-const makes it difficult to combine these operations, which belong together and appear to be read-only: judging from the names of the functions, one would think that neither of them updates the DOM.

I realize why FindElem is not const: it must call a non-const function. However, if CMarkup could be separated into read functions and write functions, then const& could be used more effectively, which would be better C++. I don't know whether using the keyword mutable would help. It's generally portable (many compilers going back over a decade have supported it), and perhaps it could help with issues such as the one above.

From an interface point of view, I have to declare all my functions with non-const references regardless of the iterator type, and this is unfortunate. The STL gives good examples of the strict use of const to distinguish read-only from read-write interfaces.

Thanks for clearly laying out this problem. As you mentioned, FindElem is not const because it does actually change the CMarkup object, since the current position is stored within the CMarkup object. The "navigation" methods were purposefully not called "read" methods because they do modify the object's internally stored position or bookmark, and therefore are often actually "write" methods. I have actually begun to move away from the original grouping of "navigation" methods (see CMarkup Methods) partly because of this potential confusion.

Using the mutable storage class specifier is a very resourceful and creative solution, but ultimately I cannot use it because, strictly speaking, if you pass a CMarkup object as a constant reference you expect that its internal position will not be modified. The mutable keyword is a way of fudging on that for the sake of a higher concept of "read" and "write," but in this case it would serve to hide or disguise what might really be happening in the called function.
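To see the concern concretely, here is a hypothetical class (not CMarkup) whose position member is declared mutable so that a const `FindElem` compiles. `Reader` and `FindTwice` are names invented for this sketch. A function that receives `const Reader&` can still advance the caller's position, which is exactly the hidden side effect described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader: the position is mutable so FindElem can be const.
class Reader {
public:
    explicit Reader(std::vector<std::string> tags) : tags_(std::move(tags)) {}

    // Declared const, yet it moves the stored position forward.
    bool FindElem(const std::string& name) const {
        for (std::size_t i = next_; i < tags_.size(); ++i) {
            if (tags_[i] == name) {
                next_ = i + 1;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<std::string> tags_;
    mutable std::size_t next_ = 0;  // changes even through a const reference
};

// Takes the reader by const reference, but still consumes the caller's position.
bool FindTwice(const Reader& r, const std::string& name) {
    return r.FindElem(name) && r.FindElem(name);
}
```

After `FindTwice` returns, the caller's object has been advanced past both matches even though it was passed as const, so the const in the signature no longer tells the caller that the object is unchanged.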

I have given this issue of internalized position a lot of thought since the very beginning of CMarkup. It was certainly a debatable design choice with many ramifications, including some unfortunate issues in the design of FOAL, but its chief advantage is that there is only a single class. Many times I have considered integrating a struct to contain the position, allowing you to optionally navigate without modifying the CMarkup object by instantiating a separate position struct. This would add complexity, which is a trade-off. I have not seen a way to cleanly integrate this into the existing API, so I have not released this functionality as part of the product yet.
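A rough sketch of the alternative design considered here, where the position lives in a separate struct owned by the caller, might look like this. `Doc`, `Pos`, and `FindElemAt` are hypothetical names, and this is not an API that CMarkup provides. The document is taken by const reference and only the caller's `Pos` changes, so several independent cursors can walk the same document.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read-only document: sibling elements in order.
struct Doc {
    std::vector<std::string> tags;
};

// Hypothetical caller-owned position.
struct Pos {
    std::size_t next = 0;  // where the next forward search begins
    std::size_t cur = 0;   // index of the element found last
};

// Const with respect to the document; only the caller's Pos is updated.
// An empty name matches any element.
bool FindElemAt(const Doc& doc, Pos& pos, const std::string& name) {
    for (std::size_t i = pos.next; i < doc.tags.size(); ++i) {
        if (name.empty() || doc.tags[i] == name) {
            pos.cur = i;
            pos.next = i + 1;
            return true;
        }
    }
    return false;
}
```

The cost shows even in this toy: every navigation call needs an extra position argument, which is the added complexity weighed against the single-class design.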