Multithreading and CMarkup

CMarkup is easy to use in multithreaded applications. Since it has no static members, you can use CMarkup objects in different threads simultaneously as long as no two threads access the same CMarkup object. There is nothing to stop you from using as many CMarkup objects in as many threads as you like. The simple rule is that if multiple threads access a single instance of CMarkup (i.e. a single CMarkup object), you must ensure mutual exclusion.

The CMarkup Twist

Unlike most data container classes, CMarkup maintains the current position internally. This means that the object itself actually changes whether you are reading or writing. That may not seem intuitive at first but think about when you are navigating to a certain element in a document with the FindElem method. If that element is found, the CMarkup object keeps track of that position in the document so that a subsequent call to GetAttrib knows from which element you are getting the attribute. Now you haven't changed the document itself like you do with a write (modification) operation like SetAttrib, but you have changed the internally maintained current position. So, if two threads attempt to read (navigate) through the same object at the same time they will inevitably have a conflict.
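The effect is easy to reproduce with a toy class. The following is a minimal sketch, not CMarkup itself: CursorDoc, AddElem and InterleavedRead are hypothetical names, and the two "readers" are replayed on a single thread so the unlucky ordering is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cursor-keeping document; NOT the CMarkup API.
class CursorDoc {
public:
    void AddElem(std::string name, std::string attrib) {
        elems_.push_back({std::move(name), std::move(attrib)});
    }
    // "Read-only" navigation still writes to the object: it moves the cursor.
    bool FindElem(const std::string& name) {
        for (std::size_t i = 0; i < elems_.size(); ++i) {
            if (elems_[i].first == name) { pos_ = i; return true; }
        }
        return false;
    }
    // Reads the attribute of whatever element the cursor currently points at.
    std::string GetAttrib() const {
        return pos_ < elems_.size() ? elems_[pos_].second : std::string();
    }
private:
    std::vector<std::pair<std::string, std::string>> elems_;
    std::size_t pos_ = static_cast<std::size_t>(-1);  // no current element yet
};

// Replays, on one thread, an unlucky interleaving of two unsynchronized readers.
std::string InterleavedRead() {
    CursorDoc doc;
    doc.AddElem("JOB1", "alpha");
    doc.AddElem("JOB2", "beta");
    const bool foundA = doc.FindElem("JOB1");  // reader A navigates to JOB1
    const bool foundB = doc.FindElem("JOB2");  // reader B cuts in and moves the shared cursor
    assert(foundA && foundB);                  // both elements exist in this sketch
    return doc.GetAttrib();                    // reader A now gets JOB2's attribute
}
```

Reader A asked for JOB1 but ends up with the attribute of JOB2, even though neither reader modified the document content.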

So, the developer must ensure that multithreaded access to a single CMarkup object is mutually exclusive. In other words, one thread must complete any reading or writing task before another thread is allowed to begin a task on that same object. Even if one thread is reading only JOB1 elements and another thread is interested only in JOB2 elements, only one thread can be reading that document at a time.

There are two typical scenarios for synchronizing access to a shared object: the notification scenario and the mutex scenario.

The Notification Scenario

Think of a team sport where only one person has the ball at a time, and one player passes it to another. In the notification scenario, one thread finishes accessing the shared resource and notifies the next thread, so the next thread can access it without fear of conflicting with the other thread.

You do not need to use a mutex if you can guarantee two threads won't attempt to access the shared CMarkup object at the same time. For example, I use CMarkup in a multi-threaded application in which a background worker thread (CThread Class) is used to fill an XML document from a data source. The worker thread then notifies the main thread which populates a grid from the same CMarkup object. Having been notified that the worker thread is finished, the main thread can then freely access the populated CMarkup object.

To notify the main thread, the worker thread can use PostMessage to the main dialog window (the worker thread must avoid all calls such as SendMessage that directly call windows). Or the worker thread can set a shared flag (integer) variable that the main thread checks periodically.
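The shared flag variant can be sketched in portable C++ as follows. This is an illustration under assumptions, not the code from the application described above: SharedDoc and LoadThenRead are hypothetical names, std::thread stands in for the worker thread, and the flag is a std::atomic<bool> rather than a plain integer so that the hand-off is well defined in standard C++.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared document; NOT the CMarkup API.
struct SharedDoc {
    std::vector<std::string> rows;
};

// The worker fills the document and then raises the flag; the main thread
// polls the flag and only touches the document after the hand-off.
std::size_t LoadThenRead() {
    SharedDoc doc;
    std::atomic<bool> loaded{false};

    std::thread worker([&] {
        for (int i = 0; i < 100; ++i)
            doc.rows.push_back("row " + std::to_string(i));
        // Release store: the writes above become visible to the thread
        // that sees loaded == true through an acquire load.
        loaded.store(true, std::memory_order_release);
    });

    // Main thread checks the flag periodically and stays away from doc until set.
    while (!loaded.load(std::memory_order_acquire))
        std::this_thread::sleep_for(std::chrono::milliseconds(1));

    assert(loaded.load());  // the hand-off has happened
    const std::size_t count = doc.rows.size();  // safe: the worker is done with doc
    worker.join();
    return count;
}
```

No mutex is involved because the two threads never use the document during the same period; the flag only marks the moment ownership passes from one to the other.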

But if the main thread wants to "look in on" the data content while it is being loaded by the worker thread, then it no longer fits the notification scenario and you probably need a mutex.

The Mutex Scenario

If you have multiple threads needing concurrent access to the shared resource, you can use a shared mutex to synchronize it. The firstobject News Reader demonstrates this approach where it uses an XML document containing news feeds like an in-memory database accessed by multiple threads.

Here's how a thread can safely grab a value from the document. First declare your mutex where every thread can get to it, likely in the same place as your shared CMarkup object.

CMarkup m_xmlDB;
CMutex m_mutexDB;

Then, lock the mutex before accessing the shared CMarkup object, and unlock it afterwards.

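CString csClob;
m_mutexDB.Lock(); // enter the critical section
if ( m_xmlDB.FindElem( "/*/CLOB1" ) ) // find
    csClob = m_xmlDB.GetData(); // retrieve only if the element was found
m_mutexDB.Unlock(); // leave the critical section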

It is that easy! In general, you need to package multiple steps into one logical action using a mutex. The steps that you will bundle into one action on the document depend on your operation. For example, if you want to check the value of something in the document and modify it according to that value, those are multiple CMarkup method calls that need to be done with confidence that no other thread will modify the relevant values in the meantime. So you do those multiple CMarkup method calls as one single action bracketed by a mutex lock and unlock.
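A check-then-modify action of this kind might look like the sketch below. It uses standard C++ stand-ins rather than CMarkup and CMutex: g_doc, g_docMutex, ClaimSlot and RunClaims are hypothetical names, and a std::map plays the role of the shared document.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the shared document and its mutex; NOT CMarkup/MFC.
std::map<std::string, int> g_doc;  // element name -> numeric value
std::mutex g_docMutex;             // one mutex for the whole document

// Check a value and modify it according to that value, as ONE logical action.
// Returns true if this caller claimed a slot.
bool ClaimSlot(int limit) {
    assert(limit >= 0);
    std::lock_guard<std::mutex> lock(g_docMutex);
    int& used = g_doc["SLOTS_USED"];  // check ...
    if (used >= limit) return false;
    ++used;                           // ... and modify, with no gap in between
    return true;
}

// Eight threads each attempt 100 claims against a limit of 250.
int RunClaims() {
    {
        std::lock_guard<std::mutex> lock(g_docMutex);
        g_doc["SLOTS_USED"] = 0;
    }
    std::vector<std::thread> threads;
    for (int t = 0; t < 8; ++t)
        threads.emplace_back([] { for (int i = 0; i < 100; ++i) ClaimSlot(250); });
    for (auto& th : threads) th.join();
    std::lock_guard<std::mutex> lock(g_docMutex);
    return g_doc["SLOTS_USED"];
}
```

Because the check and the increment happen under the same lock, the count can never pass the limit. If the lock were released between the two steps, two threads could both see room for one more slot and both take it.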

Sometimes a single action can involve traversing the whole document, but if it becomes too long you have to design a way to break it down into smaller actions so that other threads can continue with their actions.
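One possible way to break up a long traversal is to work in batches and release the mutex between them. The sketch below assumes a simple indexable container in place of the document; g_items, g_itemsMutex, SumInBatches and DemoBatches are hypothetical names. The traversing thread keeps its own position outside the shared object and re-checks the bounds on every batch, since other threads may change the content between batches.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for the shared document and its mutex; NOT CMarkup/MFC.
std::vector<int> g_items;
std::mutex g_itemsMutex;

// Sums all items in batches, releasing the mutex between batches so that
// other threads can get in with their own actions.
long SumInBatches(std::size_t batchSize) {
    assert(batchSize > 0);
    long total = 0;
    std::size_t next = 0;  // this thread's own position, kept outside the shared object
    for (;;) {
        std::lock_guard<std::mutex> lock(g_itemsMutex);  // one short action per batch
        // The content may have changed since the last batch, so re-check bounds.
        if (next >= g_items.size()) break;
        const std::size_t end = std::min(next + batchSize, g_items.size());
        for (; next < end; ++next) total += g_items[next];
    }  // the guard unlocks at the end of every iteration and on break
    return total;
}

// Fills the shared container with 1..10 and sums it three items at a time.
long DemoBatches() {
    {
        std::lock_guard<std::mutex> lock(g_itemsMutex);
        g_items.clear();
        for (int i = 1; i <= 10; ++i) g_items.push_back(i);
    }
    return SumInBatches(3);
}
```

The trade-off is that the total is no longer a snapshot of a single moment: each batch is consistent on its own, but other actions may run between batches.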

Here are some tips for avoiding deadlocks and corruption.

Use one shared mutex for the whole shared CMarkup object.
When the mutex is locked, do not wait for actions of any other threads. Plan to "get in and get out."
Complete your whole logical action in one go; i.e. design discrete, independent logical actions.
For every mutex lock, make sure all procedure logic paths lead to a corresponding unlock.
Between any two separate actions, expect any of the other possible actions to take place.

These tips are the same for any multithreaded application involving the mutex scenario and a shared complex object.
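The tip about every lock leading to an unlock is easiest to follow when the unlock is automatic. The sketch below uses std::mutex and std::lock_guard from standard C++ as stand-ins; in MFC, CSingleLock plays a similar role by releasing the lock in its destructor. The names g_dbMutex, g_dbValue, ReadValue and ReadAfterFailure are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>

std::mutex g_dbMutex;            // hypothetical shared mutex
std::string g_dbValue = "clob";  // hypothetical shared data

// Every exit path (normal return or exception) unlocks the mutex,
// because the guard's destructor does it.
std::string ReadValue(bool failEarly) {
    std::lock_guard<std::mutex> lock(g_dbMutex);
    if (failEarly)
        throw std::runtime_error("lookup failed");  // the mutex is still released
    return g_dbValue;
}

// Fails once inside the locked region, then reads again.
// The second call would block forever if the first one had leaked the lock.
std::string ReadAfterFailure() {
    bool threw = false;
    try {
        ReadValue(true);
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
    return ReadValue(false);
}
```

With explicit Lock and Unlock calls, the same guarantee requires an Unlock on every return and in every exception handler, which is easy to miss when the code is changed later.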