In this case use ASCII ignorecase

Here is a C/C++ function to do ASCII case-insensitive string comparison. You can use it on non-ASCII strings too; it will just find them unequal where an ANSI or Unicode aware function would find them equal. For example:

"CAFE" == "cafe"
"CAFÉ" != "café"

For char strings:

#define MCD_CHAR char
#define MCD_PCSZ const char*

And for wchar_t strings:

#define MCD_CHAR wchar_t
#define MCD_PCSZ const wchar_t*

In Far Eastern double-byte character sets a lead byte is non-ASCII, so the function works for those strings too.

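// Compare up to n characters of p1 and p2, ignoring ASCII case only.
// Returns 0 if they match, otherwise the difference of the first mismatching pair.
// There is no check for the terminating null: if both strings end together before
// n characters have been compared, the loop keeps reading past them, so the caller
// must pass an n no greater than the string length.
static int x_StrNIACMP( MCD_PCSZ p1, MCD_PCSZ p2, int n )
{
  bool bNonAsciiFound = false;
  MCD_CHAR c1, c2;
  while ( n-- )
  {
    c1 = *p1++;
    c2 = *p2++;
    if ( c1 != c2 )
    {
      // A difference is tolerated only if no non-ASCII character has been seen
      // yet and the two characters are the same ASCII letter in different case
      if ( bNonAsciiFound ||
          ! ( (c1 >= 'a' && c1 <= 'z' && c1 == c2 + ('a'-'A'))
            || (c2 >= 'a' && c2 <= 'z' && c2 == c1 + ('a'-'A')) ) )
        return c1 - c2;
    }
    else if ( (unsigned int)c1 > 127 )
      bNonAsciiFound = true; // from here on the characters must match exactly
  }
  return 0;
}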

In this function, once a non-ASCII character is encountered, it and everything after it must match exactly (case is no longer ignored).
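To make that behavior concrete, here is a small self-check for the char build. It repeats the function from above so that it compiles on its own. CheckAsciiIgnoreCase is a made-up name for illustration, and the byte values 0xC9 and 0xE9 assume a Latin-1 style code page in which they are É and é.

```cpp
#include <cassert>

#define MCD_CHAR char
#define MCD_PCSZ const char*

// Copy of x_StrNIACMP from the listing above (char build)
static int x_StrNIACMP( MCD_PCSZ p1, MCD_PCSZ p2, int n )
{
  bool bNonAsciiFound = false;
  MCD_CHAR c1, c2;
  while ( n-- )
  {
    c1 = *p1++;
    c2 = *p2++;
    if ( c1 != c2 )
    {
      if ( bNonAsciiFound ||
          ! ( (c1 >= 'a' && c1 <= 'z' && c1 == c2 + ('a'-'A'))
            || (c2 >= 'a' && c2 <= 'z' && c2 == c1 + ('a'-'A')) ) )
        return c1 - c2;
    }
    else if ( (unsigned int)c1 > 127 )
      bNonAsciiFound = true;
  }
  return 0;
}

// Hypothetical self-check, not part of CMarkup.
// Assumption: 0xC9 is É and 0xE9 is é (Latin-1 style code page).
static void CheckAsciiIgnoreCase()
{
  // Pure ASCII: case is ignored
  assert( x_StrNIACMP( "CAFE", "cafe", 4 ) == 0 );

  // É and é are different non-ASCII bytes, so the strings are unequal
  assert( x_StrNIACMP( "CAF\xC9", "caf\xE9", 4 ) != 0 );

  // ASCII letters before the non-ASCII character may still differ in case
  assert( x_StrNIACMP( "CAF\xC9", "caf\xC9", 4 ) == 0 );

  // After a non-ASCII character has been seen, case must match exactly
  assert( x_StrNIACMP( "\xC9" "A", "\xC9" "a", 2 ) != 0 );
  assert( x_StrNIACMP( "\xC9" "A", "\xC9" "A", 2 ) == 0 );
}
```

The split literals such as "\xC9" "A" are there only because a letter A directly after \xC9 would be read as part of the hex escape.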

Feel free to use it if you need a C/C++ string comparison function that ignores only ASCII case. It is used in CMarkup release 10.0 (in CMarkup 10.1 it is StrNIACmp in TokenPos, since it is only used by the Match method).

Do not use x_StrNIACMP if you need to match upper and lower case characters outside of the ASCII range, such as É and é. That requires case tables for different character sets and languages. In Unicode, the case mappings can differ slightly based on the case table version in your operating system (Michael Kaplan has an eye-opening post illustrating the complexities of Unicode casing).

What is wrong with strnicmp?

The ignorecase string comparison function is not standardized across platforms. Different C++ compilers have different function names, such as strncasecmp, _strnicmp, strnicmp and strncmpi. In wide char builds you have the same variations: wcsncasecmp, _wcsnicmp, wcsnicmp and wcsncmpi. The same issue occurs with the stricmp variant without the "n" in it, but CMarkup never used it.

 

Comment posted: strncasecmp for CMarkup on OSX/Xcode

Geoff 22-Jul-2008

Hi, I am evaluating CMarkup on OSX/Xcode. I have had to change the strnicmp to strncasecmp. Will this work? So far it has compiled after this change.

Yes it will work. But ideally you shouldn't be bothered with making this modification.

I spent a long time researching how I could safely determine which function name to use based on compiler defines, so that I could say, for example:

#if defined(LINUX)
#define MCD_PSZNICMP strncasecmp
#elif defined(_MSC_VER)
#define MCD_PSZNICMP _strnicmp
#elif...

The good news is that it appears all of the variations take the same arguments and work the same way. But the bad news is that I cannot hope to reliably figure out all of the compilers and versions of compilers. UNIX, LINUX and OS X compilers among others tend to use strncasecmp, while Visual C++ uses _strnicmp (or strnicmp if you link oldnames.lib), and others use strnicmp and strncmpi.
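As a rough sketch of why this does not scale, here is what a two-case version of that detection might look like. EXAMPLE_PSZNICMP and CheckDetectedCompare are made-up names, and the two macro tests are assumptions that cover only Visual C++ and Unix-like compilers which predefine __unix__ or __APPLE__; this is not the set of checks CMarkup ever shipped.

```cpp
#include <cassert>
#include <string.h>

// Hypothetical macro name; only two well-known cases are covered here
#if defined(_MSC_VER)
  // Visual C++: _strnicmp is declared in <string.h>
  #define EXAMPLE_PSZNICMP _strnicmp
#elif defined(__unix__) || defined(__APPLE__)
  // POSIX systems: strncasecmp is declared in <strings.h>
  #include <strings.h>
  #define EXAMPLE_PSZNICMP strncasecmp
#else
  // Every compiler not listed above ends up here
  #error "No known ignorecase compare function for this compiler"
#endif

// Hypothetical self-check: both functions take (string, string, count)
static void CheckDetectedCompare()
{
  assert( EXAMPLE_PSZNICMP( "UTF-8", "utf-8", 5 ) == 0 );
  assert( EXAMPLE_PSZNICMP( "UTF-8", "UTF-16", 5 ) != 0 );
}
```

Any compiler that is not recognized stops at the #error line, and that gap is what makes this approach hard to maintain.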

But what does CMarkup actually need out of this function anyway?

When is ASCII ignorecase adequate?

CMarkup uses an ignorecase string comparison function for two purposes. The first one is to match the encoding name in the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

CMarkup matches "utf-8" and "UTF-8" with an ignorecase string comparison. The other purpose CMarkup has for this function is to support HTML:
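For the encoding name case, a comparison that folds only the ASCII letters is enough. The following is a minimal sketch with made-up helper names (AsciiLower, EncodingNameEquals, CheckEncodingNames); it is a simplified whole-string variant for illustration, not the x_StrNIACMP function CMarkup uses, and it does not apply that function's stricter rule after a non-ASCII character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: fold only ASCII 'A'..'Z' to lower case,
// leaving every other character value untouched
static char AsciiLower( char c )
{
  return ( c >= 'A' && c <= 'Z' ) ? (char)( c + ('a' - 'A') ) : c;
}

// Hypothetical helper: true if two null-terminated encoding names
// are equal when ASCII case is ignored
static bool EncodingNameEquals( const char* a, const char* b )
{
  std::size_t n = std::strlen( a );
  if ( n != std::strlen( b ) )
    return false;
  for ( std::size_t i = 0; i < n; ++i )
  {
    if ( AsciiLower( a[i] ) != AsciiLower( b[i] ) )
      return false;
  }
  return true;
}

// Hypothetical self-check
static void CheckEncodingNames()
{
  assert( EncodingNameEquals( "UTF-8", "utf-8" ) );
  assert( EncodingNameEquals( "ISO-8859-1", "iso-8859-1" ) );
  assert( ! EncodingNameEquals( "UTF-8", "UTF-16" ) );
}
```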

CMarkup html;
html.SetDocFlags( CMarkup::MDF_IGNORECASE );
html.Load( "test.htm" );
while ( html.FindElem("//a") )
  ...

This will find all of the hyperlink tags whether the HTML file uses upper or lower case "A" tags or a mixture of cases.

Since encoding names and HTML tag names are always ASCII, CMarkup only needs to ignore ASCII case. So instead of traveling any further down the road of trying to use an existing function supplied by the compiler or operating system platform, I simply wrote a small efficient function to do a string compare ignoring ASCII case.