Here is a C/C++ function to do ASCII case-insensitive string comparison. You can use it on non-ASCII strings; it will simply find them unequal where an ANSI- or Unicode-aware function would not. For example:
"CAFE" == "cafe"
"CAFÉ" != "café"
For char strings:
#define MCD_CHAR char
#define MCD_PCSZ const char*
And for wchar_t strings:
#define MCD_CHAR wchar_t
#define MCD_PCSZ const wchar_t*
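In East Asian double-byte character sets every lead byte is non-ASCII, so it works there too: once matching lead bytes have been seen, ASCII case folding is switched off, so a trail byte that happens to fall in the ASCII letter range is compared exactly.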
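static int x_StrNIACMP( MCD_PCSZ p1, MCD_PCSZ p2, int n )
{
  bool bNonAsciiFound = false;
  MCD_CHAR c1, c2;
  while ( n-- )
  {
    c1 = *p1++;
    c2 = *p2++;
    if ( c1 != c2 )
    {
      // Treat as equal only if they are the same ASCII letter in different
      // case, and only while no non-ASCII character has been matched yet
      if ( bNonAsciiFound || ! (
          (c1 >= 'a' && c1 <= 'z' && c1 == c2 + ('a'-'A')) ||
          (c2 >= 'a' && c2 <= 'z' && c2 == c1 + ('a'-'A')) ) )
        return c1 - c2;
    }
    else if ( ! c1 )
      break; // both strings ended within n characters: equal
    else if ( (unsigned int)c1 > 127 )
      bNonAsciiFound = true; // from here on, require an exact match
  }
  return 0;
}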
In this function, once the two strings share a non-ASCII character, the rest of the strings must be identical (case is no longer ignored).
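To check the behavior described above, here is a minimal self-contained sketch using the char macros. It includes a check that stops at the terminating NUL, so equal strings shorter than n are not read past their end. The byte strings are illustrative assumptions: "CAF\xC3\x89" and "caf\xC3\xA9" are CAFÉ and café in UTF-8, and "\x83" "A" versus "\x83" "a" mimic a double-byte lead byte followed by a trail byte in the ASCII letter range. CheckExamples is a hypothetical helper name.

```cpp
#include <cassert>

// char build; a wchar_t build would use the other pair of macros
#define MCD_CHAR char
#define MCD_PCSZ const char*

static int x_StrNIACMP( MCD_PCSZ p1, MCD_PCSZ p2, int n )
{
  bool bNonAsciiFound = false;
  MCD_CHAR c1, c2;
  while ( n-- )
  {
    c1 = *p1++;
    c2 = *p2++;
    if ( c1 != c2 )
    {
      // Equal only if same ASCII letter in different case, and only
      // while no non-ASCII character has been matched yet
      if ( bNonAsciiFound || ! (
          (c1 >= 'a' && c1 <= 'z' && c1 == c2 + ('a'-'A')) ||
          (c2 >= 'a' && c2 <= 'z' && c2 == c1 + ('a'-'A')) ) )
        return c1 - c2;
    }
    else if ( ! c1 )
      break; // both strings ended within n characters
    else if ( (unsigned int)c1 > 127 )
      bNonAsciiFound = true; // e.g. a DBCS lead byte or a UTF-8 byte
  }
  return 0;
}

// Hypothetical helper exercising the examples from the text
static void CheckExamples()
{
  assert( x_StrNIACMP( "CAFE", "cafe", 4 ) == 0 );
  assert( x_StrNIACMP( "CAF\xC3\x89", "caf\xC3\xA9", 5 ) != 0 ); // UTF-8 CAFE/cafe with accents
  assert( x_StrNIACMP( "\x83" "A", "\x83" "a", 2 ) != 0 );        // trail byte is not folded
  assert( x_StrNIACMP( "utf-8", "UTF-8", 10 ) == 0 );             // stops at the NUL
}
```

Note the split literal "\x83" "A": written as "\x83A", the A would be swallowed into the hex escape.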
Feel free to use it if you need a C/C++ string comparison that ignores ASCII case. It is used in CMarkup release 10.0 (in CMarkup 10.1 it is StrNIACmp in TokenPos, since it is only used by the Match method).
Do not use x_StrNIACMP if you need to match strings containing upper- and lowercase characters outside the ASCII range, like É and é. That requires case tables for different character sets and languages. In Unicode, the case mappings can differ slightly depending on the version of the case tables in your operating system (Michael Kaplan has an eye-opening post illustrating the complexities of Unicode casing).
The ignorecase string comparison function is not standardized across platforms. Different C++ compilers use different names for it, such as strncasecmp, _strnicmp, strnicmp and strncmpi. Wide-char builds have the same variations: wcsncasecmp, _wcsnicmp, wcsnicmp and wcsncmpi (the same issue occurs with the stricmp variant without the "n", but CMarkup never used it).
Yes, it will work. But ideally you shouldn't have to make this modification at all.
I spent a long time researching how I could safely determine which function name to use based on compiler defines, so that, for example, I could write:
#if defined(LINUX)
#define MCD_PSZNICMP strncasecmp
#elif defined(_MSC_VER)
#define MCD_PSZNICMP _strnicmp
#elif...
The good news is that all of the variations appear to take the same arguments and work the same way. The bad news is that I cannot hope to reliably identify every compiler and compiler version. UNIX, Linux and OS X compilers, among others, tend to use strncasecmp, while Visual C++ uses _strnicmp (or strnicmp if you link oldnames.lib), and others use strnicmp and strncmpi.
But what does CMarkup actually need out of this function anyway?
CMarkup uses an ignorecase string comparison function for two purposes. The first one is to match the encoding name in the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
CMarkup matches "utf-8" and "UTF-8" with an ignorecase string comparison. The other purpose CMarkup has for this function is to support HTML:
CMarkup html;
html.SetDocFlags( CMarkup::MDF_IGNORECASE );
html.Load( "test.htm" );
while ( html.FindElem("//a") )
  ...
This will find all of the hyperlink tags whether the HTML file uses upper or lower case "A" tags or a mixture of cases.
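Matching a whole encoding name or tag name also means checking where the name ends: an n-character comparison such as x_StrNIACMP with n = 1 would accept "abbr" when looking for "a" unless the caller also checks the length. The following C++17 sketch is not CMarkup's code; AsciiIEquals and AsciiLower are hypothetical names. It folds only the ASCII letters A to Z, which is all that encoding names and HTML tag names need.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: fold only the 26 ASCII capitals; every other
// byte (including UTF-8 bytes >= 0x80) is returned unchanged
static char AsciiLower( char c )
{
  return ( c >= 'A' && c <= 'Z' ) ? static_cast<char>( c + ( 'a' - 'A' ) ) : c;
}

// Hypothetical helper: whole-string equality ignoring ASCII case only
static bool AsciiIEquals( std::string_view a, std::string_view b )
{
  if ( a.size() != b.size() )
    return false; // so tag "a" does not match "abbr" as a prefix
  for ( std::size_t i = 0; i < a.size(); ++i )
    if ( AsciiLower( a[i] ) != AsciiLower( b[i] ) )
      return false;
  return true;
}
```

Unlike x_StrNIACMP, this sketch keeps folding after a non-ASCII byte. That is harmless for UTF-8, where every byte of a multibyte sequence is 0x80 or above, but it would wrongly fold ASCII-range trail bytes in a double-byte encoding such as Shift-JIS.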
Since encoding names and HTML tag names are always ASCII, CMarkup only needs to ignore ASCII case. So instead of traveling any further down the road of trying to use an existing function supplied by the compiler or operating system platform, I simply wrote a small efficient function to do a string compare ignoring ASCII case.
Geoff 22-Jul-2008
Hi, I am evaluating CMarkup on OS X/Xcode. I have had to change strnicmp to strncasecmp. Will this work? So far it has compiled after this change.
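On OS X and other POSIX systems, strncasecmp is declared in <strings.h> and takes the same arguments as _strnicmp (two strings and a count), which is why the change compiles. A minimal sketch, assuming a POSIX system; PosixStrNICmp is a hypothetical wrapper name. One caveat: POSIX only specifies strncasecmp's results in the POSIX locale, so results for non-ASCII bytes may vary with the current locale. For the ASCII encoding and tag names CMarkup compares, that difference does not come into play.

```cpp
#include <cassert>
#include <cstddef>
#include <strings.h> // POSIX header declaring strncasecmp (OS X, Linux, BSD)

// Hypothetical wrapper with the same (p1, p2, n) shape as _strnicmp,
// forwarding to POSIX strncasecmp
static int PosixStrNICmp( const char* p1, const char* p2, int n )
{
  return strncasecmp( p1, p2, static_cast<std::size_t>( n ) );
}
```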