Here are all the best (official and most commonly accepted) character set names (labels/identifiers) to use in your XML declaration encoding or HTML content-type charset, plus the aliases, Windows code pages and descriptive titles.
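As a rough illustration of where these names appear in practice, the sketch below pulls the encoding label out of an XML declaration and out of an HTML content-type value. The function names and regular expressions are assumptions made for this example; they are not a full XML or HTML parser.

```python
import re

# Matches encoding="..." or encoding='...' inside a leading XML declaration.
_XML_DECL = re.compile(
    r"""<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._\-]*)["']"""
)
# Matches charset=... in an HTML/HTTP content-type value.
_CHARSET = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._:\-]+)""", re.IGNORECASE)


def xml_declared_encoding(text):
    """Return the encoding label from an XML declaration, or None."""
    m = _XML_DECL.match(text)
    return m.group(1) if m else None


def content_type_charset(value):
    """Return the charset label from a content-type value, or None."""
    m = _CHARSET.search(value)
    return m.group(1) if m else None


print(xml_declared_encoding('<?xml version="1.0" encoding="windows-1252"?><doc/>'))
print(content_type_charset("text/html; charset=iso-8859-1"))
```

The label that comes back is whatever the author of the document wrote, which is why a list of aliases is useful for interpreting it.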
Looking around the Internet, I could not find a comprehensive character set name reference, so I combined several online resources into this one. The primary reference for web technologies like XML is the IANA list of character sets, but it:
It might be reasonable to list duplicate and ambiguous aliases if someone is trying to interpret the intention of an obscure alias in their data, but here I chose to remove all duplicates. I also made some choices (which will remain controversial) as follows:
Also, where the Windows code page was available I have used it as a basis for associating aliases that identify the same charset. Hopefully this has not led to grouping any unequal charsets together.
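The grouping idea described above can be sketched as a lookup table: aliases that share a Windows code page map to one preferred name, and lookups ignore case. The data is a small excerpt from this list; treating the first name in each group as the preferred one, and the names `GROUPS`, `ALIAS_TO_PREFERRED`, and `preferred_name`, are assumptions made for this example.

```python
# (code page, preferred name, other aliases): a small excerpt for illustration.
GROUPS = [
    (1252, "Windows-1252", ["x-ansi", "CP1252", "MS-ANSI"]),
    (28591, "iso-8859-1", ["cp819", "Latin1", "ibm819", "iso_8859-1", "l1", "csISOLatin1"]),
    (65001, "utf-8", ["unicode-1-1-utf-8", "CP65001", "UTF8"]),
]

ALIAS_TO_PREFERRED = {}
for code_page, preferred, aliases in GROUPS:
    for name in [preferred, *aliases]:
        key = name.lower()
        # A name claimed by two groups would be ambiguous, so fail loudly.
        if key in ALIAS_TO_PREFERRED:
            raise ValueError(f"duplicate alias: {name}")
        ALIAS_TO_PREFERRED[key] = (preferred, code_page)


def preferred_name(label):
    """Return (preferred name, code page) for a label, or None if unknown."""
    return ALIAS_TO_PREFERRED.get(label.strip().lower())


print(preferred_name("MS-ANSI"))   # ('Windows-1252', 1252)
print(preferred_name("LATIN1"))    # ('iso-8859-1', 28591)
```

The duplicate check mirrors the decision to remove duplicate aliases: a name that appeared in two groups could not be resolved to a single charset.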
Descriptive Title | Windows Code Page |
---|---|
Charset names (preferred name in bold) | |
Adobe-Standard-Encoding | |
Adobe-Standard-Encoding, csAdobeStandardEncoding | |
Adobe-Symbol-Encoding | |
Adobe-Symbol-Encoding, csHPPSMath | |
Amiga-1251 | |
Amiga-1251, Ami1251, Amiga1251, Ami-1251 | |
ANSI_X3.110-1983 | |
ANSI_X3.110-1983, iso-ir-99, CSA_T500-1983, NAPLPS, csISO99NAPLPS | |
Arabic (864) | 864 |
IBM864, cp864, csIBM864 | |
Arabic (ASMO 708) | 708 |
ASMO-708 | |
Arabic (DOS) | 720 |
DOS-720 | |
Arabic (ISO) | 28596 |
iso-8859-6, arabic, csISOLatinArabic, ECMA-114, ISO_8859-6, ISO_8859-6:1987, iso-ir-127, iso8859-6 | |
Arabic (Mac) | 10004 |
x-mac-arabic | |
Arabic (Windows) | 1256 |
windows-1256, cp1256, MS-ARAB | |
ASMO_449 | |
ASMO_449, ISO_9036, arabic7, iso-ir-89, csISO89ASMO449 | |
Baltic (DOS) | 775 |
ibm775, cp775, csPC775Baltic | |
Baltic (ISO) | 28594 |
iso-8859-4, csISOLatin4, ISO_8859-4, ISO_8859-4:1988, iso-ir-110, l4, latin4, iso8859-4 | |
Baltic (Windows) | 1257 |
windows-1257, CP1257, WINBALTRIM | |
BOCU-1 | |
BOCU-1, csBOCU-1 | |
BRF | |
BRF, csBRF | |
BS_4730 | |
BS_4730, iso-ir-4, ISO646-GB, gb, uk, csISO4UnitedKingdom | |
BS_viewdata | |
BS_viewdata, iso-ir-47, csISO47BSViewdata | |
Central European (DOS) | 852 |
ibm852, cp852, 852, csPCp852 | |
Central European (ISO) | 28592 |
iso-8859-2, csISOLatin2, iso_8859-2, iso_8859-2:1987, iso8859-2, iso-ir-101, l2, latin2 | |
Central European (Mac) | 10029 |
x-mac-ce | |
Central European (Windows) | 1250 |
windows-1250, x-cp1250, CP1250, MS-EE | |
CESU-8 | |
CESU-8, csCESU-8 | |
Chinese National Standard (GB18030) | 54936 |
GB18030, ISO-4873:1986 | |
Chinese Simplified (EUC) | 51936 |
EUC-CN, x-euc-cn | |
Chinese Simplified (GB2312) | 936 |
gb2312, chinese, CN-GB, csGB2312, csGB231280, csISO58GB231280, GB_2312-80, GB231280, GB2312-80, GBK, iso-ir-58, CP936, MS936, windows-936 | |
Chinese Simplified (GB2312-80) | 20936 |
x-cp20936 | |
Chinese Simplified (HZ) | 52936 |
hz-gb-2312 | |
Chinese Simplified (ISO 2022) | 50227 |
x-cp50227 | |
Chinese Simplified (Mac) | 10008 |
x-mac-chinesesimp | |
Chinese Traditional (Big5) | 950 |
big5, cn-big5, csbig5, x-x-big5, CP950, Big5-HKSCS | |
Chinese Traditional (CNS) | 20000 |
x-Chinese-CNS, x-Chinese_CNS | |
Chinese Traditional (Eten) | 20002 |
x-Chinese-Eten, x_Chinese-Eten | |
Chinese Traditional (Mac) | 10002 |
x-mac-chinesetrad | |
CP1125 | 1125 |
CP1125 | |
CP1133 | 1133 |
CP1133, IBM-CP1133 | |
CP853 | 853 |
CP853 | |
Croatian (Mac) | 10082 |
x-mac-croatian | |
CSA_Z243.4-1985-1 | |
CSA_Z243.4-1985-1, iso-ir-121, ISO646-CA, csa7-1, ca, csISO121Canadian1 | |
CSA_Z243.4-1985-2 | |
CSA_Z243.4-1985-2, iso-ir-122, ISO646-CA2, csa7-2, csISO122Canadian2 | |
CSA_Z243.4-1985-gr | |
CSA_Z243.4-1985-gr, iso-ir-123, csISO123CSAZ24341985gr | |
CSN_369103 | |
CSN_369103, iso-ir-139, csISO139CSN369103 | |
Cyrillic (DOS) | 866 |
cp866, ibm866, 866, csIBM866 | |
Cyrillic (ISO) | 28595 |
iso-8859-5, csISOLatin5, csISOLatinCyrillic, cyrillic, ISO_8859-5, ISO_8859-5:1988, iso-ir-144, iso8859-5 | |
Cyrillic (KOI8-R) | 20866 |
koi8-r, csKOI8R, koi, koi8, koi8r | |
Cyrillic (KOI8-U) | 21866 |
koi8-u, koi8-ru | |
Cyrillic (Mac) | 10007 |
x-mac-cyrillic | |
Cyrillic (Windows) | 1251 |
windows-1251, x-cp1251, CP1251, MS-CYRL | |
DEC-MCS | |
DEC-MCS, dec, csDECMCS | |
DIN_66003 | |
DIN_66003, iso-ir-21, de, ISO646-DE, csISO21German | |
dk-us | |
dk-us, csDKUS | |
DS_2089 | |
DS_2089, DS2089, ISO646-DK, dk, csISO646Danish | |
EBCDIC-AT-DE | |
EBCDIC-AT-DE, csIBMEBCDICATDE | |
EBCDIC-AT-DE-A | |
EBCDIC-AT-DE-A, csEBCDICATDEA | |
EBCDIC-CA-FR | |
EBCDIC-CA-FR, csEBCDICCAFR | |
EBCDIC-DK-NO | |
EBCDIC-DK-NO, csEBCDICDKNO | |
EBCDIC-DK-NO-A | |
EBCDIC-DK-NO-A, csEBCDICDKNOA | |
EBCDIC-ES | |
EBCDIC-ES, csEBCDICES | |
EBCDIC-ES-A | |
EBCDIC-ES-A, csEBCDICESA | |
EBCDIC-ES-S | |
EBCDIC-ES-S, csEBCDICESS | |
EBCDIC-FI-SE | |
EBCDIC-FI-SE, csEBCDICFISE | |
EBCDIC-FI-SE-A | |
EBCDIC-FI-SE-A, csEBCDICFISEA | |
EBCDIC-FR | |
EBCDIC-FR, csEBCDICFR | |
EBCDIC-IT | |
EBCDIC-IT, csEBCDICIT | |
EBCDIC-PT | |
EBCDIC-PT, csEBCDICPT | |
EBCDIC-UK | |
EBCDIC-UK, csEBCDICUK | |
EBCDIC-US | |
EBCDIC-US, csEBCDICUS | |
ECMA-cyrillic | |
ECMA-cyrillic, iso-ir-111, KOI8-E, csISO111ECMACyrillic | |
ES | |
ES, iso-ir-17, ISO646-ES, csISO17Spanish | |
ES2 | |
ES2, iso-ir-85, ISO646-ES2, csISO85Spanish2 | |
Europa | 29001 |
x-Europa | |
Extended_UNIX_Code_Fixed_Width_for_Japanese | |
Extended_UNIX_Code_Fixed_Width_for_Japanese, csEUCFixWidJapanese | |
French Canadian (DOS) | 863 |
IBM863, cp863, 863, csIBM863 | |
GB_1988-80 | |
GB_1988-80, iso-ir-57, cn, ISO646-CN, csISO57GB1988 | |
German (IA5) | 20106 |
x-IA5-German | |
GOST_19768-74 | |
GOST_19768-74, ST_SEV_358-88, iso-ir-153, csISO153GOST1976874 | |
Greek (DOS) | 737 |
ibm737, CP737 | |
Greek (ISO) | 28597 |
iso-8859-7, csISOLatinGreek, ECMA-118, ELOT_928, greek, greek8, ISO_8859-7, ISO_8859-7:1987, iso-ir-126, iso8859-7 | |
Greek (Mac) | 10006 |
x-mac-greek | |
Greek (Windows) | 1253 |
windows-1253, CP1253, MS-GREEK | |
Greek, Modern (DOS) | 869 |
ibm869, cp869, 869, cp-gr, csIBM869 | |
greek-ccitt | |
greek-ccitt, iso-ir-150, csISO150, csISO150GreekCCITT | |
greek7 | |
greek7, iso-ir-88, csISO88Greek7 | |
greek7-old | |
greek7-old, iso-ir-18, csISO18Greek7Old | |
Hebrew (DOS) | 862 |
DOS-862, IBM862, cp862, 862, csPC862LatinHebrew | |
Hebrew (ISO-Logical) | 38598 |
iso-8859-8-i, logical, iso8859-8-i | |
Hebrew (ISO-Visual) | 28598 |
iso-8859-8, csISOLatinHebrew, hebrew, ISO_8859-8, ISO_8859-8:1988, iso-ir-138, visual, iso8859-8 | |
Hebrew (Mac) | 10005 |
x-mac-hebrew | |
Hebrew (Windows) | 1255 |
windows-1255, CP1255, MS-HEBR | |
HP-DeskTop | |
HP-DeskTop, csHPDesktop | |
HP-Legal | |
HP-Legal, csHPLegal | |
HP-Math8 | |
HP-Math8, csHPMath8 | |
HP-Pi-font | |
HP-Pi-font, csHPPiFont | |
hp-roman8 | |
hp-roman8, roman8, r8, csHPRoman8 | |
IBM EBCDIC (Arabic) | 420 |
x-EBCDIC-Arabic | |
IBM EBCDIC (Cyrillic Serbian-Bulgarian) | 21025 |
x-EBCDIC-CyrillicSerbianBulgarian, cp1025 | |
IBM EBCDIC (Denmark-Norway-Euro) | 1142 |
x-ebcdic-denmarknorway-euro, IBM01142, CCSID01142, CP01142, ebcdic-dk-277+euro, ebcdic-no-277+euro | |
IBM EBCDIC (Finland-Sweden-Euro) | 1143 |
x-ebcdic-finlandsweden-euro, X-EBCDIC-France, IBM01143, CCSID01143, CP01143, ebcdic-fi-278+euro, ebcdic-se-278+euro | |
IBM EBCDIC (France-Euro) | 1147 |
x-ebcdic-france-euro, IBM01147, CCSID01147, CP01147, ebcdic-fr-297+euro | |
IBM EBCDIC (Germany-Euro) | 1141 |
x-ebcdic-germany-euro, IBM01141, CCSID01141, CP01141, ebcdic-de-273+euro | |
IBM EBCDIC (Greek Modern) | 875 |
x-EBCDIC-GreekModern, cp875 | |
IBM EBCDIC (Icelandic-Euro) | 1149 |
x-ebcdic-icelandic-euro, IBM01149, CCSID01149, CP01149, ebcdic-is-871+euro | |
IBM EBCDIC (International-Euro) | 1148 |
x-ebcdic-international-euro, IBM01148, CCSID01148, CP01148, ebcdic-international-500+euro | |
IBM EBCDIC (Italy-Euro) | 1144 |
x-ebcdic-italy-euro, IBM01144, CCSID01144, CP01144, ebcdic-it-280+euro | |
IBM EBCDIC (Japanese and Japanese Katakana) | 50930 |
x-EBCDIC-JapaneseAndKana | |
IBM EBCDIC (Japanese and Japanese-Latin) | 50939 |
x-EBCDIC-JapaneseAndJapaneseLatin | |
IBM EBCDIC (Japanese and US-Canada) | 50931 |
x-EBCDIC-JapaneseAndUSCanada | |
IBM EBCDIC (Korean and Korean Extended) | 50933 |
x-EBCDIC-KoreanAndKoreanExtended | |
IBM EBCDIC (Korean Extended) | 20833 |
x-EBCDIC-KoreanExtended | |
IBM EBCDIC (Multilingual Latin-2) | 870 |
CP870, ebcdic-cp-roece, ebcdic-cp-yu, csIBM870, IBM870 | |
IBM EBCDIC (Simplified Chinese) | 50935 |
x-EBCDIC-SimplifiedChinese | |
IBM EBCDIC (Spain-Euro) | 1145 |
x-ebcdic-spain-euro, IBM01145, CCSID01145, CP01145, ebcdic-es-284+euro | |
IBM EBCDIC (Traditional Chinese) | 50937 |
x-EBCDIC-TraditionalChinese | |
IBM EBCDIC (Turkish Latin-5) | 1026 |
CP1026, csIBM1026, IBM1026 | |
IBM EBCDIC (UK-Euro) | 1146 |
x-ebcdic-uk-euro, IBM01146, CCSID01146, CP01146, ebcdic-gb-285+euro | |
IBM EBCDIC (US-Canada) | 37 |
ebcdic-cp-us, ebcdic-cp-ca, ebcdic-cp-wt, ebcdic-cp-nl, csIBM037, IBM037, cp037 | |
IBM EBCDIC (US-Canada-Euro) | 1140 |
x-ebcdic-cp-us-euro, IBM01140, CCSID01140, CP01140, ebcdic-us-37+euro | |
IBM EBCDIC Arabic | 20420 |
IBM420, cp420, ebcdic-cp-ar1, csIBM420 | |
IBM EBCDIC Cyrillic Russian | 20880 |
x-EBCDIC-CyrillicRussian, IBM880, cp880, EBCDIC-Cyrillic, csIBM880 | |
IBM EBCDIC Denmark-Norway | 20277 |
x-EBCDIC-DenmarkNorway, IBM277, EBCDIC-CP-DK, EBCDIC-CP-NO, csIBM277 | |
IBM EBCDIC Finland-Sweden | 20278 |
x-EBCDIC-FinlandSweden, IBM278, CP278, ebcdic-cp-fi, ebcdic-cp-se, csIBM278 | |
IBM EBCDIC France | 20297 |
IBM297, cp297, ebcdic-cp-fr, csIBM297 | |
IBM EBCDIC Germany | 20273 |
x-EBCDIC-Germany, IBM273, CP273, csIBM273 | |
IBM EBCDIC Greek | 20423 |
x-EBCDIC-Greek, IBM423, cp423, ebcdic-cp-gr, csIBM423 | |
IBM EBCDIC Hebrew | 20424 |
x-EBCDIC-Hebrew, IBM424, cp424, ebcdic-cp-he, csIBM424 | |
IBM EBCDIC Icelandic | 20871 |
x-EBCDIC-Icelandic, IBM871, CP871, ebcdic-cp-is, csIBM871 | |
IBM EBCDIC International | 500 |
IBM500, CP500, ebcdic-cp-be, ebcdic-cp-ch, csIBM500 | |
IBM EBCDIC Italy | 20280 |
x-EBCDIC-Italy, IBM280, CP280, ebcdic-cp-it, csIBM280 | |
IBM EBCDIC Japanese Katakana Extended | 20290 |
x-EBCDIC-JapaneseKatakana, IBM290, cp290, EBCDIC-JP-kana, csIBM290 | |
IBM EBCDIC Latin 1/Open System | 1047 |
IBM01047 | |
IBM EBCDIC Latin 1/Open System (1047 + Euro symbol) | 20924 |
IBM00924, CCSID00924, CP00924, ebcdic-Latin9--euro | |
IBM EBCDIC Latin America-Spain | 20284 |
X-EBCDIC-Spain, IBM284, CP284, ebcdic-cp-es, csIBM284 | |
IBM EBCDIC Thai | 20838 |
x-EBCDIC-Thai, IBM-Thai, csIBMThai | |
IBM EBCDIC Turkish | 20905 |
x-EBCDIC-Turkish, IBM905, CP905, ebcdic-cp-tr, csIBM905 | |
IBM EBCDIC United Kingdom | 20285 |
x-EBCDIC-UK, IBM285, CP285, ebcdic-cp-gb, csIBM285 | |
IBM-Symbols | |
IBM-Symbols, csIBMSymbols | |
IBM038 | |
IBM038, EBCDIC-INT, cp038, csIBM038 | |
IBM1047 | |
IBM1047, IBM-1047 | |
IBM274 | |
IBM274, EBCDIC-BE, CP274, csIBM274 | |
IBM275 | |
IBM275, EBCDIC-BR, cp275, csIBM275 | |
IBM281 | |
IBM281, EBCDIC-JP-E, cp281, csIBM281 | |
IBM5550 Taiwan | 20003 |
x-cp20003 | |
IBM851 | |
IBM851, cp851, 851, csIBM851 | |
IBM868 | |
IBM868, CP868, cp-ar, csIBM868 | |
IBM891 | |
IBM891, cp891, csIBM891 | |
IBM903 | |
IBM903, cp903, csIBM903 | |
IBM904 | |
IBM904, cp904, 904, csIBBM904 | |
IBM918 | |
IBM918, CP918, ebcdic-cp-ar2, csIBM918 | |
Icelandic (DOS) | 861 |
ibm861, cp861, 861, cp-is, csIBM861 | |
Icelandic (Mac) | 10079 |
x-mac-icelandic | |
IEC_P27-1 | |
IEC_P27-1, iso-ir-143, csISO143IECP271 | |
INIS | |
INIS, iso-ir-49, csISO49INIS | |
INIS-8 | |
INIS-8, iso-ir-50, csISO50INIS8 | |
INIS-cyrillic | |
INIS-cyrillic, iso-ir-51, csISO51INISCyrillic | |
INVARIANT | |
INVARIANT, csINVARIANT | |
ISCII Assamese | 57006 |
x-iscii-as | |
ISCII Bengali | 57003 |
x-iscii-be | |
ISCII Devanagari | 57002 |
x-iscii-de | |
ISCII Gujarati | 57010 |
x-iscii-gu | |
ISCII Kannada | 57008 |
x-iscii-ka | |
ISCII Malayalam | 57009 |
x-iscii-ma | |
ISCII Oriya | 57007 |
x-iscii-or | |
ISCII Punjabi | 57011 |
x-iscii-pa | |
ISCII Tamil | 57004 |
x-iscii-ta | |
ISCII Telugu | 57005 |
x-iscii-te | |
ISO 6937 Non-Spacing Accent | 20269 |
x-cp20269 | |
ISO 8859-13 Estonian | 28603 |
ISO-8859-13, iso8859-13 | |
ISO-10646-J-1 | |
ISO-10646-J-1 | |
ISO-10646-UCS-2 | |
ISO-10646-UCS-2, csUnicode | |
ISO-10646-UCS-4 | |
ISO-10646-UCS-4, csUCS4 | |
ISO-10646-UCS-Basic | |
ISO-10646-UCS-Basic, csUnicodeASCII | |
ISO-10646-Unicode-Latin1 | |
ISO-10646-Unicode-Latin1, csUnicodeLatin1, ISO-10646 | |
ISO-10646-UTF-1 | |
ISO-10646-UTF-1, csISO10646UTF1 | |
ISO-11548-1 | |
ISO-11548-1, ISO_11548-1, ISO_TR_11548-1, csISO115481 | |
ISO-2022-CN | |
ISO-2022-CN | |
ISO-2022-CN-EXT | |
ISO-2022-CN-EXT | |
ISO-2022-JP-2 | |
ISO-2022-JP-2, csISO2022JP2 | |
ISO-8859-1-Windows-3.0-Latin-1 | |
ISO-8859-1-Windows-3.0-Latin-1, csWindows30Latin1 | |
ISO-8859-1-Windows-3.1-Latin-1 | |
ISO-8859-1-Windows-3.1-Latin-1, csWindows31Latin1 | |
ISO-8859-10 | |
ISO-8859-10, iso-ir-157, l6, ISO_8859-10:1992, csISOLatin6, latin6 | |
ISO-8859-14 | |
ISO-8859-14, iso-ir-199, ISO_8859-14:1998, ISO_8859-14, latin8, iso-celtic, l8 | |
ISO-8859-16 | |
ISO-8859-16, iso-ir-226, ISO_8859-16:2001, ISO_8859-16, latin10, l10 | |
ISO-8859-2-Windows-Latin-2 | |
ISO-8859-2-Windows-Latin-2, csWindows31Latin2 | |
ISO-8859-6-E | |
ISO-8859-6-E, ISO_8859-6-E, csISO88596E | |
ISO-8859-6-I | |
ISO-8859-6-I, ISO_8859-6-I, csISO88596I | |
ISO-8859-8-E | |
ISO-8859-8-E, ISO_8859-8-E, csISO88598E | |
ISO-8859-9-Windows-Latin-5 | |
ISO-8859-9-Windows-Latin-5, csWindows31Latin5 | |
iso-ir-90 | |
iso-ir-90, csISO90 | |
ISO-Unicode-IBM-1261 | |
ISO-Unicode-IBM-1261, csUnicodeIBM1261 | |
ISO-Unicode-IBM-1264 | |
ISO-Unicode-IBM-1264, csUnicodeIBM1264 | |
ISO-Unicode-IBM-1265 | |
ISO-Unicode-IBM-1265, csUnicodeIBM1265 | |
ISO-Unicode-IBM-1268 | |
ISO-Unicode-IBM-1268, csUnicodeIBM1268 | |
ISO-Unicode-IBM-1276 | |
ISO-Unicode-IBM-1276, csUnicodeIBM1276 | |
ISO_10367-box | |
ISO_10367-box, iso-ir-155, csISO10367Box | |
ISO_2033-1983 | |
ISO_2033-1983, iso-ir-98, e13b, csISO2033 | |
ISO_5427 | |
ISO_5427, iso-ir-37, csISO5427Cyrillic | |
ISO_5427:1981 | |
ISO_5427:1981, iso-ir-54, ISO5427Cyrillic1981 | |
ISO_5428:1980 | |
ISO_5428:1980, iso-ir-55, csISO5428Greek | |
ISO_646.basic:1983 | |
ISO_646.basic:1983, ref, csISO646basic1983 | |
ISO_646.irv:1983 | |
ISO_646.irv:1983, iso-ir-2, irv, csISO2IntlRefVersion | |
ISO_6937-2-25 | |
ISO_6937-2-25, iso-ir-152, csISO6937Add | |
ISO_6937-2-add | |
ISO_6937-2-add, iso-ir-142, csISOTextComm | |
ISO_8859-8-I | |
ISO_8859-8-I, csISO88598I | |
ISO_8859-supp | |
ISO_8859-supp, iso-ir-154, latin1-2-5, csISO8859Supp | |
IT | |
IT, iso-ir-15, ISO646-IT, csISO15Italian | |
Japanese (EUC) | 51932 |
x-euc, x-euc-jp, CP51932, MS51932, WINDOWS-51932 | |
Japanese (JIS 0208-1990 and 0212-1990) | 20932 |
EUC-JP, Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese | |
Japanese (JIS) | 50220 |
iso-2022-jp | |
Japanese (JIS-Allow 1 byte Kana - SO/SI) | 50222 |
_iso-2022-jp$SIO | |
Japanese (JIS-Allow 1 byte Kana) | 50221 |
csISO2022JP, _iso-2022-jp, CP50221, ISO-2022-JP-MS, ISO2022-JP-MS, MS50221, WINDOWS-50221 | |
Japanese (Mac) | 10001 |
x-mac-japanese | |
Japanese (Shift-JIS) | 932 |
shift_jis, csShiftJIS, csWindows31J, ms_Kanji, shift-jis, x-ms-cp932, x-sjis, sjis, CP932, MS932, SHIFFT_JIS, SHIFFT_JIS-MS, SJIS-MS, SJIS-OPEN, SJIS-WIN, WINDOWS-932, Windows-31J | |
JIS_C6220-1969-jp | |
JIS_C6220-1969-jp, JIS_C6220-1969, iso-ir-13, katakana, x0201-7, csISO13JISC6220jp | |
JIS_C6220-1969-ro | |
JIS_C6220-1969-ro, iso-ir-14, jp, ISO646-JP, csISO14JISC6220ro | |
JIS_C6226-1978 | |
JIS_C6226-1978, iso-ir-42, csISO42JISC62261978 | |
JIS_C6226-1983 | |
JIS_C6226-1983, iso-ir-87, x0208, JIS_X0208-1983, csISO87JISX0208 | |
JIS_C6229-1984-a | |
JIS_C6229-1984-a, iso-ir-91, jp-ocr-a, csISO91JISC62291984a | |
JIS_C6229-1984-b | |
JIS_C6229-1984-b, iso-ir-92, ISO646-JP-OCR-B, jp-ocr-b, csISO92JISC62991984b | |
JIS_C6229-1984-b-add | |
JIS_C6229-1984-b-add, iso-ir-93, jp-ocr-b-add, csISO93JIS62291984badd | |
JIS_C6229-1984-hand | |
JIS_C6229-1984-hand, iso-ir-94, jp-ocr-hand, csISO94JIS62291984hand | |
JIS_C6229-1984-hand-add | |
JIS_C6229-1984-hand-add, iso-ir-95, jp-ocr-hand-add, csISO95JIS62291984handadd | |
JIS_C6229-1984-kana | |
JIS_C6229-1984-kana, iso-ir-96, csISO96JISC62291984kana | |
JIS_Encoding | |
JIS_Encoding, csJISEncoding | |
JIS_X0201 | |
JIS_X0201, X0201, csHalfWidthKatakana | |
JIS_X0212-1990 | |
JIS_X0212-1990, x0212, iso-ir-159, csISO159JISX02121990 | |
JUS_I.B1.002 | |
JUS_I.B1.002, iso-ir-141, ISO646-YU, js, yu, csISO141JUSIB1002 | |
JUS_I.B1.003-mac | |
JUS_I.B1.003-mac, macedonian, iso-ir-147, csISO147Macedonian | |
JUS_I.B1.003-serb | |
JUS_I.B1.003-serb, iso-ir-146, serbian, csISO146Serbian | |
KOI7-switched | |
KOI7-switched | |
Korean | 949 |
ks_c_5601-1987, csKSC56011987, iso-ir-149, korean, ks_c_5601, ks_c_5601_1987, ks_c_5601-1989, KSC_5601, KSC5601, ks-c-5601, ks-c5601, CP949, UHC | |
Korean (EUC) | 51949 |
euc-kr, csEUCKR | |
Korean (ISO) | 50225 |
iso-2022-kr, csISO2022KR, iso2022-kr | |
Korean (Johab) | 1361 |
Johab, CP1361 | |
Korean (Mac) | 10003 |
x-mac-korean | |
Korean Wansung | 20949 |
x-cp20949 | |
KSC5636 | |
KSC5636, ISO646-KR, csKSC5636 | |
KZ-1048 | |
KZ-1048, STRK1048-2002, RK1048, csKZ1048 | |
Latin 3 (ISO) | 28593 |
iso-8859-3, Latin3, ISO_8859-3, ISO_8859-3:1988, iso-ir-109, l3, csISOLatin3, iso8859-3 | |
Latin 9 (ISO) | 28605 |
iso-8859-15, Latin9, ISO_8859-15, l9, Latin-9, iso8859-15 | |
latin-greek | |
latin-greek, iso-ir-19, csISO19LatinGreek | |
Latin-greek-1 | |
Latin-greek-1, iso-ir-27, csISO27LatinGreek1 | |
latin-lap | |
latin-lap, lap, iso-ir-158, csISO158Lap | |
Microsoft-Publishing | |
Microsoft-Publishing, csMicrosoftPublishing | |
MNEM | |
MNEM, csMnem | |
MNEMONIC | |
MNEMONIC, csMnemonic | |
MSZ_7795.3 | |
MSZ_7795.3, iso-ir-86, ISO646-HU, hu, csISO86Hungarian | |
NATS-DANO | |
NATS-DANO, iso-ir-9-1, csNATSDANO | |
NATS-DANO-ADD | |
NATS-DANO-ADD, iso-ir-9-2, csNATSDANOADD | |
NATS-SEFI | |
NATS-SEFI, iso-ir-8-1, csNATSSEFI | |
NATS-SEFI-ADD | |
NATS-SEFI-ADD, iso-ir-8-2, csNATSSEFIADD | |
NC_NC00-10:81 | |
NC_NC00-10:81, cuba, iso-ir-151, ISO646-CU, csISO151Cuba | |
NF_Z_62-010 | |
NF_Z_62-010, iso-ir-69, ISO646-FR, fr, csISO69French | |
NF_Z_62-010_ | |
NF_Z_62-010_, iso-ir-25, ISO646-FR1, csISO25French | |
Nordic (DOS) | 865 |
IBM865, cp865, 865, csIBM865 | |
Norwegian (IA5) | 20108 |
x-IA5-Norwegian | |
NS_4551-1 | |
NS_4551-1, iso-ir-60, ISO646-NO, no, csISO60DanishNorwegian, csISO60Norwegian1 | |
NS_4551-2 | |
NS_4551-2, ISO646-NO2, iso-ir-61, no2, csISO61Norwegian2 | |
OEM Cyrillic (primarily Russian) | 855 |
IBM855, cp855, 855, csIBM855 | |
OEM Multilingual Latin 1 + Euro symbol | 858 |
IBM00858, CCSID00858, CP00858, PC-Multilingual-850+euro, CP858 | |
OEM United States | 437 |
IBM437, 437, cp437, csPC8, CodePage437, csPC8CodePage437 | |
OSD_EBCDIC_DF03_IRV | |
OSD_EBCDIC_DF03_IRV | |
OSD_EBCDIC_DF04_1 | |
OSD_EBCDIC_DF04_1 | |
OSD_EBCDIC_DF04_15 | |
OSD_EBCDIC_DF04_15 | |
PC8-Danish-Norwegian | |
PC8-Danish-Norwegian, csPC8DanishNorwegian | |
PC8-Turkish | |
PC8-Turkish, csPC8Turkish | |
Portuguese (DOS) | 860 |
IBM860, cp860, 860, csIBM860 | |
PT | |
PT, iso-ir-16, ISO646-PT, csISO16Portuguese | |
PT2 | |
PT2, iso-ir-84, ISO646-PT2, csISO84Portuguese2 | |
PTCP154 | 154 |
PTCP154, csPTCP154, PT154, CP154, Cyrillic-Asian | |
Romanian (Mac) | 10010 |
x-mac-romanian | |
SCSU | |
SCSU | |
SEN_850200_B | |
SEN_850200_B, iso-ir-10, FI, ISO646-FI, ISO646-SE, se, csISO10Swedish | |
SEN_850200_C | |
SEN_850200_C, iso-ir-11, ISO646-SE2, se2, csISO11SwedishForNames | |
Swedish (IA5) | 20107 |
x-IA5-Swedish | |
T.101-G2 | |
T.101-G2, iso-ir-128, csISO128T101G2 | |
T.61 | 20261 |
x-cp20261 | |
T.61-7bit | |
T.61-7bit, iso-ir-102, csISO102T617bit | |
T.61-8bit | |
T.61-8bit, T.61, iso-ir-103, csISO103T618bit | |
TCA Taiwan | 20001 |
x-cp20001 | |
TeleText Taiwan | 20004 |
x-cp20004 | |
Thai (Mac) | 10021 |
x-mac-thai | |
Thai (Windows) | 874 |
windows-874, DOS-874, iso-8859-11, TIS-620, CP874 | |
TSCII | |
TSCII, csTSCII | |
Turkish (DOS) | 857 |
ibm857, cp857, 857, csIBM857 | |
Turkish (ISO) | 28599 |
iso-8859-9, Latin5, ISO_8859-9, ISO_8859-9:1989, iso-ir-148, l5, iso8859-9 | |
Turkish (Mac) | 10081 |
x-mac-turkish | |
Turkish (Windows) | 1254 |
windows-1254, CP1254, MS-TURK | |
Ukrainian (Mac) | 10017 |
x-mac-ukrainian | |
Unicode | 1200 |
unicode, utf-16, CP1200, UTF16LE, UCS-2LE, UTF16, UCS-2, UTF-16LE | |
Unicode (Big-Endian) | 1201 |
unicodeFFFE, CP1201, UTF16BE, UCS-2BE, UTF-16BE | |
Unicode (UTF-7) | 65000 |
utf-7, csUnicode11UTF7, unicode-1-1-utf-7, x-unicode-2-0-utf-7 | |
Unicode (UTF-8) | 65001 |
utf-8, unicode-1-1-utf-8, unicode-2-0-utf-8, x-unicode-2-0-utf-8, CP65001, UTF8 | |
UNICODE-1-1 | |
UNICODE-1-1, csUnicode11 | |
UNKNOWN-8BIT | |
UNKNOWN-8BIT, csUnknown8BiT | |
US-ASCII | 20127 |
us-ascii, ANSI_X3.4-1968, ANSI_X3.4-1986, ascii, cp367, csASCII, IBM367, ISO_646.irv:1991, ISO646-US, iso-ir-6us, iso-ir-6, us | |
us-dk | |
us-dk, csUSDK | |
UTF-32 | 12000 |
UTF-32, UTF-32LE, CP12000, UTF32LE, UTF32 | |
UTF-32BE | 12001 |
UTF-32BE, CP12001, UTF32BE | |
Ventura-International | |
Ventura-International, csVenturaInternational | |
Ventura-Math | |
Ventura-Math, csVenturaMath | |
Ventura-US | |
Ventura-US, csVenturaUS | |
videotex-suppl | |
videotex-suppl, iso-ir-70, csISO70VideotexSupp1 | |
Vietnamese (Windows) | 1258 |
windows-1258, CP1258 | |
VIQR | |
VIQR, csVIQR | |
VISCII | |
VISCII, csVISCII | |
Wang Taiwan | 20005 |
x-cp20005 | |
Western European (DOS) | 850 |
ibm850, cp850, 850, csPC850Multilingual | |
Western European (IA5) | 20105 |
x-IA5 | |
Western European (ISO) | 28591 |
iso-8859-1, cp819, Latin1, ibm819, iso_8859-1, iso_8859-1:1987, iso8859-1, iso-ir-100, l1, csISOLatin1 | |
Western European (Mac) | 10000 |
macintosh, mac, csMacintosh | |
Western European (Windows) | 1252 |
Windows-1252, x-ansi, CP1252, MS-ANSI | |
Everyone is encouraged to use Unicode (especially UTF-8). However, the reality is that many of these non-Unicode encodings are in broad use, and we still need to standardize the way we identify them.
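For data that is still in one of these legacy encodings, moving to UTF-8 is mostly a matter of knowing the right label. The following is a minimal sketch using Python's standard `codecs` module; the helper name `to_utf8` is an assumption for this example, and Python's codec registry has its own alias table that recognizes only some of the labels in this list.

```python
import codecs


def to_utf8(data, label):
    """Decode bytes using the labelled legacy charset and re-encode as UTF-8."""
    codec = codecs.lookup(label)  # raises LookupError for unknown labels
    text = data.decode(codec.name)
    return text.encode("utf-8")


# 0xE9 is "é" in windows-1252; in UTF-8 it becomes the two bytes C3 A9.
print(to_utf8(b"caf\xe9", "windows-1252"))
```

A label that Python does not recognize raises `LookupError`, at which point an alias list like this one can help find an equivalent name that it does accept.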
I will be posting the full XML data set for this list and my firstobject XML editor foal script that built it at a later time.
See also:
Convert ANSI file to Unicode
ANSI and Unicode files and C++ strings
UTF-8 Files and the Preamble
Setting the XML Declaration With CMarkup
CMarkup GetDeclaredEncoding Method
UTF-16 Files and the Byte Order Mark (BOM)