Numeric Character References

CMarkup release 6.5 adds support for decoding numeric character references encountered in a document. A numeric character reference specifies a character by its Unicode value written out as a number. It begins with &#, contains either a decimal (base 10) number or a hexadecimal number prefixed with x, and ends with a semicolon. Numeric character references are used mainly for whitespace and occasionally for the 5 Standard Special Characters shown in the Standard column below. The standard encoding form is also shown for those 5 characters because it is recommended over numeric referencing.

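    Character                     Base 10   Hex      Standard
    space                         &#32;     &#x20;
    tab                           &#9;      &#x9;
    carriage return               &#13;     &#xD;
    line feed                     &#10;     &#xA;
<   less than                     &#60;     &#x3C;   &lt;
>   greater than                  &#62;     &#x3E;   &gt;
&   ampersand                     &#38;     &#x26;   &amp;
'   apostrophe or single quote    &#39;     &#x27;   &apos;
"   double quote                  &#34;     &#x22;   &quot;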

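Since the standard form is recommended for the 5 special characters, a small escaping routine is one way to produce it when writing text by hand. The following is a minimal sketch only; escapeSpecialChars is a hypothetical helper name and is not part of the CMarkup API.

```cpp
#include <string>

// Hypothetical helper (not part of CMarkup): replace each of the 5
// special characters with its standard entity reference.
std::string escapeSpecialChars(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '&':  out += "&amp;";  break;
        case '\'': out += "&apos;"; break;
        case '"':  out += "&quot;"; break;
        default:   out += c;        break;  // everything else is copied as-is
        }
    }
    return out;
}
```

Called with the text a < b & c, this sketch returns a &lt; b &amp; c. All other characters, including whitespace, pass through unchanged.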

Numeric character references are also used to specify Unicode characters such as the Chinese character 中. This is a way of manually typing Unicode characters into an XML document with an editor that does not support those characters. When you retrieve text from the document using CMarkup, it will decode these character references into the literal characters they refer to. The following table shows some sample Chinese characters (the characters themselves will be visible here if your browser font supports them) with meanings provided by chineselanguage.org.

Chinese Character    Base 10     Hex
中  middle           &#20013;    &#x4E2D;
国  nation           &#22269;    &#x56FD;
人  human being      &#20154;    &#x4EBA;
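
To illustrate what decoding a numeric character reference involves, here is a minimal standalone sketch. It is not CMarkup's implementation: decodeNumericRefs and its helpers are hypothetical names, and the sketch assumes the decoded text should be UTF-8. References that are malformed or out of range are left in the text untouched.

```cpp
#include <cstddef>
#include <string>

// Value of a decimal or hexadecimal digit, or -1 if c is not a digit.
static int digitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Append the UTF-8 encoding of code point cp (assumed <= 0x10FFFF) to out.
static void appendUtf8(unsigned long cp, std::string& out)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: replace every well-formed &#N; or &#xH; reference
// in text with the UTF-8 bytes of the character it names.
std::string decodeNumericRefs(const std::string& text)
{
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text.compare(i, 2, "&#") == 0) {
            std::size_t end = text.find(';', i + 2);
            if (end != std::string::npos) {
                std::size_t pos = i + 2;
                unsigned long base = 10;
                if (pos < end && text[pos] == 'x') {  // hex form: &#x...;
                    base = 16;
                    ++pos;
                }
                unsigned long cp = 0;
                bool valid = pos < end;               // need at least one digit
                for (std::size_t k = pos; k < end && valid; ++k) {
                    int d = digitValue(text[k]);
                    if (d < 0 || static_cast<unsigned long>(d) >= base) {
                        valid = false;
                    } else {
                        cp = cp * base + static_cast<unsigned long>(d);
                        if (cp > 0x10FFFF) valid = false;  // beyond Unicode
                    }
                }
                // Reject zero and the surrogate range as well.
                if (valid && cp != 0 && !(cp >= 0xD800 && cp <= 0xDFFF)) {
                    appendUtf8(cp, out);
                    i = end + 1;
                    continue;
                }
            }
        }
        out += text[i++];  // ordinary character, or a reference left as-is
    }
    return out;
}
```

With this sketch, decoding &#20013;&#x56FD; gives the UTF-8 bytes for 中国, and a&#9;b gives a, a tab, and b. Standard entities such as &amp; are not handled here and pass through unchanged.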