Encoding Text Font Character Sets

 

- Info:

- A single font can support one or more "Character Sets".

For instance, it can support the "Unicode Character Set" and any number of "One Byte Character Sets".

- To display a file written using the "Unicode Character Set", you need a "Unicode Font".

A "Unicode Font" is a font that supports the "Unicode Character Set"; it knows how to display Unicode characters.

A "Unicode Font" doesn't have to support all Unicode characters.

For instance, the Unicode fonts Arial and "Times New Roman" contain 1,419 characters and 1,674 glyphs.

On the other hand, the Unicode font "Arial Unicode MS" contains 38,917 characters and 50,377 glyphs.

- Each "Unicode Character" is identified by its "Code Point". Here are some examples: Š=U+0160, Č=U+010C.
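
The relation between a character and its code point can be checked with a few lines of Python. This is only a minimal sketch; the helper name `code_point_label` is made up for illustration and is not part of any library.

```python
def code_point_label(ch: str) -> str:
    """Return the 'U+XXXX' label for a single character."""
    # ord() gives the code point as an integer; format it as 4+ hex digits.
    return f"U+{ord(ch):04X}"


# "\u0160" is S with caron, "\u010c" is C with caron (the examples above).
for ch in ("\u0160", "\u010c"):
    print(ch, "=", code_point_label(ch))

# chr() goes the other way: from code point back to the character.
assert chr(0x0160) == "\u0160"
```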

- A text editor that wants to display a text file does the following.

- The text editor first looks at the first few bytes of the file to see if they represent a "Byte Order Mark" (BOM).

- The Byte Order Mark indicates which encoding was used to store the Unicode characters as bytes:

Byte Order Mark    Encoding                 File with single letter Š (U+0160)
EF BB BF           UTF-8                    EF BB BF C5 A0
FF FE              UTF-16, little endian    FF FE 60 01
FE FF              UTF-16, big endian       FE FF 01 60 (high byte first)
FF FE 00 00        UTF-32, little endian    FF FE 00 00 60 01 00 00
00 00 FE FF        UTF-32, big endian       00 00 FE FF 00 00 01 60

- If a BOM is found, the text editor knows that the file was written using the "Unicode Character Set", encoded as defined by the BOM.

- For each letter in the file, the text editor can now calculate its "Code Point".

- The "Code Point" is sent to the "Unicode Font", which displays that "Unicode Character" by returning one or more glyphs.

For example, one font may have a glyph that looks exactly like the letter Š and will use that glyph to display it.

Another font might not have a glyph that looks like Š, but might have the glyphs S and ˇ, which can be combined into Š.

- The text editor needs to tell the font which "Character Set" to use and the "Character Code" of the wanted character.

- This means that for Unicode characters, the text editor needs to tell the font to use the "Unicode Character Set" and the "Code Point" of the wanted character.
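
The BOM check and the code point calculation described in the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular editor does it: the names `BOMS`, `detect_bom` and `code_points` are hypothetical, and files without a BOM are simply rejected. The BOM constants come from the standard `codecs` module.

```python
import codecs

# BOM signatures as listed above. UTF-32 is checked before UTF-16 because
# the UTF-32 little-endian BOM (FF FE 00 00) begins with the UTF-16
# little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def code_points(data: bytes):
    """Decode BOM-prefixed bytes and return the code points as integers."""
    encoding, skip = detect_bom(data)
    if encoding is None:
        raise ValueError("no BOM found; the encoding must be chosen another way")
    # Skip the BOM itself, decode the rest, and map each character to its code point.
    return [ord(ch) for ch in data[skip:].decode(encoding)]


# The UTF-16 little-endian sample file from above: FF FE 60 01
print([f"U+{cp:04X}" for cp in code_points(bytes.fromhex("FFFE6001"))])
```

One limitation of this ordering: a UTF-16 little-endian file whose first character is U+0000 would be mistaken for UTF-32 little endian, so real tools usually apply further checks.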

- Use the following procedure to see which "Character Sets" a font supports and, if the font supports the "Unicode Character Set", which "Code Points" it covers:

− Start − Programs − Accessories − System Tools − Character Map − Advanced View