· Encoding – Text – Font
- A single font can support one or more "Character Sets".
For instance, it can support the "Unicode Character Set" and
any number of "One Byte Character Sets".
- In order to display a file written using the "Unicode Character
Set" you need a "Unicode Font".
A "Unicode Font" is a font which supports the "Unicode
Character Set", so it knows how to display Unicode characters.
A "Unicode Font" doesn't have to support all Unicode characters.
For instance, the Unicode fonts Arial and "Times New Roman"
contain 1,419 characters and 1,674 glyphs.
On the other hand, the Unicode font "Arial Unicode MS"
contains 38,917 characters and 50,377 glyphs.
- Each "Unicode Character" is identified by its "Code
Point". Here are some examples: Š=U+0160, Č=U+010C.
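As a small illustration of the "Code Point" idea, here is a minimal Python sketch that prints the code point and the official Unicode name of a character. The helper name `code_point` is made up for this example; `ord`, `chr` and `unicodedata.name` are standard Python.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Return the U+XXXX notation for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X}"


for ch in "ŠČ":
    # ord() gives the numeric code point, unicodedata.name() its official name
    print(ch, code_point(ch), unicodedata.name(ch))

# chr() goes the other way: from a code point back to the character
print(chr(0x0160))
```

Note that the number after "U+" is written in hexadecimal, so U+0160 is the decimal value 352.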
A text editor which wants to display a text file does the following.
The text editor first looks at the few bytes at the beginning of the file to see if they
represent a "Byte Order Mark", called BOM.
The Byte Order Mark defines which encoding was used to store Unicode characters in
the file.
Byte Order Mark   Encoding                File with single letter Š
EF BB BF          UTF-8                   EF BB BF C5 A0
FF FE             UTF-16, little endian   FF FE 60 01
FE FF             UTF-16, big endian      FE FF 01 60
FF FE 00 00       UTF-32, little endian   FF FE 00 00 60 01 00 00
00 00 FE FF       UTF-32, big endian      00 00 FE FF 00 00 01 60
- If a BOM was found, the text editor knows that the file was written using the
"Unicode Character Set", encoded as defined by the BOM.
- For each letter in the file, the text editor can now calculate its
"Code Point".
- The "Code Point" is sent to the "Unicode Font", which
displays that "Unicode Character" by returning one or more glyphs.
For example, one font can have a glyph that looks exactly like the letter
Š and will use that glyph to display the letter Š.
Another font might not have a glyph looking like Š, but
might have the glyphs S and ˇ, which can be
combined as Š.
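The BOM check and the code point calculation described in the steps above can be sketched in Python roughly as follows. This is only an illustration, not how any particular editor is implemented: the function names `detect_bom` and `code_points` are invented for the example, while the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# The 4-byte UTF-32 marks are tested before the 2-byte UTF-16 marks,
# because FF FE 00 00 (UTF-32 LE) also starts with FF FE (UTF-16 LE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, BOM length), or (None, 0) if no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def code_points(data: bytes):
    """Decode the bytes after the BOM and list each character's code point."""
    encoding, skip = detect_bom(data)
    if encoding is None:
        raise ValueError("no BOM found; the encoding must be chosen another way")
    text = data[skip:].decode(encoding)
    return [f"U+{ord(ch):04X}" for ch in text]


# The five sample files with the single letter Š
samples = [
    "EF BB BF C5 A0",
    "FF FE 60 01",
    "FE FF 01 60",
    "FF FE 00 00 60 01 00 00",
    "00 00 FE FF 00 00 01 60",
]
for hex_bytes in samples:
    raw = bytes.fromhex(hex_bytes)
    print(detect_bom(raw)[0], code_points(raw))
```

All five samples decode to the same code point, U+0160. One limitation of this simple check: a UTF-16 little endian file whose first character is U+0000 would be mistaken for UTF-32 little endian.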
The text editor needs to tell the font which "Character Set" to use and
the "Character Code" of the wanted character.
This means that for Unicode characters, the text editor needs to tell the font to use
the "Unicode Character Set" and the "Code
Point" of the wanted character.
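To illustrate why both the "Character Set" and the "Character Code" are needed, the sketch below looks up the code of the letter Š in three character sets. It is an assumption-laden example: the helper `character_code` is hypothetical, and the Python codec names `cp1250` (Windows Central European) and `iso8859_2` (ISO Latin-2) stand in for two "One Byte Character Sets" chosen only for this illustration.

```python
def character_code(ch: str, charset: str) -> int:
    """Return the numeric code of ch in the given character set.

    charset is a Python codec name standing in for a one-byte
    "Character Set"; the special value "unicode" means the code point.
    """
    if charset == "unicode":
        return ord(ch)
    encoded = ch.encode(charset)  # raises UnicodeEncodeError if ch is not in the set
    if len(encoded) != 1:
        raise ValueError(f"{charset} does not store {ch!r} in one byte")
    return encoded[0]


for charset in ("unicode", "cp1250", "iso8859_2"):
    print(f"{charset:10} 0x{character_code('Š', charset):04X}")
```

The same letter has a different code in each set (0x0160, 0x8A and 0xA9 respectively), so a code on its own is ambiguous until the character set is known.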
- Use the following procedure to see which "Character Sets" are
supported by a font and, if the font supports the "Unicode Character Set",
which "Code Points" are supported:
− Start − Programs − Accessories − System
Tools − Character Map − Advanced View