Code Pages

A code page is a table storing a character set that supports one or more language scripts. When you press any key on a keyboard, the computer receives a numeric code that represents that keystroke. A code page maps each of these numeric codes to a character. Many personal computer operating systems support multiple code pages and allow you to switch among them.

For example, DOS uses code page 437 for several languages that use Roman alphabetical characters, including English, French, and German, but requires code page 860 for Portuguese. DOS code page 860 (Portuguese) removes the ƒ symbol (franc) and inserts Ó (O with an acute accent) in its place. Different computer operating systems also use different code pages for the same language. For example, DOS uses code page 437 for English, but Windows 95 uses code page 1252.
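As a rough illustration, the sketch below uses Python's built-in cp437, cp860, and cp1252 codecs to show one numeric code mapping to different characters, and one character mapping to different codes. It assumes that Python's codec tables match the DOS and Windows code pages discussed here; the byte value 0x9F and the letter é are only examples.

```python
# The same numeric code maps to different characters in different code pages.
code = bytes([0x9F])

in_cp437 = code.decode("cp437")   # DOS code page 437
in_cp860 = code.decode("cp860")   # DOS code page 860 (Portuguese)

print(f"0x9F in cp437: {in_cp437!r}")
print(f"0x9F in cp860: {in_cp860!r}")

# The same character can also have different codes on different systems.
e_acute_dos = "é".encode("cp437")    # DOS, English
e_acute_win = "é".encode("cp1252")   # Windows, English
print(f"é is 0x{e_acute_dos.hex()} in cp437 and 0x{e_acute_win.hex()} in cp1252")
```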

In a single-byte code page, up to 256 codes are available to represent lower- and uppercase letters, numbers, punctuation marks, and all the mathematical symbols on your keyboard.

However, 256 codes are not sufficient to represent all the letters and characters used in the writing systems of every language. Some nonalphabetic writing systems, such as Chinese, Japanese, and Korean, contain thousands of characters and require a double-byte code page.
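The difference between the two kinds of code page can be sketched in Python. The example assumes Python's cp437 codec as a stand-in for a single-byte code page and its cp932 codec as a stand-in for a double-byte one; the sample characters are arbitrary.

```python
# Single-byte code page: each of the 256 possible byte values is one character.
all_codes = bytes(range(256))
cp437_chars = all_codes.decode("cp437")
print(len(cp437_chars), "characters in code page 437")

# Double-byte code page: basic Latin letters still take one byte,
# but ideographs are stored as two bytes.
latin = "A".encode("cp932")
kanji = "日".encode("cp932")
print(len(latin), "byte for 'A',", len(kanji), "bytes for '日'")
```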

Differences between single-byte and double-byte code pages usually cause display and readability problems. For example, a document created with Windows 95 in Japan is probably created with code page 932. The same document will not look the same when displayed on a Windows 95 computer using code page 1252 in the United States. Unrecognized characters will be replaced with a symbol such as a heart. In the past, these substituted characters might have caused a database such as Novell eDirectory to fail to recognize objects.
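The kind of garbling described above can be reproduced with a short Python sketch. It assumes Python's cp932 and cp1252 codecs approximate the Windows code pages; the sample text is arbitrary, and the exact substitute symbols shown on a real system may differ.

```python
text = "日本語"

# Bytes as they would be stored by a system using code page 932.
stored = text.encode("cp932")

# The same bytes interpreted with code page 1252. errors="replace" substitutes
# U+FFFD for any byte value that cp1252 leaves undefined.
shown = stored.decode("cp1252", errors="replace")
print("Viewed with cp1252:", shown)

# Interpreting the bytes with the original code page restores the text.
recovered = stored.decode("cp932")
print("Viewed with cp932: ", recovered)
```

The bytes themselves are unchanged in both cases; only the table used to interpret them differs.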

To help resolve these problems, a convention called Unicode has been adopted.


Using Unicode

Unicode is a 16-bit character representation, defined by the Unicode Consortium, that supports up to 65,536 unique characters. Unicode allows the characters for multiple languages to be represented using a single Unicode representation.
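A minimal Python sketch of the 16-bit representation follows. It assumes UTF-16 (big-endian) as the 16-bit form and uses only characters that fit in a single 16-bit value; the sample characters are arbitrary.

```python
# Characters from several scripts, all expressed in one representation.
sample = "A¤Ж日"   # Latin, currency sign, Cyrillic, Han

for ch in sample:
    units = ch.encode("utf-16-be")
    print(f"{ch!r} -> Unicode number {ord(ch):04X} -> bytes {units.hex()}")

# Each of these characters occupies exactly 16 bits (two bytes).
encoded = sample.encode("utf-16-be")
print(len(encoded), "bytes for", len(sample), "characters")
```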

Any character that your code page does not understand is substituted in your display by the 4-digit hexadecimal value of the Unicode character, surrounded by square brackets, for example: [00FF]

Because eDirectory supports Unicode, substituted characters do not prevent eDirectory from recognizing an object. For example, your company's European office might create an Organizational Unit object to represent Finance in western Europe. They might use DOS code page 852 to make the generic currency symbol a part of the object name (OU=¤W-Euro).

When this object is accessed in the United States, using DOS code page 437 or Windows 95 code page 1252, the currency symbol (¤) is replaced by square brackets surrounding the Unicode number for the currency symbol, [00A4]. eDirectory recognizes the Unicode number, so the object can still be opened and accessed.
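The substitution can be approximated with the Python sketch below. The helper `display_name` is hypothetical and is not part of eDirectory; it assumes Python's cp852 and cp437 codecs match the DOS code pages and that every character fits in four hexadecimal digits.

```python
def display_name(name: str, code_page: str) -> str:
    """Hypothetical helper: show `name` as a given code page might display it.

    Characters the code page cannot represent are replaced by their
    4-digit hexadecimal Unicode number in square brackets.
    """
    shown = []
    for ch in name:
        try:
            ch.encode(code_page)          # representable in this code page?
            shown.append(ch)
        except UnicodeEncodeError:
            shown.append(f"[{ord(ch):04X}]")
    return "".join(shown)


print(display_name("¤W-Euro", "cp852"))   # code page used to create the name
print(display_name("¤W-Euro", "cp437"))   # code page without the currency symbol
```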

However, the object name (containing the square brackets and Unicode number) will be difficult for users to understand. If the name is too difficult to interpret, the only solution is to determine which code page was used to create the object and then view the object using that code page. Changing code pages can be troublesome; see Changing Code Pages for guidelines.

The following table shows ranges of Unicode numbers, with a description of each range and a list of code pages that might be used to view the character correctly. However, switching to one of the suggested code pages does not guarantee that you will see the correct results. For example, characters in the range 4E00-9FFF (Han Ideographs) are used in Japan, China, and Korea. But switching to code page 932 (Japanese) does not display the character correctly if the character is used only in China.

The most reliable way to determine the character is to refer to the Unicode Standard, Version 2.0. Access the Unicode Web site for more information. The Web site also includes charts of Unicode characters.


Table 1. Unicode Ranges, Descriptions, and Code Pages

Unicode Range   Description                     Geographical Region   Windows Code Pages   DOS Code Pages
0080 - 00FF     Extended Latin                  Western Europe        1252                 437, 850, 860, 863, 865
0100 - 01FF     Extended Latin                  Central Europe        1250, 1257           852, 775
0300 - 03FF     Greek                           Greece                1253                 737
0400 - 04FF     Cyrillic                        Russia                1251                 855, 866
0590 - 05FF     Hebrew                          Israel                1255                 862
0600 - 06FF     Arabic                          Middle East           1256                 864
2500 - 26FF     Line Drawing and Graphics       N/A                   N/A                  Most DOS code pages
4E00 - 9FFF     Han Ideographs                  Far East              932, 936, 949, 950   932, 936, 949, 950
AC00 - D7FF     Hangul Syllables                Korea                 949                  949
FE70 - FEFF     Arabic Presentation Forms       Middle East           N/A                  864
FF00 - FFEF     Full- and Half-Width Variants   Far East              932, 936, 949, 950   932, 936, 949, 950
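Looking up a substituted value in Table 1 can be sketched in Python as follows. The helper `suggest_code_pages` and the `UNICODE_RANGES` list are hypothetical names for this illustration. The Han Ideographs range uses the upper bound 9FFF given in the text, and the Windows entry for the first range (1252) is an assumption based on the code page the text names for Windows 95. As noted above, a suggested code page does not guarantee a correct display.

```python
import re

# (start, end, description, Windows code pages, DOS code pages), after Table 1.
UNICODE_RANGES = [
    (0x0080, 0x00FF, "Extended Latin", ["1252"],  # 1252 is an assumption
     ["437", "850", "860", "863", "865"]),
    (0x0100, 0x01FF, "Extended Latin", ["1250", "1257"], ["852", "775"]),
    (0x0300, 0x03FF, "Greek", ["1253"], ["737"]),
    (0x0400, 0x04FF, "Cyrillic", ["1251"], ["855", "866"]),
    (0x0590, 0x05FF, "Hebrew", ["1255"], ["862"]),
    (0x0600, 0x06FF, "Arabic", ["1256"], ["864"]),
    (0x2500, 0x26FF, "Line Drawing and Graphics", [], ["most DOS code pages"]),
    (0x4E00, 0x9FFF, "Han Ideographs",
     ["932", "936", "949", "950"], ["932", "936", "949", "950"]),
    (0xAC00, 0xD7FF, "Hangul Syllables", ["949"], ["949"]),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms", [], ["864"]),
    (0xFF00, 0xFFEF, "Full- and Half-Width Variants",
     ["932", "936", "949", "950"], ["932", "936", "949", "950"]),
]


def suggest_code_pages(token: str):
    """Hypothetical helper: map a value such as "[00A4]" to its table row.

    Returns (description, Windows code pages, DOS code pages), or None
    when the value falls outside every listed range.
    """
    match = re.fullmatch(r"\[([0-9A-Fa-f]{4})\]", token)
    if match is None:
        raise ValueError(f"not a bracketed Unicode value: {token!r}")
    value = int(match.group(1), 16)
    for start, end, description, windows, dos in UNICODE_RANGES:
        if start <= value <= end:
            return description, windows, dos
    return None


print(suggest_code_pages("[00A4]"))
print(suggest_code_pages("[4E2D]"))
```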


