David J. Perry
Rye High School, Rye, NY
Member, Educational Computer Applications Committee, American Classical League
Version 3, 8/31/03 (updated 9/12/03)
Version 2, 11/10/02
Version 1, 11/25/01 (original draft 5/8/01)
Overview
This page is designed to help those who are new to using Unicode characters located in the supplementary planes. It presents some basic concepts that are needed to understand how to work with these characters and then provides step-by-step instructions for Mac OS X and for Windows.
This document contains five parts:
1. Background Information about Plane 1
2. Plane 1 characters under Mac OS
3. Plane 1 characters under Windows
4. Plane 1 characters on a web page
5. Conversion information and tables of characters
In addition, I have posted a PDF file that explains how to add supplementary characters to a TrueType font. Font developers can find it through this link.
Part 1. Background Information about Plane 1
Until version 3.1 of Unicode was released, all characters were stored in the Basic Multilingual Plane (BMP), the original Unicode codespace with room for about 65,000 characters. However, the BMP is almost filled up. Therefore, beginning with version 3.1, Unicode has begun to allocate characters to additional groups of 65,000 characters, referred to as supplementary planes. The BMP is counted as Plane 0. Plane 1 (also known as the Supplementary Multilingual Plane, or SMP) will mainly be used for historical scripts as well as sets of Western and Byzantine musical symbols. As of Unicode 4.0, scripts assigned to the SMP include Gothic, Old Italic, Linear B, Cypriot syllabary, Aegean numbers, and Ugaritic cuneiform. Unicode has also allocated a very large group of Asian ideographs, less commonly used than those in the BMP, to Plane 2, the Supplementary Ideographic Plane (SIP).
Note: in this paper I have often used the phrase “Plane 1 characters.” I find this shorter and easier than saying “supplementary plane characters,” and the characters that I personally work with are in Plane 1. However, the ideographs in Plane 2 function the same way.
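As a small illustration of how the planes are laid out, the sketch below computes which plane a code point belongs to. The function name plane_of is my own, chosen for illustration; it relies only on the fact that each plane holds 10000 (hexadecimal) code points.

```python
def plane_of(code_point):
    """Return the number of the Unicode plane (0-16) containing code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Each plane holds 0x10000 (65,536) code points, so the plane number
    # is whatever remains after dropping the low 16 bits.
    return code_point >> 16


# A BMP letter, an Old Italic letter, and a Plane 2 ideograph position.
for cp in (0x0041, 0x10300, 0x20000):
    print(f"U+{cp:04X} is in Plane {plane_of(cp)}")
```

Any value that comes back as 1 or higher is a supplementary character and is subject to everything described in the rest of this page.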
A number of additional characters of interest to classicists will be located in Plane 1. The Thesaurus Linguae Graecae has submitted proposals for ancient Greek musical notation, numerical characters, and acrophonic numerals. These have been approved by the Unicode Technical Committee and will be included in future versions of Unicode; for details see the TLG’s web site. It is also possible that some medieval characters will be placed in Plane 1. See the website of the Medieval Unicode Font Initiative for details on this project.
Characters in the supplementary planes differ from characters in the BMP: they are stored in a Unicode font under a single hexadecimal number, but many applications access them through surrogate pairs. The designers of Unicode, anticipating that more than the original 65,000 characters would be needed, devised a mechanism to provide access to 16 additional planes of about 65,000 characters each by reserving two blocks of codepoints in the BMP, the high surrogates area and the low surrogates area. An application can combine a high surrogate and a low surrogate to point to a value in one of the supplementary planes. The reasons why some applications require a surrogate pair and some do not are highly technical and beyond the scope of this paper.
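To make the idea of combining two surrogates concrete, here is a minimal sketch of the arithmetic an application might perform. The function and constant names are my own; the ranges D800-DBFF and DC00-DFFF are the high and low surrogates areas.

```python
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # high surrogates area
LOW_START, LOW_END = 0xDC00, 0xDFFF    # low surrogates area


def combine_surrogates(high, low):
    """Combine a high and a low surrogate into one Unicode scalar value."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("first value is not a high surrogate")
    if not LOW_START <= low <= LOW_END:
        raise ValueError("second value is not a low surrogate")
    # Each surrogate contributes 10 bits; the result is offset by 0x10000
    # so that it lands above the BMP.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


print(f"U+{combine_surrogates(0xD800, 0xDF00):X}")  # U+10300
```

With 1,024 high surrogates and 1,024 low surrogates, the pairs cover exactly 16 planes of 65,536 code points each.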
Part 2. Plane 1 Characters under Mac OS
Operating System
You must have OS X. Plane 1 characters will display using OS 10.1, and do not work with 10.0; I have not been able to test 10.0.1 through 10.0.4. OS 10.2 has improved font support throughout, and in particular it comes with the Character Palette applet that will display any Unicode character, including those in Plane 1. I doubt very much that Plane 1 characters would display under OS 9; but in any case I have never seen a font to test them with.
Fonts
Locate a font with the characters you need; there aren't many at present. The Code 2001 font by James Kass includes the Old Italic, Gothic, and Deseret characters in Unicode 3.1 plus the Old Persian Cuneiform glyphs proposed for Unicode 3.2. See his page at http://home.att.net/~jameskass/code2001.htm . Note that this was originally developed as a Windows TrueType font but will also work under OS X. Updated versions of Athena and Cardo with Plane 1 characters of interest to classicists will be available from my web page http://scholarsfonts.net . Install the font(s) you have obtained by dragging them to the Library/Fonts folder.
Choosing an Input Method
The easiest way to enter Plane 1 characters with OS 10.2 is to use the Character Palette. It can be accessed via the Extras pulldown menu at the bottom of the Fonts dialog box. You can also access it directly, without going into Fonts, as follows:
If you already have a keyboard menu visible in the Finder (a flag symbol to the right of Help on the menu bar):
1. Pull down the keyboard menu and see if “Show Character Palette” is there.
2. If Character Palette is not available, choose “Customize Menu” and a dialog box will open. Click to place a check mark next to Character Palette, the first item in the list, then close the dialog.
If you do not have a keyboard menu visible in the Finder:
1. From the Apple menu, choose System Preferences, then double-click the International icon (a UN flag)
2. Click on the Input Menu tab (right-hand one)
3. Click to place a check mark next to Character Palette, the first item in the list, then close the dialog. A keyboard menu will now be visible in the Finder.
If you are still using OS 10.1, or if you wish to use the hex input method in addition to the Character Palette in 10.2, you need to install the Unicode Hex Input method. To install this IM:
1. from the Apple menu, open System Preferences and double-click the International icon
2. choose the Keyboard Menu tab at the right
3. scroll down toward the bottom and select Unicode Hex Input by clicking in the checkbox
4. close the window
5. a keyboard icon will now appear on the toolbar; you can select the input method you want here, or use command-spacebar to switch among available scripts. (Note that OS X considers Unicode a “script” in the same way that Chinese is a different script than Cyrillic or Roman; you can have more than one input method for it, e.g., Hex Input in addition to the Extended Roman Unicode keyboard.)
To use the hex input method, you also must have the surrogate pair numbers for the characters you need. (This is not required to use the Character Palette.) For instance, the character old italic letter a is U+10300; the two surrogates (one surrogate pair) used to access it are D800 and DF00 (hexadecimal). The formula for converting a Unicode scalar value (single hex number) to a pair of surrogates is given below in Part 5 of this document.
Entering the characters
To use Character Palette (OS 10.2 or above):
1. Start TextEdit, the basic editor that comes with OS X
2. From the Finder menu bar, open the Keyboard menu (a flag symbol)
3. Choose “Show Character Palette”
4. Click on the relevant Unicode range in the list at the left.
5. Locate the character you want in the chart. Clicking the triangle will let you set the font you want and provide additional information about the character you have highlighted.
6. Double-click on the character in the chart and it will be pasted into your document.
To use the Unicode Hex Input method:
1. Start TextEdit, the basic editor that comes with OS X
2. Turn on NumLock on the keypad at the right
3. Select the Unicode Hex Input from the keyboard icon on the menu bar, or use command-spacebar to switch to the Unicode “script”
4. Hold down the option key and enter the high surrogate value in hex (e.g., D800), and release. You do not need to hold down the shift key; typing option-d800 is equivalent to the hex value D800.
5. Hold down option again and enter the low surrogate in hex (e.g., DF00) and your character should appear. If it does not, make sure that you are using the correct font. If the Mac cannot find the characters in the font you are using, it will display a pair of icons which represent the surrogate pair.
It should also be noted that it is possible under OS 10.2 to create customized keyboard layouts in XML format. I don’t think anyone has done this yet for the Plane 1 characters, but it could be done. See the details at http://developer.apple.com/technotes/tn2002/tn2056.html .
At the moment, TextEdit and SUE are the only editors I know of that can support surrogates.
Thanks to Tom Gewecke for help with the earlier version of this Mac OS information.
Part 3. Plane 1 Characters under Windows
You must have Windows XP, Windows 2000 or Windows NT. Plane 1 characters will not display properly in Windows 98/Me (although if you open a file that contains Plane 1 characters under Win98/Me, it will not be damaged; the Plane 1 characters won’t be visible, but they will still be there if you open the file later under Win2000 or XP).
Microsoft has always claimed that Windows 98 does not support supplementary characters. Some recent experiments reported on the Unicode mailing list indicate that one can in fact display supplementary characters by editing the Registry as described below. I have not personally done this, but the adventurous are welcome to try it.
Under Windows 2000, it may be necessary to take the preliminary step of enabling support for supplementary characters; by default, it is turned off. (It is on by default in Windows XP.) However, support for supplementary characters may have been turned on if you have made certain changes to your system (for example, enabling languages such as Hebrew or Arabic or Indic languages). You must edit the Registry to turn this feature on, and messing with the Registry can be dangerous. I therefore strongly suggest that you try entering some supplementary characters as described below. If they don’t work, then come back here and follow the directions below to change the Registry settings. If you don’t know what you are doing, get help from someone who does. At the very least, after you start RegEdit, open the Help file and print out the page that describes how to restore the Registry if something goes wrong.
You must add two keys to the registry. To do this use the Registry Editor (RegEdit) that comes with Windows. Choose Start / Run, type regedit in the box, and the Registry Editor will start. For exact instructions, see the Microsoft Developer page at http://msdn.microsoft.com/library/psdk/winbase/unicode_192r.htm .
You can also look at the excellent page by Tex Texin at http://www.i18nguy.com/surrogates.html . This page tells you how to make the one necessary change to the Registry and also provides information on two additional Registry values that are useful when working with Plane 1 characters. It also contains additional information about supplementary characters and some links.
Fonts
Locate a font with the characters you need. There aren't many at present, and there is no point in going through all these steps if you have no font. The Code 2001 font by James Kass, specifically designed to support supplementary characters, is the best place to start. See his page at http://home.att.net/~jameskass/code2001.htm . An updated version of Cardo with Plane 1 characters of interest to classicists will be available from my web page http://scholarsfonts.net . Juan-José Marcos is also in the process of adding Plane 1 characters to his Alphabetum font. Install the font(s) you have obtained by dragging them to the Windows/Fonts folder, or by choosing Start/Settings/Control Panel/Fonts; File/Install New Font and navigating to the directory where you stored the font.
Choosing an Input Method
You must have one of the following:
o UniPad (http://www.sharmahd.com/unipad/) is a text editor specifically designed to work with Unicode; versions .95 and later support surrogates and the use of codepoints in Plane 1 and above. Since UniPad is a plain text editor, you would need to edit the file in another application after entering the characters if you wished to use different font sizes, bold, italic, etc.
o Versions 5 and above of Keyman can create keyboards using Plane 1 characters. See http://www.tavultesoft.com/keyman . I have successfully created a Keyman keyboard for the Old Italic characters.
o the Microsoft Keyboard Layout Creator utility can create keyboards that utilize supplementary characters. It is freely available from http://www.microsoft.com/globaldev/tools/msklc.mspx . Note that you will need to install the Microsoft .NET framework in order for this program to run.
OR
If you have a keyboard or IM, start and use it per the directions that came with it.
If you don't have a keyboard or IM, enter characters as follows:
Because you are entering a pair of surrogate values, you will notice that the cursor will advance after you type the first one, but will display only white space; after you type the second, the white space will vanish and the correct character will appear. If you use the Backspace key to remove a Plane 1 character, you will need to press it twice.
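The two-step behavior described above follows from the way a Plane 1 character is stored. The sketch below is only an illustration and assumes the text is held in the UTF-16 encoding, where each BMP character takes one 16-bit unit and each supplementary character takes two; the function name utf16_code_units is my own.

```python
def utf16_code_units(text):
    """Return the 16-bit UTF-16 code units of text as 4-digit hex strings."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


print(utf16_code_units("A"))           # ['0041']
print(utf16_code_units("\U00010300"))  # ['D800', 'DF00']
```

OLD ITALIC LETTER A occupies two units, which is why an editor that moves or deletes one unit at a time needs two keystrokes to get past it.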
Microsoft Word 2000 does not have support for supplementary characters. Nor does the Windows Character Map in XP.
OpenOffice Writer 1.0.3 supports supplementary characters (I have not tested earlier versions). Unlike Word or WordPad, however, the ALT-x method of entering characters does not work. Nor does OpenOffice come with its own method for entering characters above the BMP; its Insert / Special Character dialog supports only the BMP. So you can either enter the text in WordPad and paste it into OpenOffice, or use a keyboard built with Keyman or the Keyboard Layout Creator. Note that OpenOffice is an open-source project, available for downloading from http://www.openoffice.org/ .
The information on this page was gleaned from several sources. Several people on the Unicode mailing list were very helpful, particularly Tex Texin.
Part 4. Plane 1 Characters on a Web Page
Note 1: the following information is taken from a thread on the Unicode mailing list. Thanks to all those who contributed items to the discussion; nothing in this part is original with me.
Note 2: the following discussion assumes you know how to construct web pages; it provides only information specific to getting characters in Planes 1 and above to work.
Getting various browsers to display anything outside the BMP is a tricky thing. The following seem to be true as of November 2002. For any web page to display non-BMP characters properly, the user must have an appropriate font on his or her system, normally the same font specified by the web page.
Microsoft Internet Explorer (Windows)
o use numeric character references (NCRs), either decimal or hex, instead of the standard UTF-8
o set the encoding for the page to “x-user-defined” rather than “UTF-8”; sometimes it helps if users manually set the encoding to “User-defined” in their browser
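For those who generate pages with a script, the NCR suggestion above can be sketched as follows. This is an illustration only: the function name to_ncr is my own, and it replaces just the characters above the BMP, leaving all other text untouched.

```python
def to_ncr(text, hexadecimal=True):
    """Replace every character above the BMP with a numeric character reference."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Supplementary character: write it as &#x...; or &#...;
            pieces.append(f"&#x{cp:X};" if hexadecimal else f"&#{cp};")
        else:
            pieces.append(ch)
    return "".join(pieces)


sample = "a \U00010300"  # Latin a, space, OLD ITALIC LETTER A
print(to_ncr(sample))                     # a &#x10300;
print(to_ncr(sample, hexadecimal=False))  # a &#66304;
```

Note that the reference uses the single scalar value (10300 hex, 66304 decimal), not the two surrogate values.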
Netscape
Netscape does not yet support supplementary characters.
Opera
Opera 6 supports characters outside the BMP.
A sample of Plane 1 to try
Here is a web page from Tex Texin that displays a sample of Etruscan (Plane 1):
http://www.i18nguy.com/unicode-example-plane1.html
Part 5. Conversion Information and Table of Characters
Here’s the formula I mentioned above. You will need this if you know the Unicode scalar values of the characters you need and want to enter them in WordPad on Windows 2000 (with XP, you can enter the scalar value directly followed by Alt-x) or TextEdit on Mac OS X by typing the two surrogate values. First convert the single Unicode value to its surrogate pair in hexadecimal, then, if you are using WordPad, convert the two hex numbers to decimal so you can type them on the keypad.
To convert a Plane 1 scalar value (S) to a pair of high and low surrogates (H, L):
H = (S − 10000₁₆) / 400₁₆ + D800₁₆
L = (S − 10000₁₆) % 400₁₆ + DC00₁₆
(from The Unicode Standard 3.0, §3.7, page 45)
All this math must be done in hexadecimal. The / character represents integer division (discard any remainder), and the % character represents the modulo operator; the calculator applet that comes with Windows, when run in scientific mode, can do this as well as other hex math.
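For readers who would rather script the conversion, here is a minimal sketch of the formula. The function name to_surrogates is my own. It also prints the decimal equivalents, since those are what must be typed on the keypad in WordPad.

```python
def to_surrogates(scalar):
    """Convert a supplementary-plane scalar value to (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("scalar value is not in a supplementary plane")
    offset = scalar - 0x10000
    high = offset // 0x400 + 0xD800  # integer division, as in the formula for H
    low = offset % 0x400 + 0xDC00    # remainder, as in the formula for L
    return high, low


high, low = to_surrogates(0x10300)  # OLD ITALIC LETTER A
print(f"hex:     {high:04X} {low:04X}")  # hex:     D800 DF00
print(f"decimal: {high} {low}")          # decimal: 55296 57088
```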
Rather than doing the math yourself, you can use the very convenient calculator by Michael Kaplan at
http://www.trigeminal.com/16to32AndBack.asp
Note: I have converted the scalar values to the two hex values as carefully as possible, but I do not guarantee 100% accuracy. Let me know of any errors.
H = high surrogate; L = low surrogate; S = Unicode scalar value (hexadecimal)
OLD ITALIC
H L S Name
D800 DF00 10300 OLD ITALIC LETTER A
D800 DF01 10301 OLD ITALIC LETTER BE
D800 DF02 10302 OLD ITALIC LETTER KE
D800 DF03 10303 OLD ITALIC LETTER DE
D800 DF04 10304 OLD ITALIC LETTER E
D800 DF05 10305 OLD ITALIC LETTER VE
D800 DF06 10306 OLD ITALIC LETTER ZE
D800 DF07 10307 OLD ITALIC LETTER HE
D800 DF08 10308 OLD ITALIC LETTER THE
D800 DF09 10309 OLD ITALIC LETTER I
D800 DF0A 1030A OLD ITALIC LETTER KA
D800 DF0B 1030B OLD ITALIC LETTER EL
D800 DF0C 1030C OLD ITALIC LETTER EM
D800 DF0D 1030D OLD ITALIC LETTER EN
D800 DF0E 1030E OLD ITALIC LETTER ESH
D800 DF0F 1030F OLD ITALIC LETTER O
D800 DF10 10310 OLD ITALIC LETTER PE
D800 DF11 10311 OLD ITALIC LETTER SHE
D800 DF12 10312 OLD ITALIC LETTER KU
D800 DF13 10313 OLD ITALIC LETTER ER
D800 DF14 10314 OLD ITALIC LETTER ES
D800 DF15 10315 OLD ITALIC LETTER TE
D800 DF16 10316 OLD ITALIC LETTER U
D800 DF17 10317 OLD ITALIC LETTER EKS
D800 DF18 10318 OLD ITALIC LETTER PHE
D800 DF19 10319 OLD ITALIC LETTER KHE
D800 DF1A 1031A OLD ITALIC LETTER EF
D800 DF1B 1031B OLD ITALIC LETTER ERS
D800 DF1C 1031C OLD ITALIC LETTER CHE
D800 DF1D 1031D OLD ITALIC LETTER II
D800 DF1E 1031E OLD ITALIC LETTER UU
1031F <reserved>
D800 DF20 10320 OLD ITALIC NUMERAL ONE
D800 DF21 10321 OLD ITALIC NUMERAL FIVE
D800 DF22 10322 OLD ITALIC NUMERAL TEN
D800 DF23 10323 OLD ITALIC NUMERAL FIFTY
GOTHIC
H L S Name
D800 DF30 10330 GOTHIC LETTER AHSA
D800 DF31 10331 GOTHIC LETTER BAIRKAN
D800 DF32 10332 GOTHIC LETTER GIBA
D800 DF33 10333 GOTHIC LETTER DAGS
D800 DF34 10334 GOTHIC LETTER AIHVUS
D800 DF35 10335 GOTHIC LETTER QAIRTHRA
D800 DF36 10336 GOTHIC LETTER IUJA
D800 DF37 10337 GOTHIC LETTER HAGL
D800 DF38 10338 GOTHIC LETTER THIUTH
D800 DF39 10339 GOTHIC LETTER EIS
D800 DF3A 1033A GOTHIC LETTER KUSMA
D800 DF3B 1033B GOTHIC LETTER LAGUS
D800 DF3C 1033C GOTHIC LETTER MANNA
D800 DF3D 1033D GOTHIC LETTER NAUTHS
D800 DF3E 1033E GOTHIC LETTER JER
D800 DF3F 1033F GOTHIC LETTER URUS
D800 DF40 10340 GOTHIC LETTER PAIRTHRA
D800 DF41 10341 GOTHIC LETTER NINETY
D800 DF42 10342 GOTHIC LETTER RAIDA
D800 DF43 10343 GOTHIC LETTER SAUIL
D800 DF44 10344 GOTHIC LETTER TEIWS
D800 DF45 10345 GOTHIC LETTER WINJA
D800 DF46 10346 GOTHIC LETTER FAIHU
D800 DF47 10347 GOTHIC LETTER IGGWS
D800 DF48 10348 GOTHIC LETTER HWAIR
D800 DF49 10349 GOTHIC LETTER OTHAL
D800 DF4A 1034A GOTHIC LETTER NINE HUNDRED
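Anyone who wishes to double-check the tables, or to extend them to other Plane 1 blocks, could regenerate the rows with a short script. The sketch below is only one way to do it: the function name table_row is my own, the surrogates are obtained from Python's built-in UTF-16 encoder rather than from the formula (so it serves as an independent check), and the names come from the Unicode data shipped with Python. Later versions of Unicode may have assigned characters to positions shown above as reserved, so the output can differ there.

```python
import unicodedata


def table_row(scalar):
    """Build one 'H L S Name' row for a supplementary-plane scalar value."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("scalar value is not in a supplementary plane")
    data = chr(scalar).encode("utf-16-be")  # four bytes: high then low surrogate
    high = int.from_bytes(data[0:2], "big")
    low = int.from_bytes(data[2:4], "big")
    # Fall back to a placeholder when the code point has no name.
    name = unicodedata.name(chr(scalar), "<reserved>")
    return f"{high:04X} {low:04X} {scalar:X} {name}"


# Regenerate the Gothic table (U+10330 through U+1034A).
for scalar in range(0x10330, 0x1034B):
    print(table_row(scalar))
```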