Using Plane 1 Characters

 

David J. Perry

Rye High School, Rye, NY

Member, Educational Computer Applications Committee,
American Classical League

 

 

Version 3, 8/31/03 (updated 9/12/03)

Version 2, 11/10/02

Version 1, 11/25/01 (original draft 5/8/01)

 

 

Overview

This page is designed to help those who are new to using Unicode characters located in the supplementary planes.  It presents some basic concepts that are needed to understand how to work with these characters and then provides step-by-step instructions for Mac OS X and for Windows.

 

This document contains five parts:

      1. Background Information about Plane 1

      2. Plane 1 characters under Mac OS

      3. Plane 1 characters under Windows

      4. Plane 1 characters on a web page

      5. Conversion information and tables of characters

 

In addition, I have posted a PDF file that explains how to add supplementary characters to a TrueType font.  Font developers can find it through this link.

 

 

Part 1. Background Information about Plane 1

Until version 3.1 of Unicode was released, all characters were stored in the Basic Multilingual Plane (BMP), the original Unicode codespace with room for about 65,000 characters.  However, the BMP is almost filled up.  Therefore, beginning with version 3.1, Unicode has begun to allocate characters to additional groups of 65,000 characters, referred to as supplementary planes.  The BMP is counted as Plane 0.  Plane 1 (also known as the Supplementary Multilingual Plane, or SMP) will mainly be used for historical scripts as well as sets of Western and Byzantine musical symbols.  As of Unicode 4.0, scripts assigned to the SMP include Gothic, Old Italic, Linear B, Cypriot syllabary, Aegean numbers, and Ugaritic cuneiform.  Unicode has also allocated a very large group of Asian ideographs, less commonly used than those in the BMP, to Plane 2, the Supplementary Ideographic Plane (SIP).

 

Note: in this paper I have often used the phrase “Plane 1 characters.”  I find this shorter and easier than saying “supplementary plane characters,” and the characters that I personally work with are in Plane 1.  However, the ideographs in Plane 2 function the same way.

 

A number of additional characters of interest to classicists will be located in Plane 1.  The Thesaurus Linguae Graecae has submitted proposals for ancient Greek musical notation, numerical characters, and acrophonic numerals.  These have been approved by the Unicode Technical Committee and will be included in future versions of Unicode; for details see the TLG’s web site.  It is also possible that some medieval characters will be placed in Plane 1.  See the website of the Medieval Unicode Font Initiative for details on this project.

 

Characters in the supplementary planes differ from characters in the BMP: each is stored in a Unicode font under a single hexadecimal number, but many applications access it through a surrogate pair.  The designers of Unicode, anticipating that more than the original 65,000 characters would be needed, devised a mechanism to provide access to 16 additional planes of about 65,000 characters each by reserving two blocks of codepoints in the BMP, the high surrogates area (D800–DBFF) and the low surrogates area (DC00–DFFF).  An application can combine a high surrogate and a low surrogate to point to a value in one of the supplementary planes.  The reasons why some applications require a surrogate pair and some do not are highly technical and beyond the scope of this paper.
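A short Python sketch can make the difference between the single value and the surrogate pair concrete.  It is only an illustration, not something required by any of the steps in this paper; the function name utf16_units is invented for this example.  It encodes a character as UTF-16 and lists the resulting 16-bit units.

```python
import struct

def utf16_units(ch):
    """Return the 16-bit UTF-16 code units for one character as hex strings.

    Hypothetical helper written for this illustration.
    """
    data = ch.encode("utf-16-be")          # big-endian, no byte order mark
    count = len(data) // 2                 # two bytes per 16-bit unit
    units = struct.unpack(">%dH" % count, data)
    return ["%04X" % u for u in units]

# A BMP character needs one unit; a Plane 1 character needs a surrogate pair.
print(utf16_units("A"))            # ['0041']
print(utf16_units("\U00010300"))   # ['D800', 'DF00']
```

The second result is the pair D800, DF00 for old italic letter a (U+10300), the same pair listed in the table in Part 5.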

 

 

 

Part 2. Plane 1 Characters under Mac OS

Operating System

You must have OS X.  Plane 1 characters display under OS 10.1 but do not work with 10.0; I have not been able to test 10.0.1 through 10.0.4.  OS 10.2 has improved font support throughout, and in particular it comes with the Character Palette applet that will display any Unicode character, including those in Plane 1.  I doubt very much that Plane 1 characters would display under OS 9; in any case, I have never seen a font to test them with.

 

Fonts

Locate a font with the characters you need; there aren't many at present.  The Code 2001 font by James Kass includes the Old Italic, Gothic, and Deseret characters in Unicode 3.1 plus the Old Persian Cuneiform glyphs proposed for Unicode 3.2.  See his page at http://home.att.net/~jameskass/code2001.htm .  Note that this was originally developed as a Windows TrueType font but will also work under OS X.  Updated versions of Athena and Cardo with Plane 1 characters of interest to classicists will be available from my web page http://scholarsfonts.net .  Install the font(s) you have obtained by dragging them to the Library/Fonts folder.

 

Choosing an Input Method

The easiest way to enter Plane 1 characters with OS 10.2 is to use the Character Palette.  It can be accessed via the Extras pulldown menu at the bottom of the Fonts dialog box.  You can also access it directly, without going into Fonts, as follows:

If you already have a keyboard menu visible in the Finder (a flag symbol to the right of Help on the menu bar):

1.      Pull down the keyboard menu and see if “Show Character Palette” is there.

2.      If Character Palette is not available, choose “Customize Menu” and a dialog box will open.  Click to place a check mark next to Character Palette, the first item in the list, then close the dialog.

If you do not have a keyboard menu visible in the Finder:

1.      From the Apple menu, choose System Preferences, then double-click the International icon (a UN flag)

2.      Click on the Input Menu tab (right-hand one)

3.      Click to place a check mark next to Character Palette, the first item in the list, then close the dialog.  A keyboard menu will now be visible in the Finder.

 

If you are still using OS 10.1, or if you wish to use the hex input method in addition to the Character Palette in 10.2, you need to install the Unicode Hex Input method.  To install this IM:

1.      from the Apple menu, open System Preferences and double-click the International icon

2.      choose the Keyboard Menu tab at the right

3.      scroll down toward the bottom and select Unicode Hex Input by clicking in the checkbox

4.      close the window

5.      a keyboard icon will now appear on the toolbar; you can select the input method you want here, or use command-spacebar to switch among available scripts.  (Note that OS X considers Unicode a “script” in the same way that Chinese is a different script from Cyrillic or Roman; you can have more than one input method for it, e.g., Hex Input in addition to the Extended Roman Unicode keyboard.)

 

To use the hex input method, you also must have the surrogate pair values for the characters you need.  (This is not required to use the Character Palette.)  For instance, the character old italic letter a is U+10300; the surrogate pair used to access it consists of D800 and DF00 (hexadecimal).  The formula for converting a Unicode scalar value (single hex number) to a pair of surrogates is given below in Part 5 of this document.

 

Entering the characters

To use Character Palette (OS 10.2 or above):

1.      Start TextEdit, the basic editor that comes with OS X

2.      From the Finder menu bar, open the Keyboard menu (a flag symbol)

3.      Choose “Show Character Palette”

4.      Click on the relevant Unicode range in the list at the left.

5.      Locate the character you want in the chart.  Clicking the triangle will let you set the font you want and provide additional information about the character you have highlighted.

6.      Double-click on the character in the chart and it will be pasted into your document.

 

To use the Unicode Hex Input method:

1.      Start TextEdit, the basic editor that comes with OS X

2.      Turn on NumLock on the keypad at the right

3.      Select the Unicode Hex Input from the keyboard icon on the menu bar, or use command-spacebar to switch to the Unicode “script”

4.      Hold down the option key and enter the high surrogate value in hex (e.g., D800), and release.  You do not need to hold down the shift key; typing option-d800 is equivalent to the hex value D800.

5.      Hold down option again and enter the low surrogate in hex (e.g., DF00) and your character should appear.  If it does not, make sure that you are using the correct font.  If the Mac cannot find the characters in the font you are using, it will display a pair of icons which represent the surrogate pair.

 

It should also be noted that it is possible under OS 10.2 to create customized keyboard layouts in XML format.  I don’t think anyone has done this yet for the Plane 1 characters, but it could be done.  See the details at http://developer.apple.com/technotes/tn2002/tn2056.html .

 

At the moment, TextEdit and SUE are the only editors I know of that can support surrogates.

 

Thanks to Tom Gewecke for help with the earlier version of this Mac OS information.

 

 

Part 3. Plane 1 Characters under Windows

Operating System

You must have Windows XP, Windows 2000 or Windows NT.  Plane 1 characters will not display properly in Windows 98/Me (although if you open a file that contains Plane 1 characters under Win98/Me, it will not be damaged; the Plane 1 characters won’t be visible, but they will still be there if you open the file later under Win2000 or XP).

 

Microsoft has always claimed that Windows 98 does not support supplementary characters.  Some recent experiments reported on the Unicode mailing list indicate that one can in fact display supplementary characters by editing the Registry as described below.  I have not personally done this, but the adventurous are welcome to try it.

 

Under Windows 2000, it may be necessary to take the preliminary step of enabling support for supplementary characters; by default, it is turned off.  (It is on by default in Windows XP.)  However, support for supplementary characters may already have been turned on if you have made certain changes to your system (for example, enabling languages such as Hebrew, Arabic, or Indic languages).  You must edit the Registry to turn this feature on, and messing with the Registry can be dangerous.  I therefore strongly suggest that you first try entering some supplementary characters as described below.  If they don’t work, then come back here and follow the directions below to change the Registry settings.  If you don’t know what you are doing, get help from someone who does.  At the very least, after you start RegEdit, open the Help file and print out the page that describes how to restore the Registry if something goes wrong.

 

You must add two keys to the registry.  To do this use the Registry Editor (RegEdit) that comes with Windows.  Choose Start / Run, type regedit in the box, and the Registry Editor will start.  For exact instructions, see the Microsoft Developer page at http://msdn.microsoft.com/library/psdk/winbase/unicode_192r.htm

 

You can also look at the excellent page by Tex Texin at http://www.i18nguy.com/surrogates.html .  This page tells you how to make the one necessary change to the Registry and also provides information on two additional Registry values that are useful when working with Plane 1 characters.  It also contains additional information about supplementary characters and some links.

 

Fonts

Locate a font with the characters you need.  There aren't many at present, and there is no point in going through all these steps if you have no font.  The Code 2001 font by James Kass, specifically designed to support supplementary characters,  is the best place to start.  See his page at http://home.att.net/~jameskass/code2001.htm .  An updated version of Cardo with Plane 1 characters of interest to classicists will be available from my web page http://scholarsfonts.net .  Juan-José Marcos is also in the process of adding Plane 1 characters to his Alphabetum font.  Install the font(s) you have obtained by dragging them to the Windows/Fonts folder, or by choosing Start/Settings/Control Panel/Fonts; File/Install New Font and navigating to the directory where you stored the font.

 

Choosing an Input Method

You must have one of the following:

o       UniPad (http://www.sharmahd.com/unipad/) is a text editor specifically designed to work with Unicode; versions .95 and later support surrogates and the use of codepoints in Plane 1 and above.  Since UniPad is a plain text editor, you would need to edit the file in another application after entering the characters if you wished to use different font sizes, bold, italic, etc.

o       Versions 5 and above of Keyman can create keyboards using Plane 1 characters.  See http://www.tavultesoft.com/keyman .  I have successfully created a Keyman keyboard for the Old Italic characters.

o       The Microsoft Keyboard Layout Creator utility can create keyboards that utilize supplementary characters.  It is freely available from http://www.microsoft.com/globaldev/tools/msklc.mspx .  Note that you will need to install the Microsoft .NET framework in order for this program to run.

OR

 

Entering the Characters

 

If you have a keyboard or IM, start and use it per the directions that came with it.

 

If you don't have a keyboard or IM, enter characters as follows:

            Because you are entering a pair of surrogate values, you will notice that the cursor will advance after you type the first one, but will display only white space; after you type the second, the white space will vanish and the correct character will appear.  If you use the Backspace key to remove a Plane 1 character, you will need to press it twice.

 

Microsoft Word 2000 does not have support for supplementary characters.  Nor does the Windows Character Map in XP.

 

OpenOffice Writer 1.0.3 supports supplementary characters (I have not tested earlier versions).  Unlike Word or WordPad, however, the ALT-x method of entering characters does not work.  Nor does OpenOffice come with its own method for entering characters above the BMP; its Insert / Special Character dialog supports only the BMP.  So you can either enter the text in WordPad and paste it into OpenOffice, or use a keyboard built with Keyman or the Keyboard Layout Creator.  Note that OpenOffice is an open-source project, available for downloading from http://www.openoffice.org/ .

 

The information on this page was gleaned from several sources.  Several people on the Unicode mailing list were very helpful, particularly Tex Texin.

 

 

 

Part 4.  Plane 1 Characters on a Web Page

Note 1: the following information is taken from a thread on the Unicode mailing list.  Thanks to all those who contributed items to the discussion; nothing in this part is original with me.

Note 2: the following discussion assumes you know how to construct web pages; it provides only information specific to getting characters in Planes 1 and above to work.

 

Getting various browsers to display anything outside the BMP is a tricky thing.  The following seem to be true as of November 2002.  For any web page to display non-BMP characters properly, the user must have an appropriate font on his or her system, normally the same font specified by the web page.

 

Microsoft Internet Explorer (Windows)

o       use numeric character references (NCRs), either decimal or hex, instead of the standard UTF-8

o       set the encoding for the page to “x-user-defined” rather than “UTF-8”;  sometimes it helps if users manually set the encoding to “User-defined” in their browser
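As an illustration of the first point, here is a minimal Python sketch that turns text into numeric character references.  The function name to_ncr and its hexadecimal parameter are invented for this example.  Note that an NCR is written with the single scalar value of the character, not with the two surrogate values.

```python
def to_ncr(text, hexadecimal=True):
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper; ASCII characters are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)                       # the Unicode scalar value
        if cp < 128:
            out.append(ch)
        elif hexadecimal:
            out.append("&#x%X;" % cp)      # hex form, e.g. &#x10300;
        else:
            out.append("&#%d;" % cp)       # decimal form, e.g. &#66304;
    return "".join(out)

print(to_ncr("\U00010300\U00010301"))            # &#x10300;&#x10301;
print(to_ncr("\U00010300", hexadecimal=False))   # &#66304;
```

The output can be pasted into the body of a web page in place of the characters themselves.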

 

Netscape

Netscape does not yet support supplementary characters.

 

Opera

Opera 6 supports characters outside the BMP.

 

A sample of Plane 1 to try

Here is a web page from Tex Texin that displays a sample of Etruscan (Plane 1):

http://www.i18nguy.com/unicode-example-plane1.html

 

 

 

Part 5.  Conversion Information and Table of Characters

Here’s the formula I mentioned above.  You will need this if you know the Unicode scalar values of the characters you need and want to enter them in WordPad on Windows 2000 (with XP, you can enter the scalar value directly followed by Alt-x) or TextEdit on Mac OS X by typing the two surrogate values.  First convert the single Unicode value to its surrogate pair in hexadecimal; then, if you are using WordPad, convert the two hex numbers to decimal so you can type them on the keypad.

 

To convert a Unicode scalar value (S) in a supplementary plane to a pair of high and low surrogates (H, L):

 

H = (S − 10000₁₆) / 400₁₆ + D800₁₆

L = (S − 10000₁₆) % 400₁₆ + DC00₁₆

(from The Unicode Standard 3.0, §3.7, page 45)

 

All this math must be done in hexadecimal.  The / character represents integer division (discard any remainder) and the % character represents the modulo operator; the calculator applet that comes with Windows, when run in scientific mode, can do this as well as other hex math.
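The formula can also be sketched in a few lines of Python.  This is an illustration only; the function names to_surrogates and from_surrogates are invented for this example.  The last line prints the pair in decimal, which is the form needed for keypad entry in WordPad.

```python
def to_surrogates(scalar):
    """Split a supplementary-plane scalar value into (high, low) surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("not a supplementary-plane value: %X" % scalar)
    offset = scalar - 0x10000
    high = offset // 0x400 + 0xD800    # integer division, remainder discarded
    low = offset % 0x400 + 0xDC00      # the remainder goes in the low surrogate
    return high, low

def from_surrogates(high, low):
    """Reverse the conversion: combine a surrogate pair into one scalar value."""
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000

high, low = to_surrogates(0x10300)     # old italic letter a
print("%04X %04X" % (high, low))       # D800 DF00
print(high, low)                       # 55296 57088
```

Running from_surrogates on the result gives back the original scalar value, which is a convenient check on hand calculations.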

 

Rather than doing the math yourself, you can use the very convenient calculator by Michael Kaplan at

            http://www.trigeminal.com/16to32AndBack.asp


Note: I have converted the scalar values to the two hex values as carefully as possible, but I do not guarantee 100% accuracy.  Let me know of any errors.

 

H = high surrogate; L = low surrogate; S = Unicode scalar value (hexadecimal)
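Readers who want to check a table row, or produce rows for another block, could use a sketch along the following lines.  The function name table_row is invented for this example.  Character names come from Python's unicodedata module, which reflects whatever Unicode version that copy of Python ships with, so a code point shown as reserved in the tables may carry a name in a later version.

```python
import unicodedata

def table_row(scalar):
    """Format one row in the H, L, S, Name layout used by the tables.

    Hypothetical helper; unassigned code points are shown as <reserved>.
    """
    offset = scalar - 0x10000
    high = offset // 0x400 + 0xD800
    low = offset % 0x400 + 0xDC00
    name = unicodedata.name(chr(scalar), "<reserved>")
    return "%04X %04X  %05X    %s" % (high, low, scalar, name)

# Print the first few rows of the Old Italic block.
for scalar in range(0x10300, 0x10304):
    print(table_row(scalar))
```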

 

OLD ITALIC

 H    L      S        Name

D800 DF00  10300    OLD ITALIC LETTER A

D800 DF01  10301    OLD ITALIC LETTER BE

D800 DF02  10302    OLD ITALIC LETTER KE

D800 DF03  10303    OLD ITALIC LETTER DE

D800 DF04  10304    OLD ITALIC LETTER E

D800 DF05  10305    OLD ITALIC LETTER VE

D800 DF06  10306    OLD ITALIC LETTER ZE

D800 DF07  10307    OLD ITALIC LETTER HE

D800 DF08  10308    OLD ITALIC LETTER THE

D800 DF09  10309    OLD ITALIC LETTER I

D800 DF0A  1030A    OLD ITALIC LETTER KA

D800 DF0B  1030B    OLD ITALIC LETTER EL

D800 DF0C  1030C    OLD ITALIC LETTER EM

D800 DF0D  1030D    OLD ITALIC LETTER EN

D800 DF0E  1030E    OLD ITALIC LETTER ESH

D800 DF0F  1030F    OLD ITALIC LETTER O

D800 DF10  10310    OLD ITALIC LETTER PE

D800 DF11  10311    OLD ITALIC LETTER SHE

D800 DF12  10312    OLD ITALIC LETTER KU

D800 DF13  10313    OLD ITALIC LETTER ER

D800 DF14  10314    OLD ITALIC LETTER ES

D800 DF15  10315    OLD ITALIC LETTER TE

D800 DF16  10316    OLD ITALIC LETTER U

D800 DF17  10317    OLD ITALIC LETTER EKS

D800 DF18  10318    OLD ITALIC LETTER PHE

D800 DF19  10319    OLD ITALIC LETTER KHE

D800 DF1A  1031A    OLD ITALIC LETTER EF

D800 DF1B  1031B    OLD ITALIC LETTER ERS

D800 DF1C  1031C    OLD ITALIC LETTER CHE

D800 DF1D  1031D    OLD ITALIC LETTER II

D800 DF1E  1031E    OLD ITALIC LETTER UU

           1031F    <reserved>

D800 DF20  10320    OLD ITALIC NUMERAL ONE

D800 DF21  10321    OLD ITALIC NUMERAL FIVE

D800 DF22  10322    OLD ITALIC NUMERAL TEN

D800 DF23  10323    OLD ITALIC NUMERAL FIFTY

 



GOTHIC

 H    L      S        Name

D800 DF30  10330    GOTHIC LETTER AHSA

D800 DF31  10331    GOTHIC LETTER BAIRKAN

D800 DF32  10332    GOTHIC LETTER GIBA

D800 DF33  10333    GOTHIC LETTER DAGS

D800 DF34  10334    GOTHIC LETTER AIHVUS

D800 DF35  10335    GOTHIC LETTER QAIRTHRA

D800 DF36  10336    GOTHIC LETTER IUJA

D800 DF37  10337    GOTHIC LETTER HAGL

D800 DF38  10338    GOTHIC LETTER THIUTH

D800 DF39  10339    GOTHIC LETTER EIS

D800 DF3A  1033A    GOTHIC LETTER KUSMA

D800 DF3B  1033B    GOTHIC LETTER LAGUS

D800 DF3C  1033C    GOTHIC LETTER MANNA

D800 DF3D  1033D    GOTHIC LETTER NAUTHS

D800 DF3E  1033E    GOTHIC LETTER JER

D800 DF3F  1033F    GOTHIC LETTER URUS

D800 DF40  10340    GOTHIC LETTER PAIRTHRA

D800 DF41  10341    GOTHIC LETTER NINETY

D800 DF42  10342    GOTHIC LETTER RAIDA

D800 DF43  10343    GOTHIC LETTER SAUIL

D800 DF44  10344    GOTHIC LETTER TEIWS

D800 DF45  10345    GOTHIC LETTER WINJA

D800 DF46  10346    GOTHIC LETTER FAIHU

D800 DF47  10347    GOTHIC LETTER IGGWS

D800 DF48  10348    GOTHIC LETTER HWAIR

D800 DF49  10349    GOTHIC LETTER OTHAL

D800 DF4A  1034A    GOTHIC LETTER NINE HUNDRED