Unicode and Glyph Names

Copyright

Copyright (c) 1997,1998,2002,2003,2007 Adobe Systems Incorporated

Permission is hereby granted, free of charge, to any person obtaining a copy of this documentation file to use, copy, publish, distribute, sublicense, and/or sell copies of the documentation, and to permit others to do the same, provided that:

Permission is hereby granted, free of charge, to any person obtaining a copy of this documentation file, to create their own derivative works from the content of this document to use, copy, publish, distribute, sublicense, and/or sell the derivative works, and to permit others to do the same, provided that the derived work is not represented as being a copy or version of this document.

Adobe shall not be liable to any party for any loss of revenue or profit or for indirect, incidental, special, consequential, or other similar damages, whether based on tort (including without limitation negligence or strict liability), contract or other legal or equitable grounds even if Adobe has been advised or had reason to know of the possibility of such damages. The Adobe materials are provided on an "AS IS" basis. Adobe specifically disclaims all express, statutory, or implied warranties relating to the Adobe materials, including but not limited to those concerning merchantability or fitness for a particular purpose or non-infringement of any third party rights regarding the Adobe materials.

[ Document version 2.4. Last updated September 24, 2003 ]

1. Introduction

The purpose of the Adobe Glyph Naming convention is to support the computation of a Unicode character string from a sequence of glyphs. This is achieved by specifying a mapping from glyph names to character strings.

The mapping is meant to convert the sequence of glyphs to plain text while preserving the underlying semantics. For example, a glyph for "A" and a glyph for "small capital A" and a glyph for a swash variant of "A" will all be mapped to the same Unicode value. This is useful in copying text in some environments, and is useful for doing text searches that will match all glyphs in the original string that mean "A".

It is outside the scope of this specification to determine the set of legal glyph names. Glyph names occur in many different contexts, each having its own definition of what constitutes a legal name. This specification only assumes that a glyph name is an arbitrary finite sequence of Unicode characters.

The specification consists of the Adobe Glyph List (AGL), a mapping of specific names to Unicode values, and of rules for decomposing glyph names. Because it is anticipated that this specification will be implemented in many pieces of software, and that consistently revising all those implementations is unlikely, this specification is intended to be stable, i.e. never revised. In particular, it is intended that no mappings will ever be added to the AGL. Also, the AGL is not meant to serve as a guide to choosing glyph names for new glyphs. For that purpose, please see section 6, Assigning glyph names in new fonts, and the glyph name lists referenced in that section.

This specification supports the full range of Unicode scalar values, U+0000 through U+10FFFF. It does not depend on the character repertoire of a specific Unicode version; thus, it should be applicable to past, current and future versions of the Unicode standard.

Font producers are encouraged to respect this specification in naming their glyphs. Font consumers are encouraged to follow this specification when trying to discover the character content of glyphs.

2. The mapping

To map a glyph name to a character string:

Step 1: drop all the characters from the glyph name starting with the first occurrence of a period (U+002E FULL STOP), if any.

Step 2: split the remaining string into a sequence of components, using underscore (U+005F LOW LINE) as the delimiter.

Step 3: map each component to a character string according to the procedure below, and concatenate those strings; the result is the character string to which the glyph name is mapped.
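
Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name split_glyph_name is hypothetical and does not come from this specification.

```python
def split_glyph_name(name: str) -> list:
    """Apply steps 1 and 2: drop the suffix, then split into components."""
    # Step 1: keep only the part before the first period (U+002E), if any.
    base = name.split(".", 1)[0]
    # Step 2: split on underscore (U+005F). An empty base yields a single
    # empty component, which step 3 would then map to the empty string.
    return base.split("_")
```

With this sketch, "A.swash" yields the single component "A", and a name that begins with a period yields a single empty component.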

3. Examples

  1. The name "Lcommaaccent" has a single component, which is mapped to the string U+013B by the AGL
  2. The name "uni20AC0308" has a single component, which is mapped to the string U+20AC U+0308
  3. The name "u1040C" has a single component, which is mapped to the string U+1040C
  4. The name "uniD801DC0C" has a single component, which is mapped to the empty string. Neither D801 nor DC0C is in the appropriate set, because both are surrogate code points rather than Unicode scalar values. This form cannot be used to map to the character which is written D801 DC0C in UTF-16, i.e. U+1040C. This character can be mapped to using the component "u1040C".
  5. The name "uni20ac" has a single component, which is mapped to the empty string (note the lowercase "a" and "c")
  6. The name "Lcommaaccent_uni20AC0308_u1040C.alternate" has three components, which are "Lcommaaccent", "uni20AC0308" and "u1040C". It is mapped to the string U+013B U+20AC U+0308 U+1040C.
  7. Generally, many names can be mapped to the same string; for example the components "Lcommaaccent", "uni013B" and "u013B" all map to the string U+013B.
  8. The name "foo" maps to the empty string, because "foo" is not in the AGL and it does not start with a "u"
  9. The name ".notdef" is reduced to the empty string by the first step, and is mapped to the empty string by the last clause of the third step.
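
The examples above can be reproduced with a short Python sketch. The per-component rules used here are an assumption reconstructed from the examples: look the component up in the AGL first; otherwise accept "uni" followed by groups of four uppercase hexadecimal digits, none of them a surrogate; otherwise accept "u" followed by four to six uppercase hexadecimal digits forming a scalar value; otherwise return the empty string. AGL_SAMPLE is a two-entry stand-in for the real AGL, and all function names are hypothetical.

```python
import re

# Two-entry stand-in for the AGL (assumption: the real list is far larger).
AGL_SAMPLE = {"Lcommaaccent": "\u013B", "Ogoneksmall": "\uF6FB"}

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")  # "uni" + groups of 4 uppercase hex digits
_U = re.compile(r"u([0-9A-F]{4,6})")         # "u" + 4 to 6 uppercase hex digits


def _is_scalar(cp):
    """True for Unicode scalar values, i.e. code points that are not surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def map_component(component, agl=AGL_SAMPLE):
    """Map one component to a character string (assumed rule order, see lead-in)."""
    if component in agl:
        return agl[component]
    m = _UNI.fullmatch(component)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
    m = _U.fullmatch(component)
    if m:
        cp = int(m.group(1), 16)
        if _is_scalar(cp):
            return chr(cp)
    return ""  # last clause: anything else maps to the empty string


def glyph_name_to_string(name, agl=AGL_SAMPLE):
    """Steps 1 to 3: drop the suffix, split on underscores, map and concatenate."""
    base = name.split(".", 1)[0]
    return "".join(map_component(c, agl) for c in base.split("_"))
```

Matching is case-sensitive on purpose, so "uni20ac" falls through to the empty string just as in example 5.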

4. Private Use Area (PUA) scalar values

This specification supports the mapping of glyph names to strings that contain private use area scalar values. For example, the names "Ogoneksmall" and "uniF6FB" both map to the string U+F6FB.

This specification does not include, imply, or assume any particular usage of the PUA; it merely permits glyphs to be named such that the restored character strings include PUA code points. It is up to the producers and consumers of glyph names to establish an agreement on the PUA usage.

Font designers should note that establishing this agreement with users of general purpose fonts can be difficult. It is likely that not all tools manipulating character strings built from glyph names will correctly implement the PUA usage, and this can lead to incorrect functionality. It is therefore recommended, for general purpose fonts, that all glyph names convert to strings that do not contain PUA characters.
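
A font producer who wants to follow this recommendation could check mapped strings for PUA code points. The sketch below is one possible check, assuming the three private use ranges defined by the Unicode standard (U+E000 through U+F8FF in the BMP, plus planes 15 and 16); the names PUA_RANGES and contains_pua are hypothetical.

```python
# Private use ranges as defined by the Unicode standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def contains_pua(text):
    """Return True if any character of text is a private use code point."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in PUA_RANGES)
```

For instance, the string U+F6FB to which "Ogoneksmall" maps would be flagged, while the string for "A" would not.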

5. Compatibility considerations

This specification has evolved over time. Please refer to the Document Changes section for changes.

6. Assigning glyph names in new fonts

For glyphs which correspond to characters in the Unicode standard, it is recommended to build names with the "uni" prefix for characters in the Basic Multilingual Plane (BMP), and with the "u" prefix for characters in the Unicode supplemental planes, according to the rules given in section 2.
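
As an illustration of this recommendation, the following Python sketch builds a name from a Unicode scalar value, using "uni" with four digits for the BMP and "u" with the shortest hexadecimal form for the supplemental planes. The function name unicode_glyph_name is hypothetical.

```python
def unicode_glyph_name(cp):
    """Build a "uni" (BMP) or "u" (supplemental planes) glyph name for cp."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % (cp,))
    if cp <= 0xFFFF:
        # BMP: exactly four uppercase hexadecimal digits.
        return "uni%04X" % cp
    # Supplemental planes: five or six uppercase hexadecimal digits.
    return "u%X" % cp
```

The digits are uppercase because, as the examples in section 3 show, a name such as "uni20ac" maps to the empty string.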

This does not mean that fonts will become invalid if they are made without using the "uni" and "u" prefixes for glyph names. With one group of exceptions, all names from the AGL v1.2 (see link in section 5) currently work in all known cases as well as names with the "uni" prefix. The exceptions are the AGL v1.2 names which are associated with Unicode Private Use Area (PUA) values. These include all the superiors and small cap names. Use of these names will, for the purpose of searching text, lead some current implementations to map names like "Asmall" to the PUA Unicode value from AGL v1.2, rather than to the Unicode value for "A". We now recommend naming these glyphs according to the rules below. A subset of the AGL v1.2 name set without the names associated with the PUA can be found in: Adobe Glyph List For New Fonts.

If multiple glyphs in the font represent the same character in the Unicode standard, as do "A" and "A.swash", they can be differentiated by using the same base name with different suffixes. The suffix (the part of the glyph name that follows the first period) does not participate in the computation of a character sequence. It can be used by font designers to indicate some characteristics of the glyph. The suffix may contain periods or any other permitted characters. Small cap A, for example, could be named "uni0041.sc" or "A.sc".

If there are multiple variants of the same base glyph, then the variant suffixes should include zero-padded fixed length numbers so that if and when the glyph names are sorted, the intended order will be preserved. For example, if the "ampersand" glyph has 23 alternates, they would be named "ampersand.alt01" through "ampersand.alt23", rather than "ampersand.alt" along with "ampersand.alt1" through "ampersand.alt22". This rule only provides a minor convenience for font development and testing. As noted before, suffixes do not participate in the computation of a character sequence.
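
A small Python sketch of this numbering scheme follows. The pad width is derived from the number of variants, which is one reasonable choice and an assumption of this example; the function name numbered_variants and the default tag "alt" are hypothetical.

```python
def numbered_variants(base, count, tag="alt"):
    """Return names like base.alt01 .. base.altNN with fixed-width numbers."""
    width = len(str(count))  # e.g. 23 variants need two digits
    return [f"{base}.{tag}{i:0{width}d}" for i in range(1, count + 1)]
```

Because every number has the same width, a plain lexicographic sort of the resulting names keeps them in numeric order.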

This specification does not standardize any of the suffixes. Any suffix will work as well as any other suffix for the purposes of text searching. For convenience during development and testing, Adobe uses the most appropriate OpenType Layout feature name for a suffix. For example, a smallcap "a" could be named "a.smcp", an initial form "a.init", a final form "a.fina" and a swash form "a.swsh". If there are more swash forms, they could be called "a.swsh1", "a.swsh2" etc. The following are examples of suffixes used in Adobe fonts:

a.sc             small capital a
T.swash          swash variant of T
T.begin          variant of T used at the beginning of a word
T.end            variant of T used at the end of a word
T.end1           another variant of T used at the end of a word
T.alt01          first decorative variant of T
T.alt02          another decorative variant of T
one.superior     variant of one to be used in superscripts
one.inferior     variant of one to be used in subscripts
one.numerator    variant of one to be used in fractions
one.denominator  variant of one to be used in fractions
one.fitted       proportional variant of one, used when default numerals are all tabular
one.tab          tabular variant of one, used when default numerals are all proportional
one.oldstyle     proportional oldstyle variant of one
one.taboldstyle  tabular oldstyle variant of one

For glyphs which do not correspond to any character in the Unicode standard, the name will not have any technical usefulness. Any name can be assigned, as long as the name will not be interpreted as having semantic value by the rules in this article. The practice of the Adobe Type Department is that if there is any useful descriptive tag for a glyph, name it accordingly, e.g. "mouse", "signForSale", "christmastreeBall12". Otherwise, name it as a variant of "orn" (short for ornament), e.g. "orn001", "orn123".

For glyphs which represent ligatures of standard Unicode characters, two formats are available for the name:

Format 1: Descriptive

The decomposition is expressed by joining the glyph names of the standard Unicode characters, in order, by underscores (U+005F LOW LINE). The glyph names of the characters should be built with the "uni" or "u" name prefixes and hexadecimal digits, as described above, or with a name from the AGL.

For example, the "o f f i" ligature could be named "o_f_f_i".

Format 2: Unicode with "uni" prefix.

The glyph name is expressed as the prefix "uni" followed by two or more sequences of four hexadecimal digits. Each sequence of four hexadecimal digits gives the Unicode scalar value of one of the standard Unicode characters, in order.

For example, the character LATIN CAPITAL LETTER EZH WITH CIRCUMFLEX AND GRAVE, which is not in Unicode, should be named "uni01B703020300", since LATIN CAPITAL LETTER EZH is at U+01B7, COMBINING CIRCUMFLEX ACCENT is at U+0302, and COMBINING GRAVE ACCENT is at U+0300.

A maximum of 7 name components is available with this format, due to glyph name length restrictions.

A ligature of the glyphs named "T.swash" and "h" can be named "T_h.swash". "T.swash_h" would be incorrect since this will be interpreted as a glyphic variant of "T".
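
Both ligature formats, and the placement of a suffix after the last component, can be sketched in Python as follows. This is an illustration under stated assumptions: the function names ligature_name and uni_ligature_name are hypothetical, and the caller chooses which single suffix, if any, the ligature carries.

```python
def ligature_name(parts, suffix=""):
    """Format 1: join the base names with underscores, suffix once at the end."""
    # Any suffix on an individual part is dropped, so "T.swash" contributes "T".
    bases = [p.split(".", 1)[0] for p in parts]
    name = "_".join(bases)
    return f"{name}.{suffix}" if suffix else name


def uni_ligature_name(code_points):
    """Format 2: "uni" followed by four uppercase hex digits per BMP value."""
    if len(code_points) < 2:
        raise ValueError("a ligature needs at least two characters")
    for cp in code_points:
        if not (0 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a BMP scalar value: %r" % (cp,))
    return "uni" + "".join(f"{cp:04X}" for cp in code_points)
```

Stripping the per-part suffixes before joining avoids names such as "T.swash_h", where everything after the first period would be discarded.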

Some current software limitations subject all glyph names to a limit of 31 characters in length, and require that they be composed entirely of characters from the following set:

  1. A-Z
  2. a-z
  3. 0-9
  4. . (period, U+002E FULL STOP)
  5. _ (underscore, U+005F LOW LINE)

These limitations can be an issue with ligature names, and with ornament names.
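
A name can be checked against these two limitations with a single regular expression. The sketch below is a minimal example; the function name is_portable_glyph_name is hypothetical, and the check covers only the length and character set described above, not any other constraint a particular format may impose.

```python
import re

# 1 to 31 characters drawn from A-Z, a-z, 0-9, period and underscore.
_PORTABLE_NAME = re.compile(r"[A-Za-z0-9._]{1,31}")


def is_portable_glyph_name(name):
    """Return True if name satisfies the length and character-set limits."""
    return _PORTABLE_NAME.fullmatch(name) is not None
```

A Format 2 ligature name with seven components is exactly 31 characters long ("uni" plus 28 digits), which is why an eighth component no longer passes.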

A brief review of some current implementation issues and the consequent limits on glyph names is given by "Glyph Names and Current Implementations".

7. Document changes

v2.4

[September 24, 2003] Minor revision. Pointed URL for Adobe Glyph Names for New Fonts to a new revision.

v2.3

[April 17, 2003] Minor revision. Added a short sentence to make it clear that the "uni" prefix can be used only with BMP Unicode values.

v2.2

[January 31, 2003] Minor revision. Added a link to list of glyph names to use when making new fonts, and emphasized that the AGL v2.0 is not meant for this purpose, nor is it about encoding glyphs in a font.

v2.1

[November 4, 2002] Minor revision, expanding the section on assigning glyph names in new fonts.

v2.0

[September 20, 2002] Major revision, which focuses the document on the conversion of glyph names to Unicode scalar values; addition of many names to the AGL; update of the ZapfDingbats list to Unicode 3.2.

v1.1

[17 December 1998] Generally revised entire document. Updated most tables and data files. Added section on selecting glyph names. Pseudocode for extracting semantics expanded to include non-Unicode ligatures and glyphic variants. Added section on providing separate designs for double-mappings. Removed section on discrepancies with WGL4 (no longer applicable; WGL4 was updated).

v1.0

[10 November 1997] First version.