The UTC just approved a clarification of the base character definition, as follows:

D13a Graphic character: a character with the General Categories of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).

  • Graphic characters specifically exclude the line and paragraph separators (Zl, Zp) and exclude the characters with the General Categories of Other (Cn, Cs, Cc, Cf).
  • For more information, see Chapter 2, especially Section 2.4 Code Points and Characters and Table 2-2 Types of Code Points.
  • Not all graphic characters have visibly rendered glyphs. Particular examples include spaces and some combining marks.
  • The interpretation of private use characters (Co) as graphic characters or not is determined by private agreement. However, in the absence of private agreement, private use characters should be interpreted as graphic characters.
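
As a rough illustration of D13a, the test can be sketched with Python's standard `unicodedata` module. The helper name `is_graphic` and the `private_use_is_graphic` flag are made up for this example, and the results reflect whichever Unicode version the interpreter bundles.

```python
import unicodedata

# Major General Category classes named in D13a:
# Letter, Combining Mark, Number, Punctuation, Symbol.
GRAPHIC_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str, private_use_is_graphic: bool = True) -> bool:
    """Return True if ch is a graphic character in the sense of D13a.

    Hypothetical helper; the flag stands in for a "private agreement"
    about private use characters (Co).
    """
    gc = unicodedata.category(ch)  # two-letter value such as "Lu" or "Zs"
    if gc == "Co":
        # Without a private agreement, Co is interpreted as graphic.
        return private_use_is_graphic
    # Zs is graphic; Zl, Zp, Cn, Cs, Cc and Cf fall through to False.
    return gc[0] in GRAPHIC_MAJOR_CLASSES or gc == "Zs"
```

Under this sketch, U+0020 SPACE and U+0301 COMBINING ACUTE ACCENT count as graphic even though neither necessarily has a visible glyph of its own, while U+2028 LINE SEPARATOR does not.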

D13b Base character: any graphic character except for those with the General Category of Combining Mark (M).

  • Most Unicode characters are base characters. A base character is any code point that has one of the General Categories of Letter (L), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs).
  • Base characters are independent graphic characters, but this does not preclude the presentation of base characters from adopting different contextual forms or participating in ligatures.
  • The interpretation of private use characters (Co) as base characters or not is determined by private agreement. However, in the absence of private agreement, private use characters should be interpreted as base characters.
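
D13b differs from D13a only in dropping the Combining Mark (M) class, which a short sketch makes concrete. Again, `is_base` and its flag are illustrative names, not part of any standard API.

```python
import unicodedata

# D13b: the graphic classes minus Combining Mark (M).
BASE_MAJOR_CLASSES = {"L", "N", "P", "S"}


def is_base(ch: str, private_use_is_base: bool = True) -> bool:
    """Return True if ch is a base character in the sense of D13b.

    Hypothetical helper; the flag models a private agreement about
    private use characters (Co), which default to base characters.
    """
    gc = unicodedata.category(ch)
    if gc == "Co":
        return private_use_is_base
    return gc[0] in BASE_MAJOR_CLASSES or gc == "Zs"
```

With this definition, letters, digits, punctuation, symbols and spaces are base characters, while any character whose General Category starts with M is not.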

D14 Combining character: a graphic character with the General Category of Combining Mark (M).

  • The graphic positioning of a combining character depends on the last preceding base character. The combining character is said to apply to that base character.
  • Combining characters consist of all characters with the General Category values of Spacing Combining Mark (Mc), Nonspacing Mark (Mn), and Enclosing Mark (Me).
  • All characters with non-zero canonical combining class (ccc) are combining characters, but the reverse is not the case: there are combining characters with a zero canonical combining class.
  • The interpretation of private use characters (Co) as combining characters or not is determined by private agreement.
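
The relationship between D14 and the canonical combining class can be checked on a few sample characters. This is a minimal sketch: `is_combining` and `describe` are names invented for the example, and `unicodedata.combining` returns the canonical combining class as an integer.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    """Return True if ch has General Category Mn, Mc or Me (D14)."""
    return unicodedata.category(ch) in {"Mn", "Mc", "Me"}


def describe(ch: str) -> tuple[str, int]:
    """Return (General Category, canonical combining class) for ch."""
    return unicodedata.category(ch), unicodedata.combining(ch)


# Sample combining characters, chosen for illustration only.
SAMPLES = {
    "\u0301": "COMBINING ACUTE ACCENT",        # Mn, non-zero class
    "\u034f": "COMBINING GRAPHEME JOINER",     # Mn, class zero
    "\u0903": "DEVANAGARI SIGN VISARGA",       # Mc, class zero
    "\u20dd": "COMBINING ENCLOSING CIRCLE",    # Me, class zero
}

for ch, name in SAMPLES.items():
    gc, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: gc={gc} ccc={ccc} combining={is_combining(ch)}")
```

All four samples are combining characters by General Category, yet only the first has a non-zero combining class, which is the asymmetry the last-but-one bullet describes.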

ZWJ, ZWNJ, CGJ and combination

> Are the characters ZWJ, ZWNJ and CGJ base characters, combining
> characters, neither, or even both? Which specific character properties
> should I look at to decide this?
> Are these characters legal within combining character sequences? Can ZWJ
> and ZWNJ be used to control ligation of combining characters? If not, is
> there an alternative mechanism for this?

