You are stating many things as if they were facts, when they are simply not true. You should verify them against the definitions before stating them in such a 'definitive' way.
Examples:

- VS1 is a combining character, and not a base character.
  http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?ch=FE00

- Default grapheme clusters do not include ZWJ; as a matter of fact, default
  grapheme clusters, except for Hangul Jamo syllables and a few exceptional
  cases, are identical with combining sequences.
  http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

- *Tailored* grapheme clusters may include longer sequences, but it is not at
  all obvious whether they would ever contain ZWJ or ZWNJ.

>...rendering of text works on grapheme clusters

- Rendering units are, in general, orthogonal to whether a sequence is a
  grapheme cluster or not. "fi" may be a ligature in English, but is certainly
  not a grapheme cluster.

Mark
__________________________________
http://www.macchiato.com

----- Original Message -----
From: "Philippe Verdy" <[EMAIL PROTECTED]>
To: "Peter Kirk" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sat, 2003 Nov 08 17:15
Subject: Re: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination

> I'm curious about what name you would give to it.
> The name COMBINING CHARACTER JOINER is already used...
>
> In all our discussions we should have used the term "starter" (instead of
> just "base character" which is ambiguous) for any characters of combining
> class 0 and which include:
>
>   Base characters (includes conjoining characters):
>       letter, syllable or ideograph (gc=L*),
>       number (gc=N*),
>       punctuation (gc=P*),
>       symbol (gc=S*),
>       space (gc=Zs)
>       agreed private use characters (gc=Co and private agreement)
>   Starter Combining characters:
>       (gc=M* and CC=0) such as CGJ
>   Controls:
>       (gc=C* except Co),
>   Text separators:
>       (gc=Zl, Zp)
>   Unknown private use characters:
>       (gc=Co and no private agreement)
>
> For other characters with combining class > 0, we should have used the term
> "non-starter", not the term "combining character" which may or may not be a
> "starter".
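The properties under discussion (general category and canonical combining class) can be looked up directly, which is one way to check claims such as "VS1 is a combining character" against the data. The following is only a minimal sketch using Python's standard unicodedata module; the helper names is_starter and is_combining_mark are made up for illustration and are not Unicode terminology beyond what the messages above use.

```python
import unicodedata

def is_starter(ch):
    # "Starter" in the sense used above: canonical combining class 0.
    return unicodedata.combining(ch) == 0

def is_combining_mark(ch):
    # Combining character in the general-category sense: gc = M* (Mn, Mc, Me).
    return unicodedata.category(ch).startswith("M")

# Characters mentioned in the thread (labels are informal, for display only).
samples = [
    ("ALEF", "\u05D0"),
    ("HOLAM", "\u05B9"),
    ("METEG", "\u05BD"),
    ("ZWJ", "\u200D"),
    ("CGJ", "\u034F"),
    ("VS1", "\uFE00"),
]

for label, ch in samples:
    print(label,
          "gc=" + unicodedata.category(ch),
          "ccc=" + str(unicodedata.combining(ch)),
          "starter" if is_starter(ch) else "non-starter",
          "mark" if is_combining_mark(ch) else "not a mark")
```

With the character database shipped in current Python versions, VS1 and CGJ should both report as marks (gc=Mn) that are also starters (ccc=0), which is why "starter" and "base character" cannot be used interchangeably.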
>
> It is clear however that we made a distinction between "combining sequences"
> (made of a unique starter and optionally followed by non-starters) and
> "grapheme clusters" (which are made of one or more combining sequences). For
> example, the (hypothetical) encoded text:
>
>   <ALEF, ZWJ, LAMED, VAV, VS1, HOLAM, NUN, METEG, CGJ, HATAF PATAH>
>
> is made of 7 "combining sequences":
>
>   <ALEF>,
>   <ZWJ>,
>   <LAMED>,
>   <VAV>,
>   <VS1, HOLAM>,
>   <NUN, HATAF PATAH>,
>   <CGJ, METEG>
>
> (where the starters are VAV, VS1, NUN, CGJ),
> and 3 "grapheme clusters":
>
>   <ALEF, ZWJ, LAMED>,
>   <VAV, VS1, HOLAM>,
>   <NUN, HATAF PATAH, CGJ, METEG>
>
> (ZWJ is a format control and ignored in the determination of grapheme
> cluster boundaries).
>
> Grapheme clusters may be created by grouping several combining sequences
> without using CGJ, ZWJ, ZWNJ, or variation selectors: see examples in South
> Asian scripts, and with Hangul Jamos.
>
> Generally, collation and rendering of text works on grapheme clusters (or
> groups of these clusters with language-specific tailoring); but not on
> combining sequences whose role is either related to string identity
> excluding any concept of relative order (i.e. normalization and canonical
> equivalence), or to text transforms or folding.
>
> Compatibility equivalence is also defined but neither on combining
> sequences, nor on grapheme clusters: there may be a mapping from one
> character (i.e. only a part of a combining sequence) to several characters
> that belong to distinct combining sequences and distinct grapheme clusters,
> for example with some ligatures of base letters (example: the "ffi"
> ligature, which participates in only 1 combining sequence and only 1
> grapheme cluster, is mapped to 3 distinct combining sequences and 3 distinct
> grapheme clusters).
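To make the disagreement concrete, here is a rough sketch (standard-library Python only) that splits <VAV, VS1, HOLAM> under two readings: the quoted message's "every combining-class-0 character starts a sequence", and the alternative "only non-marks start a sequence". The function names are hypothetical, and neither function is the UAX #29 grapheme cluster algorithm; both are simplifications for illustration. The last lines show the "ffi" compatibility mapping mentioned above.

```python
import unicodedata

def split_by(text, starts_new):
    """Group characters; a character for which starts_new(ch) is true opens a new group."""
    groups = []
    for ch in text:
        if starts_new(ch) or not groups:
            groups.append(ch)
        else:
            groups[-1] += ch
    return groups

def by_combining_class(ch):
    # Reading used in the quoted message: any character with ccc = 0 is a starter.
    return unicodedata.combining(ch) == 0

def by_general_category(ch):
    # Alternative reading: marks (gc = M*) attach to what precedes them,
    # even when their combining class is 0 (as for VS1 and CGJ).
    return not unicodedata.category(ch).startswith("M")

text = "\u05D5\uFE00\u05B9"  # VAV, VS1, HOLAM

# Two groups: <VAV>, <VS1, HOLAM>
print([ascii(g) for g in split_by(text, by_combining_class)])
# One group: <VAV, VS1, HOLAM>
print([ascii(g) for g in split_by(text, by_general_category)])

# Compatibility (not canonical) decomposition of U+FB03 LATIN SMALL LIGATURE FFI.
ligature = "\uFB03"
print(unicodedata.normalize("NFD", ligature) == ligature)  # canonical form leaves it alone
print(unicodedata.normalize("NFKD", ligature))             # compatibility form gives f, f, i
```

The two groupings differ only at VS1, which is where treating every combining-class-0 character as the start of a new sequence parts ways with treating VS1 as a combining character.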
>
> ----- Original Message -----
> From: "Peter Kirk" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Sunday, November 09, 2003 1:20 AM
> Subject: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination
>
>
> > So that you don't try to hold your breath over the weekend to find out
> > what I am planning to propose, as announced on the main Unicode list...
> >
> > The issue in question is the ligation of hataf vowels and meteg. Hataf
> > vowels with medial meteg are clear cases of ligatures between the basic
> > vowels and meteg. But there seems to be no mechanism in Unicode so far
> > to promote such a ligature. So, my suggestion is to propose a new
> > combining character COMBINING CHARACTER JOINER (combining class zero),
> > defined with semantics similar to ZWJ rather than CGJ i.e. to affect
> > ligation but not collation.
> >
> > Comments?
> >
> > --
> > Peter Kirk
> > [EMAIL PROTECTED] (personal)
> > [EMAIL PROTECTED] (work)
> > http://www.qaya.org/