Re: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination

Mark Davis Sat, 08 Nov 2003 22:14:04 -0800

You are stating many things as if they were facts, when they are simply not
true. You should verify them against the definitions before stating them in such
a 'definitive' way.


Examples:
- VS1 is a combining character, and not a base character.
http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?ch=FE00

- Default grapheme clusters do not include ZWJ; as a matter of fact, default
grapheme clusters, except for Hangul Jamo Syllables and a few exceptional cases,
are identical with combining sequences.
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

- *Tailored* grapheme clusters may include longer sequences, but it is not at
all obvious whether they would ever contain ZWJ or ZWNJ.

>...rendering of text works on grapheme clusters
- Rendering units are, in general, orthogonal to whether a sequence is a
grapheme cluster or not. "fi" may be a ligature in English, but is certainly not
a grapheme cluster.
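
The property claims above can be checked against the Unicode Character Database rather than taken on trust. What follows is a minimal sketch, assuming Python's standard `unicodedata` module; the dictionary labels and the `describe` helper are illustrative names, not anything from this thread, and the values reported depend on the Unicode version shipped with the interpreter. It prints the general category and canonical combining class of the characters under discussion, then shows how CGJ (combining class 0) blocks canonical reordering of two Hebrew marks.

```python
import unicodedata

# Code points discussed in the thread; the labels are informal shorthand.
CHARS = {
    "VS1": "\uFE00",
    "ZWJ": "\u200D",
    "ZWNJ": "\u200C",
    "CGJ": "\u034F",
    "HOLAM": "\u05B9",
    "METEG": "\u05BD",
    "HATAF PATAH": "\u05B2",
}


def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


props = {label: describe(ch) for label, ch in CHARS.items()}
for label, (gc, ccc) in props.items():
    print(f"{label:12} gc={gc} ccc={ccc}")

# Canonical ordering sorts adjacent non-starters by combining class,
# so METEG (22) followed by HATAF PATAH (12) is swapped by normalization.
NUN = "\u05E0"
without_cgj = NUN + CHARS["METEG"] + CHARS["HATAF PATAH"]
reordered = unicodedata.normalize("NFD", without_cgj)

# CGJ has combining class 0, so marks on either side of it are not
# reordered relative to each other.
with_cgj = NUN + CHARS["METEG"] + CHARS["CGJ"] + CHARS["HATAF PATAH"]
preserved = unicodedata.normalize("NFD", with_cgj)

print(reordered != without_cgj, preserved == with_cgj)
```

VS1 and CGJ should both report a mark category with combining class 0, which is the sense in which they are combining characters and starters at the same time, while ZWJ and ZWNJ report a format category.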

Mark
__________________________________
http://www.macchiato.com

----- Original Message ----- 
From: "Philippe Verdy" <[EMAIL PROTECTED]>
To: "Peter Kirk" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sat, 2003 Nov 08 17:15
Subject: Re: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination


> I'm curious about what name you would give to it.
> The name COMBINING CHARACTER JOINER is already used...
>
> In all our discussions we should have used the term "starter" (instead of
> just "base character" which is ambiguous) for any characters of combining
> class 0 and which include:
>
>     Base characters (includes conjoining characters):
>         letter, syllable or ideograph (gc=L*),
>         number (gc=N*),
>         punctuation (gc=P*),
>         symbol (gc=S*),
>         space (gc=Zs)
>         agreed private use characters (gc=Co and private agreement)
>     Starter Combining characters:
>         (gc=M* and CC=0) such as CGJ
>     Controls:
>         (gc=C* except Co),
>     Text separators:
>         (gc=Zl, Zp)
>     Unknown private use characters:
>         (gc=Co and no private agreement)
>
> For other characters with combining class > 0, we should have used the term
> "non-starter", not the term "combining character" which may or may not be a
> "starter".
>
> It is clear however that we made a distinction between "combining sequences"
> (made of a unique starter and optionally followed by non-starters) and
> "grapheme clusters" (which are made of one or more combining sequences). For
> example, the (hypothetical) encoded text:
>
>     <ALEF, ZWJ, LAMED, VAV, VS1, HOLAM, NUN, METEG, CGJ, HATAF PATAH>
>
> is made of 7 "combining sequences":
>
>     <ALEF>,
>     <ZWJ>,
>     <LAMED>,
>     <VAV>,
>     <VS1, HOLAM>,
>     <NUN, HATAF PATAH>,
>     <CGJ, METEG>
>
> (where the starters are VAV, VS1, NUN, CGJ),
> and 3 "grapheme clusters":
>
>     <ALEF, ZWJ, LAMED>,
>     <VAV, VS1, HOLAM>,
>     <NUN, HATAF PATAH, CGJ, METEG>
>
> (ZWJ is a format control and ignored in the determination of grapheme
> cluster boundaries).
>
> Grapheme clusters may be created by grouping several combining sequences
> without using CGJ, ZWJ, ZWNJ, or variant selectors: see examples in South
> Asian scripts, and with Hangul Jamos.
>
> Generally, collation and rendering of text works on grapheme clusters (or
> groups of these clusters with language-specific tailoring); but not on
> combining sequences whose role is either related to string identity
> excluding any concept of relative order (i.e. normalization and canonical
> equivalence), or to text transforms or folding.
>
> Compatibility equivalence is also defined but neither on combining
> sequences, nor on grapheme clusters: there may be a mapping from one
> character (i.e. only a part of a combining sequence) to several characters
> that belong to distinct combining sequences and distinct grapheme clusters,
> for example with some ligatures of base letters (example: the "ffi"
> ligature, which participates in only 1 combining sequence and only 1
> grapheme cluster, is mapped to 3 distinct combining sequences and 3 distinct
> grapheme clusters).
>
> ----- Original Message ----- 
> From: "Peter Kirk" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Sunday, November 09, 2003 1:20 AM
> Subject: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination
>
>
> > So that you don't try to hold your breath over the weekend to find out
> > what I am planning to propose, as announced on the main Unicode list...
> >
> > The issue in question is the ligation of hataf vowels and meteg. Hataf
> > vowels with medial meteg are clear cases of ligatures between the basic
> > vowels and meteg. But there seems to be no mechanism in Unicode so far
> > to promote such a ligature. So, my suggestion is to propose a new
> > combining character COMBINING CHARACTER JOINER (combining class zero),
> > defined with semantics similar to ZWJ rather than CGJ i.e. to affect
> > ligation but not collation.
> >
> > Comments?
> >
> > -- 
> > Peter Kirk
> > [EMAIL PROTECTED] (personal)
> > [EMAIL PROTECTED] (work)
> > http://www.qaya.org/
> >
> >
> >
>
>
>
