Let's try to be clear on the terms.

Look at the definition of combining sequences:
D17 Combining character sequence: A character sequence consisting of either a
base character followed by a sequence of one or more combining characters, or a
sequence of one or more combining characters.

Thus a combining character sequence *cannot* contain a ZWJ or any other Cf.
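
To make that concrete, here is a minimal Python sketch of a check against the D17 wording quoted above. It rests on two assumptions: that "combining character" means General Category M (Mn, Mc, Me), and that "base character" means any other graphic character (L, N, P, S, Zs). The function names are illustrative only and do not come from the standard or any library.

```python
import unicodedata


def is_combining(ch):
    # Assumption: a combining character has General Category M (Mn, Mc, Me).
    return unicodedata.category(ch).startswith("M")


def is_base(ch):
    # Assumption: a base character is a graphic character that is not a mark,
    # i.e. General Category L, N, P, S or Zs. Cf characters such as ZWJ fail this.
    cat = unicodedata.category(ch)
    return cat[0] in "LNPS" or cat == "Zs"


def is_combining_character_sequence(s):
    # D17 as quoted: an optional base character, then one or more combining marks.
    if not s:
        return False
    marks = s[1:] if is_base(s[0]) else s
    return len(marks) >= 1 and all(is_combining(c) for c in marks)


print(unicodedata.category("\u200d"))                   # Cf
print(is_combining_character_sequence("e\u0301"))       # True
print(is_combining_character_sequence("e\u200d\u0301")) # False
```

Under those assumptions the ZWJ (U+200D, category Cf) is neither a base nor a combining character, so any string containing it fails the check.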

Any use of a ZWJ before a combining mark produces a *defective* combining
character sequence (D17a), which isolates the combining mark from any preceding
base character.
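
The isolation effect can be sketched in Python by splitting a string into runs. This is only an illustration under stated assumptions: marks are General Category M, base characters are the other graphic categories (L, N, P, S, Zs), and the `segment` function and its labels are hypothetical names of mine, not a standard algorithm.

```python
import unicodedata


def is_mark(ch):
    # Assumption: combining marks are General Category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")


def is_base(ch):
    # Assumption: base characters are non-mark graphic characters.
    cat = unicodedata.category(ch)
    return cat[0] in "LNPS" or cat == "Zs"


def segment(text):
    # Split text into (substring, label) runs:
    #   "base+marks" - a base character followed by one or more marks
    #   "base"       - a base character with no marks
    #   "defective"  - one or more marks with no base character before them
    #   "other"      - anything else, e.g. a Cf character such as ZWJ
    out = []
    i = 0
    while i < len(text):
        start = i
        if is_base(text[i]):
            i += 1
            while i < len(text) and is_mark(text[i]):
                i += 1
            label = "base+marks" if i - start > 1 else "base"
        elif is_mark(text[i]):
            while i < len(text) and is_mark(text[i]):
                i += 1
            label = "defective"
        else:
            i += 1
            label = "other"
        out.append((text[start:i], label))
    return out


# e + COMBINING ACUTE: one ordinary combining character sequence.
print([label for _, label in segment("e\u0301")])        # ['base+marks']
# e + ZWJ + COMBINING ACUTE: the mark is cut off from the base.
print([label for _, label in segment("e\u200d\u0301")])  # ['base', 'other', 'defective']
```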

And as I said earlier:

> - *Default* grapheme clusters do not include ZWJ; as a matter of fact, default
> grapheme clusters, except for Hangul Jamo Syllables and a few exceptional
> cases, are identical with combining sequences.
> http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

> - *Tailored* grapheme clusters may include longer sequences, but it is not at
> all obvious whether they would ever contain ZWJ or ZWNJ.

I'll expand on the latter. What constitutes a tailored grapheme cluster is up to
a particular process, and so one could contain a ZWJ. However, any combining
mark after a ZWJ does *not* apply to a previous base character within that
tailored grapheme cluster, so the use of a ZWJ would isolate that combining
mark. Such a sequence would not correspond to anything used in a natural
language.

Mark
__________________________________
http://www.macchiato.com

----- Original Message ----- 
From: "Peter Kirk" <[EMAIL PROTECTED]>
To: "Mark Davis" <[EMAIL PROTECTED]>
Cc: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sun, 2003 Nov 09 09:19
Subject: Re: ZWJ, ZWNJ, CGJ and combination


> On 08/11/2003 17:09, Mark Davis wrote:
>
> >I agree with the first part of your analysis. By the phrase "requesting
> >ligation of combining characters" it is unclear to me what you mean, and
> >whether that is the right solution to whatever problem you are referring to.
> >
> >Mark
> >__________________________________
> >http://www.macchiato.com
> >
> >
> >
> A further reply to this one:
>
> On the bidi list Paul Nelson pointed out that in Khmer ZWJ and ZWNJ do
> not break combining sequences; or at least they do not break grapheme
> clusters, which is not quite the same thing. And the same may be true of
> Indic scripts, although in the examples I found ZWJ/ZWNJ is always at
> the end of a combining sequence. Are ZWJ and ZWNJ actually used within
> combining character sequences (or what would be such sequences if not
> technically broken)? Is there some tension here with the general
> definition of combining character sequences?
>
> If Khmer really does do this, and unless there are any real objections
> to this practice, perhaps the best way ahead, rather than defining a new
> COMBINING CHARACTER JOINER and changing the Khmer encoding, is to adjust
> the definition of combining character sequences to allow ZWJ, ZWNJ and
> perhaps some other suitable layout control characters to be included
> within such sequences. This would allow the Hebrew issue to be solved in
> a way analogous to the Khmer issue.
>
> -- 
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/
>
>
>

