On Sun, 10 Dec 2017 21:14:18 -0800 Manish Goregaokar via Unicode <[email protected]> wrote:
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant
>
> You can also explicitly request ligatureification with a ZWJ, so
> perhaps this rule should be something like
>
> (Virama ZWJ? | ZWJ) x Extend* LinkingConsonant
>
> -Manish
>
> On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode <
> [email protected]> wrote:
>
> > 1. You make a good point about the GB9c. It should probably instead
> > be something like:
> >
> > GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant

This change is unnecessary. If we start from Draft 1 where there are:

GB9: × (Extend | ZWJ | Virama)
GB9c: (Virama | ZWJ ) × LinkingConsonant

If the classes used in the rules are to be disjoint, we then have to
split Extend into something like ViramaExtend and OtherExtend to allow
normalised (NFC/NFD) text, at which point we may as well continue to
have rules that work without any normalisation.

Informally,

ViramaExtend = Extend and ccc ≠ 0.
OtherExtend = Extend and ccc = 0.

(We might need to put additional characters in ViramaExtend.)

This gives us rules:

GB9': × (OtherExtend | ViramaExtend | ZWJ | Virama)
GB9c': (Virama | ZWJ ) ViramaExtend* × LinkingConsonant

So, for a sequence <virama, ZWJ, nukta, LinkingConsonant>, GB9' gives us

virama × ZWJ × nukta LinkingConsonant

and GB9c' gives us

virama × ZWJ × nukta × LinkingConsonant

---
In Rule GB9c, what examples justify including ZWJ? Are they just the
C1 half-forms? My knowledge suggests that

GB9c'': Virama (ZWJ | ViramaExtend)* × LinkingConsonant

might be more appropriate.

Richard.
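
For anyone who wants to check the worked example mechanically, here is a
minimal sketch of GB9', GB9c' and GB9c'' as predicates over a list of class
names. It is only an illustration under stated assumptions: the class names
are the ones proposed in this thread, treated as plain strings (no real
Unicode property data is consulted), and the function names are made up for
this sketch. A result of False means only that these rules do not forbid a
break at that boundary, not that other rules would allow one.

```python
# Symbolic class names from the thread; nothing here reads real Unicode data.
NO_BREAK_BEFORE = {"OtherExtend", "ViramaExtend", "ZWJ", "Virama"}


def gb9_prime(classes, i):
    """GB9': x (OtherExtend | ViramaExtend | ZWJ | Virama)."""
    return classes[i] in NO_BREAK_BEFORE


def gb9c_prime(classes, i):
    """GB9c': (Virama | ZWJ) ViramaExtend* x LinkingConsonant."""
    if classes[i] != "LinkingConsonant":
        return False
    j = i - 1
    # Skip back over any run of ViramaExtend.
    while j >= 0 and classes[j] == "ViramaExtend":
        j -= 1
    return j >= 0 and classes[j] in {"Virama", "ZWJ"}


def gb9c_double_prime(classes, i):
    """GB9c'': Virama (ZWJ | ViramaExtend)* x LinkingConsonant."""
    if classes[i] != "LinkingConsonant":
        return False
    j = i - 1
    # Skip back over any mix of ZWJ and ViramaExtend.
    while j >= 0 and classes[j] in {"ZWJ", "ViramaExtend"}:
        j -= 1
    return j >= 0 and classes[j] == "Virama"


def joins(classes, rules):
    """For each boundary before position i >= 1, does any rule forbid a break?"""
    return [any(rule(classes, i) for rule in rules)
            for i in range(1, len(classes))]


# <virama, ZWJ, nukta, LinkingConsonant>; nukta has ccc != 0, so ViramaExtend.
seq = ["Virama", "ZWJ", "ViramaExtend", "LinkingConsonant"]
print(joins(seq, [gb9_prime]))              # [True, True, False]
print(joins(seq, [gb9_prime, gb9c_prime]))  # [True, True, True]
```

The two variants differ when there is no virama at all: for
["ZWJ", "LinkingConsonant"], gb9c_prime joins the pair while
gb9c_double_prime does not.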

