-----BEGIN PGP SIGNED MESSAGE-----

Kenneth Whistler wrote:
> Kent Karlsson's suggestion:
>
> > I vaguely suggested adding
> > an enclosing (in some sense) invisible combining character to
> > solve this: <o, CGJ, o, invisible-enclosing, combining breve>.
> > No character has been designated for such use, though. And I
> > haven't made a formal proposal yet.
>
> (i.e. create a generic way to make a non-enclosing combining mark
> apply to a grapheme cluster, by encoding an invisible enclosing
> combining mark)

For this approach to work, <invisible-enclosing> must have combining
class 0, and be in Grapheme_Extend and general category Mn. Because it
involves a new character, it can't be included in the standard until
Unicode 3.3, and since that character will not be in any of the
Grapheme_* classes, existing implementations will then treat the
sequence as *three* grapheme clusters.

An alternative is to use CGJ itself for <invisible-enclosing>, i.e.
<o, CGJ, o, CGJ, combining breve>. This works because:

 - CGJ has combining class 0, so it prevents the breve from composing
   with the second o.
 - CGJ has general category Mn and is invisible, as required.
 - it is straightforward to modify the grapheme breaking rules to
   treat this as a single cluster, by adding the rule "Link × Extend".
   (This assumes the corrections to the other rules that I described
   in my comments.)

I also considered <o, CGJ, combining breve, o> (which encodes the breve
in the same position that a double diacritic would be). That has the
disadvantage that it requires the more complicated rule
"Link × *Extend (Precede / Base)", though. If only one combining mark
is allowed to apply to a cluster using CGJ, then
"Link × (Precede / Extend) Base" would probably suffice, but I still
prefer adding "Link × Extend" to the rules that I suggested in part 1
of my comments, since they are defined only in terms of character pairs
without any lookahead.

Here is what I'm suggesting written out in full:

When a sequence of combining diacritical marks immediately follows CGJ,
apply them to the whole preceding grapheme cluster.

Use the following breaking rules, with Precede = Join_Control:

  CR × LF

  Base × Extend      }
  Extend × Extend    }  equivalent to
  Base × Link        }  (Base / Extend) × (Extend / Link)
  Extend × Link      }

  Precede × Precede  }
  Precede × Base     }  equivalent to
  Link × Precede     }  (Link / Precede) × (Precede / Base)
  Link × Base        }

  Link × Extend

  L × (L / V / LV / LVT)
  (LV / V) × (V / T)
  (LVT / T) × T

  Any ÷

[Since it is harmless to have "Precede × Extend", another possibility
would be to change the third block of rules to:

  (Link / Precede) × (Precede / Base / Extend)

In other words, a break can only occur after Link or Join_Control if
they are followed by a control character. It would not be a good idea
to further simplify this to "(Link / Precede) × Any", since we always
want there to be a break at the end of a line, for example.]

- -- 
David Hopwood <[EMAIL PROTECTED]>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBPIRlEzkCAxeYt5gVAQGAKQf/af5ePbLyscgW4sPhPaDdZYtAwygjO6n9
BaMFPED/i/GLiFzXNDMVJV7+PcDMOxKEq6sSHb66j5dpjpOt/PBZsrwd/ywGJuVs
0ehX54NsGYG7A9TiIRJcBGpXWapKjbupyjD0O+DdwWWmzpWmygEXDbOemjU8g6L9
Su0cl/grd2bFCokVKmHrQWoTY+GYUpByDZ388uWmX7ydaLWd4j4fvct/cBXa8Kls
Uwv8bsj7iz8TC/vAKy3r55Xll3ZPL2vLm+v82nIugCIuYxfJRRfHqXPSXMDoKOs2
GodsjLhHamDUpeGs9pTtojRTEFdGfkhMNs+fpecN3b0yNfHGFa5HEw==
=/7Ml
-----END PGP SIGNATURE-----
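
Because the rules in the message are defined purely on adjacent character pairs, they can be sketched as a lookup table. The Python below is only an illustration under stated assumptions, not code from the message: `NO_BREAK`, `toy_class` and `clusters` are hypothetical names, and `toy_class` is a crude stand-in for the real property assignments (CGJ, U+034F, is treated as Link; ZWNJ and ZWJ as Precede; other Mn/Me characters as Extend; letters and digits as Base). The Hangul rows are kept in the table, but the toy classifier never produces the L / V / T / LV / LVT classes.

```python
import unicodedata

# Class pairs (left, right) between which no break is allowed.
NO_BREAK = set()

def _rule(lefts, rights):
    """Record 'left × right' for every combination of the given classes."""
    for left in lefts:
        for right in rights:
            NO_BREAK.add((left, right))

_rule(["CR"], ["LF"])
_rule(["Base", "Extend"], ["Extend", "Link"])
_rule(["Link", "Precede"], ["Precede", "Base"])
_rule(["Link"], ["Extend"])              # the rule added for <..., CGJ, mark>
_rule(["L"], ["L", "V", "LV", "LVT"])
_rule(["LV", "V"], ["V", "T"])
_rule(["LVT", "T"], ["T"])
# Every other pair falls under "Any ÷" (break).

def toy_class(ch):
    """Assumed, simplified classification; not the real property data."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch == "\u034f":                   # COMBINING GRAPHEME JOINER
        return "Link"
    if ch in "\u200c\u200d":             # ZWNJ, ZWJ (Join_Control)
        return "Precede"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return "Extend"
    if cat[0] in "LN":
        return "Base"
    return "Other"

def clusters(text):
    """Split text into clusters using only adjacent-pair lookups."""
    out = []
    for ch in text:
        if out and (toy_class(out[-1][-1]), toy_class(ch)) in NO_BREAK:
            out[-1] += ch
        else:
            out.append(ch)
    return out
```

Under these assumptions, `clusters("o\u034fo\u034f\u0306")` keeps <o, CGJ, o, CGJ, combining breve> together as one cluster, and no lookahead beyond the next character is needed.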