On Fri, 18 May 2012 09:51:34 -0700 Markus Scherer <markus....@gmail.com> wrote:
> On inspection, we think we can do better (and want to), probably by
> adding overlap contractions. If we get into trouble with that, we
> will think of alternatives. One is to decompose more characters even
> in FCD input. Another is to keep documenting a limitation *when
> normalization is off*.

Just in case you haven't already thought of it, one reasonable scheme
would be to decompose the input if and only if one is searching for
contractions or the input character could *hide* the start of a
contraction, e.g. a contraction starting with a combining accent or
with the non-initial part of an Indic vowel. At that point one will
already have left, or be about to leave, the 'fast loop', and of course
converting FCD to NFD is easy, as no rearrangement is required.

The contractions that need to be added are merely the canonical closure
of all the explicitly defined contractions, reduced by requiring that
the contraction definition be in NFD after the first character. This
will then work for DUCET 6.1.0, work for Danish, and work for my
mischievous 0302 COMBINING CIRCUMFLEX ACCENT + 0067 LATIN SMALL LETTER G
contraction.

Richard.
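A minimal Python sketch of the point that FCD-to-NFD conversion needs no rearrangement. It assumes the standard FCD test (each adjacent pair must satisfy lccc(next) == 0 or tccc(prev) <= lccc(next), where lccc/tccc are the combining classes of the first/last code point of a character's canonical decomposition). The helper names `lccc`, `tccc`, `is_fcd` and `fcd_to_nfd` are hypothetical, not ICU API, and Python's `unicodedata` tables may be a newer Unicode version than the 6.1 discussed above.

```python
import unicodedata

def lccc(ch: str) -> int:
    # Hypothetical helper: combining class of the FIRST code point of NFD(ch).
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[0])

def tccc(ch: str) -> int:
    # Hypothetical helper: combining class of the LAST code point of NFD(ch).
    return unicodedata.combining(unicodedata.normalize("NFD", ch)[-1])

def is_fcd(s: str) -> bool:
    # FCD holds if no decomposition boundary puts a higher nonzero combining
    # class immediately before a lower nonzero one.
    for prev, nxt in zip(s, s[1:]):
        lead = lccc(nxt)
        if lead != 0 and tccc(prev) > lead:
            return False
    return True

def fcd_to_nfd(s: str) -> str:
    # For FCD input, decomposing each code point on its own and concatenating
    # already yields canonically ordered text, i.e. NFD: no reordering pass.
    if not is_fcd(s):
        raise ValueError("input is not FCD; a full normalizer would reorder")
    return "".join(unicodedata.normalize("NFD", ch) for ch in s)
```

For example, `fcd_to_nfd("\u00ea")` gives `"e\u0302"`, while `"\u00e2\u0323"` (a-circumflex followed by a combining dot below) is rejected, because the trailing circumflex (class 230) would have to be moved after the dot below (class 220). A production collator would use precomputed lead/trail class data rather than calling a normalizer per character.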