On Fri, Mar 15, 2013 at 6:52 PM, Richard Wordingham <richard.wording...@ntlworld.com> wrote:
> > The "fractional" refers to the same kind of mechanism as the "large
> > weight values" in the UCA spec.
>
> Yes. The problem is that formally the UCA clearly treats 'large
> weights' as being in multiple collation elements, whereas, in various
> places, for transforming collation element tables properly, one needs
> them to be treated as being in a single collation element.
>

Correct, that's where the complexities I mentioned come in. ICU's code has to check whether a CE is a "continuation CE" to decide whether to apply the script-reordering permutation, the uppercase-first permutation, etc.

> > The point is that no sequence of
> > units (8-bit, 16-bit or whatever the implementation uses) can be an
> > exact prefix of another sequence.
>
> That's only for efficiency.

No, it's critical for correctness.

> One could allocate low unit values to the
> start units and high unit values to continuation units. By using high
> values for continuation units, DUCET simplifies the identification'
>

One could pick nearly any range for the trailing units. With the UCA spec using 16-bit units and only 21 bits to encode in a pair, there is nearly free choice for the range of trail units.

markus
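A minimal sketch of why the prefix property is a correctness issue rather than an efficiency one. It assumes, hypothetically, that a sort key is the plain concatenation of each collation element's weight units and that keys are compared unit by unit. The weights are made-up toy values, not DUCET or ICU data, and `sort_key`, `cmp`, `is_prefix_free`, and `orders_agree` are illustrative helpers, not ICU APIs.

```python
from itertools import product

# Toy model (assumption): a weight is a non-empty tuple of code units, and a
# sort key is the concatenation of the weights of its collation elements.


def sort_key(weights):
    """Concatenate per-element weights into one flat unit sequence."""
    key = []
    for w in weights:
        key.extend(w)
    return tuple(key)


def cmp(x, y):
    """Three-way comparison using Python's lexicographic tuple ordering."""
    return (x > y) - (x < y)


def is_prefix_free(weights):
    """True if no weight is a proper prefix of another weight."""
    ws = set(weights)
    return not any(a != b and b[:len(a)] == a for a in ws for b in ws)


def orders_agree(weights, max_len=3):
    """Brute-force check: does comparing whole weights, one collation element
    at a time, always match comparing the concatenated sort keys?"""
    seqs = [s for n in range(1, max_len + 1) for s in product(weights, repeat=n)]
    return all(cmp(x, y) == cmp(sort_key(x), sort_key(y))
               for x in seqs for y in seqs)


# Not prefix-free: (5,) is a proper prefix of (5, 3).
A, B, C = (5,), (5, 3), (9,)
s1 = (A, C)  # two collation elements
s2 = (B,)    # one collation element
print(cmp(s1, s2))                      # -1: A < B, so s1 should sort first
print(cmp(sort_key(s1), sort_key(s2)))  # 1: keys (5, 9) > (5, 3), order flipped

# Prefix-free toy set: the two comparisons agree.
print(orders_agree([(5,), (6, 3), (9,)]))  # True
```

With a prefix-free set, the first pair of differing weights must differ at a unit position both of them have, so the concatenated keys diverge at exactly the same place. When `(5,)` is a prefix of `(5, 3)`, the unit that follows `A` in the key belongs to the *next* collation element and can reverse the result.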