On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham <richard.wording...@ntlworld.com> wrote:

> If using DUCET, the collation elements for 0F71+0F71+0F72 are those for
> <0F73, 0F71>, namely (at 6.1.0):
>
> [.2572.0020.0002.0F73][.2570.0020.0002.0F71].
>
> The correct collation elements for FCD sequence 0F71,0F71,0F72,0F72
> are:
>
> [.2572.0020.0002.0F73][.2572.0020.0002.0F73]
>
> However, if we don't have a contraction (within the
> strictly non-normalising tailoring) for 0F71+0F71+0F72+0F72, we will
> incorrectly derive the collation element sequence
>
> [.2572.0020.0002.0F73][.2570.0020.0002.0F71][.2571.0020.0002.0F72]
>

I see. It's because 0F71+0F73==0F73+0F71 (for canonical closure we should have both versions) and the latter overlaps with decomposition mappings. Sigh...

> > HOWEVER, you must *not* have the added contraction for 0F71+0F71.
>
> If we don't have this prefix contraction, then we will miss a
> discontiguous-contraction match on <0F71, 0334, 0F71, 0F72>.

> Within the tailoring, 0F73 must have ccc 130.

No, it has ccc=0. I believe that an FCD-accepting implementation should work with the "leading ccc" and "trailing ccc" values rather than ccc itself.

> Do bear in mind that DUCET 6.1.0 requires an infinite set of
> contractions if you are to collate FCD strings without doing some
> normalisation, such as splitting the Tibetan long vowels.

I will re-read your earlier emails to see if this is really the case. And in Q3 I will try to write code to find the necessary overlaps between contractions and decomposition mappings.

Thanks,
markus
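
For reference, the ccc values and the canonical equivalence 0F71+0F73==0F73+0F71 mentioned above can be checked with Python's standard `unicodedata` module. This is only a minimal sketch: `lead_trail_ccc` is a hypothetical helper name introduced here for illustration, and the code does not touch collation or DUCET at all.

```python
import unicodedata

# Tibetan vowel signs AA (0F71), I (0F72) and II (0F73)
AA, I, II = "\u0f71", "\u0f72", "\u0f73"

def nfd(s):
    """Canonical decomposition plus canonical reordering."""
    return unicodedata.normalize("NFD", s)

def lead_trail_ccc(ch):
    """Hypothetical helper: ccc of the first and last code point
    of the NFD form of ch (the "leading" and "trailing" ccc)."""
    d = nfd(ch)
    return unicodedata.combining(d[0]), unicodedata.combining(d[-1])

# Plain ccc of each character, straight from the character database
ccc = {c: unicodedata.combining(c) for c in (AA, I, II)}

# Both orders normalise to the same string, so they are canonically equivalent
equivalent = nfd(AA + II) == nfd(II + AA)

# Leading/trailing ccc of 0F73, derived from its decomposition 0F71 0F72
ii_lead_trail = lead_trail_ccc(II)

print({f"{ord(c):04X}": v for c, v in ccc.items()}, equivalent, ii_lead_trail)
```

The point of the helper is that 0F73 itself reports ccc=0, while its decomposition starts and ends with non-zero classes, which is the distinction an FCD-accepting implementation would need to work with.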