On Sat, Mar 16, 2013 at 4:09 AM, Richard Wordingham <richard.wording...@ntlworld.com> wrote:
> Please give an example of how the low/high split would fail. With the
> primary collation weights 20, 21, 21 80 and 22 I get the following
> primary collation weight sequences for one and two collating elements,
> marking boundaries of collating elements with commas:

The problem is that if you have 21 and 21 80, and another primary starts with 80, you can't distinguish the sequence 21 | 80 from the single weight 21 80.

> For most uses, in particular, those in DUCET, the trailing units must
> not be mistakable for variable primary collation elements.

You have to know which one is a trailing unit. I suppose you could do it via ranges like in UTF-8, but that means you can use fewer byte values per position, which yields longer weights and longer sort keys. It is more efficient to get leading vs. trailing information from the data structure.

markus
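
To make the ambiguity concrete, here is a minimal sketch, not taken from any real collation implementation. It reads the weights in the thread as hex bytes (an assumption) and adds a hypothetical extra primary consisting of the single byte 80. The helper name `segmentations` is made up for illustration; it simply lists every way a byte string can be cut into known weights.

```python
def segmentations(key, weights):
    """Return every way to split `key` (bytes) into a sequence of known weights."""
    if not key:
        return [[]]  # exactly one way to split the empty string
    results = []
    for w in weights:
        if key.startswith(w):
            for rest in segmentations(key[len(w):], weights):
                results.append([w] + rest)
    return results


# Primary weights from the thread, read as hex bytes (an assumption).
weights = [bytes.fromhex(h) for h in ("20", "21", "2180", "22")]

# Hypothetical additional primary whose first byte is 80.
clashing = weights + [bytes.fromhex("80")]

key = bytes.fromhex("2180")

# With the clashing primary, the bytes 21 80 parse both as 21 | 80 and as 21 80.
ambiguous = segmentations(key, clashing)

# Without any primary starting with 80, only the single weight 21 80 fits.
unambiguous = segmentations(key, weights)

for parse in ambiguous:
    print(" | ".join(w.hex(" ") for w in parse))
```

Reserving disjoint byte ranges for leading and trailing units, as UTF-8 does, would rule out the clashing primary, at the cost of fewer usable values per position.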