On Fri, 2 Nov 2018 14:54:19 +0100 Philippe Verdy via Unicode <[email protected]> wrote:
> It's not just a question of "I like it or not". But the fact that the
> standard makes the presence of 0000 required in some steps, and the
> requirement is in fact wrong: this is in fact NEVER required to
> create an equivalent collation order. These steps are completely
> unnecessary and should be removed.
>
> On Fri, 2 Nov 2018 at 14:03, Mark Davis ☕️ <[email protected]> wrote:
>
> > You may not like the format of the data, but you are not bound to
> > it. If you don't like the data format (eg you want [.0021.0002]
> > instead of [.0000.0021.0002]), you can transform it however you
> > want as long as you get the same answer, as it says here:
> >
> > http://unicode.org/reports/tr10/#Conformance
> > “The Unicode Collation Algorithm is a logical specification.
> > Implementations are free to change any part of the algorithm as
> > long as any two strings compared by the implementation are ordered
> > the same as they would be by the algorithm as specified.
> > Implementations may also use a different format for the data in the
> > Default Unicode Collation Element Table. The sort key is a logical
> > intermediate object: if an implementation produces the same results
> > in comparison of strings, the sort keys can differ in format from
> > what is specified in this document. (See Section 9, Implementation
> > Notes.)”

Given the above paragraph, how does the standard force you to use a
special 0000?  Perhaps the wording of the standard can be changed to
prevent your unhappy interpretation.

> > That is what is done, for example, in ICU's implementation. See
> > http://demo.icu-project.org/icu-bin/collation.html and turn on "raw
> > collation elements" and "sort keys" to see the transformed collation
> > elements (from the DUCET + CLDR) and the resulting sort keys.
> >
> > a => [29,05,_05] => 29 , 05 , 05 .
> > a\u0300 => [29,05,_05][,8A,_05] => 29 , 45 8A , 06 .
> > à => <same>
> > A\u0300 => [29,05,u1C][,8A,_05] => 29 , 45 8A , DC 05 .
> > À => <same>

As you can see, Mark does not come to the same conclusion as you, and
nor do I.

Richard.
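
The point of the conformance clause quoted above can be sketched in a few lines of Python. This is only an illustration, not ICU's or any other implementation's actual code: the function names are hypothetical, the primary, secondary and tertiary weights of the base letter are made up, and only [.0000.0021.0002] is taken from the message. It assumes three levels and well-formed collation elements, in which zero weights occur only in leading positions. Under those assumptions, storing [.0021.0002] in place of [.0000.0021.0002] yields the same sort key, because sort key formation skips zero weights anyway.

```python
LEVELS = 3
LEVEL_SEPARATOR = 0x0000  # separator placed between levels of a sort key


def sort_key_full(elements):
    """Sort key from full-width elements such as (0x0000, 0x0021, 0x0002).

    For each level in turn, append every non-zero weight at that level.
    """
    key = []
    for level in range(LEVELS):
        if level > 0:
            key.append(LEVEL_SEPARATOR)
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)


def compact(ce):
    """Drop leading zero weights: (0x0000, 0x0021, 0x0002) -> (0x0021, 0x0002)."""
    i = 0
    while i < len(ce) and ce[i] == 0:
        i += 1
    return ce[i:]


def sort_key_compact(elements):
    """Sort key from compact elements that never store the leading zeros.

    A compact element of length k is taken to cover the last k levels.
    """
    key = []
    for level in range(LEVELS):
        if level > 0:
            key.append(LEVEL_SEPARATOR)
        for ce in elements:
            idx = level - (LEVELS - len(ce))
            if idx >= 0:
                key.append(ce[idx])
    return tuple(key)


# Illustrative weights only; the base-letter values are invented.
BASE_A = (0x1C47, 0x0020, 0x0002)
GRAVE = (0x0000, 0x0021, 0x0002)  # secondary-level element from the message

full = [BASE_A, GRAVE]
short = [compact(ce) for ce in full]
```

With these definitions, sort_key_full(full) and sort_key_compact(short) are the same tuple, so any comparison based on them orders strings identically.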

