Re: implicit weight base for U+2CEA2

Ken Whistler via Unicode Wed, 27 Sep 2017 14:32:00 -0700


On 9/27/2017 2:19 PM, Markus Scherer via Unicode wrote:

On Wed, Sep 27, 2017 at 1:49 PM, James Tauber via Unicode wrote:


    I recently updated pyuca[1], my pure Python implementation of the
    Unicode Collation Algorithm to work with 8.0.0, 9.0.0, and 10.0.0,
    but to get all the tests to work, I had to special-case the
    implicit weight base for U+2CEA2. The spec seems to suggest the
    base should be FB80 but I had to override just that code point to
    have a base of FBC0 for the tests to pass.

    Is this a known issue with the spec or something I've missed?

2CEA2..2CEAF are unassigned code points, for which UCA+DUCET uses a base of FBC0.


markus
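To illustrate Markus's point, here is a minimal Python sketch of how the implicit weights are derived, following the formula in the Implicit Weights section of UTS #10: AAAA = BASE + (CP >> 15), BBBB = (CP & 0x7FFF) | 0x8000. It assumes hard-coded Unicode 10.0.0 Unified_Ideograph ranges (worth checking against PropList.txt) instead of reading them from the UCD. It also leaves out the separate Tangut and Nushu implicit ranges. The names CORE_IDEOGRAPHS, EXTENSION_IDEOGRAPHS, implicit_base and implicit_weights are hypothetical and are not pyuca's API. The detail that matters is that the FB80 base covers only *assigned* extension ideographs. U+2CEA1, the last assigned Extension E character, therefore gets FB80, while U+2CEA2 falls through to FBC0.

```python
# Sketch of UCA implicit weights, assuming Unicode 10.0.0 ranges.
# Hypothetical names; Tangut (FB00) and Nushu (FB01) cases are omitted.

# Unified_Ideograph code points in the CJK Unified Ideographs and
# CJK Compatibility Ideographs blocks -> base FB40.
CORE_IDEOGRAPHS = [
    (0x4E00, 0x9FEA),
    (0xFA0E, 0xFA0F), (0xFA11, 0xFA11), (0xFA13, 0xFA14),
    (0xFA1F, 0xFA1F), (0xFA21, 0xFA21), (0xFA23, 0xFA24),
    (0xFA27, 0xFA29),
]

# All other Unified_Ideograph code points (the extensions) -> base FB80.
# These are the *assigned* ranges, not the full block ranges.
EXTENSION_IDEOGRAPHS = [
    (0x3400, 0x4DB5),    # Ext A
    (0x20000, 0x2A6D6),  # Ext B
    (0x2A700, 0x2B734),  # Ext C
    (0x2B740, 0x2B81D),  # Ext D
    (0x2B820, 0x2CEA1),  # Ext E (the block itself runs to 2CEAF)
    (0x2CEB0, 0x2EBE0),  # Ext F
]


def in_ranges(cp, ranges):
    """True if cp falls inside any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def implicit_base(cp):
    """Pick the implicit-weight BASE for a code point missing from DUCET."""
    if in_ranges(cp, CORE_IDEOGRAPHS):
        return 0xFB40
    if in_ranges(cp, EXTENSION_IDEOGRAPHS):
        return 0xFB80
    return 0xFBC0  # unassigned code points, noncharacters, etc.


def implicit_weights(cp):
    """Return the two collation elements [.AAAA.0020.0002][.BBBB.0000.0000]."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]
```

For example, implicit_weights(0x2CEA2) yields a first primary of FBC5, while implicit_weights(0x2CEA1) yields FB85.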

And you may have a range error in Extension E to account for the test problem.

The relevant section of CollationTest_SHIFTED_SHORT.txt has tests that will pass only if:


2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE
Ext C < Ext D < Ext E < Ext F < noncharacter

Those are *unassigned* code points just past the assigned ranges but still in the blocks of each of those CJK extensions. So if you have a range error for assigned characters in Extension E, you'd get a failure at that point in the test cases.
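To see why a range error confined to Extension E breaks exactly this part of the test, the following hypothetical Python sketch sorts the five code points above by their implicit primary weights. It does this twice: once with the assigned Extension C through F ranges, and once with Extension E mistakenly extended to the end of its block (2CEAF). It assumes Unicode 10.0.0 ranges and models only the FB80/FBC0 cases; the FB40 core-CJK base and Tangut/Nushu are left out. It compares only primary weights, which is enough here because the primaries already differ. The names ASSIGNED_EXT, BUGGY_EXT, implicit_primaries and sorted_by are illustrative only.

```python
# Hypothetical check of the ordering 2B735 < 2B81E < 2CEA2 < 2EBE1 < 2FFFE.
# Assumes Unicode 10.0.0 extension ranges; the FB40 (core CJK) base and
# the Tangut/Nushu implicit ranges are deliberately left out.

# Assigned Unified_Ideograph ranges for Extensions C..F (base FB80).
ASSIGNED_EXT = [
    (0x2A700, 0x2B734),  # Ext C
    (0x2B740, 0x2B81D),  # Ext D
    (0x2B820, 0x2CEA1),  # Ext E
    (0x2CEB0, 0x2EBE0),  # Ext F
]

# Same table with a range error: Ext E runs to the end of its block.
BUGGY_EXT = ASSIGNED_EXT[:2] + [(0x2B820, 0x2CEAF)] + ASSIGNED_EXT[3:]

TEST_SEQUENCE = [0x2B735, 0x2B81E, 0x2CEA2, 0x2EBE1, 0x2FFFE]


def implicit_primaries(cp, ext_ranges):
    """Primary weights (AAAA, BBBB) of the two implicit collation elements."""
    in_ext = any(lo <= cp <= hi for lo, hi in ext_ranges)
    base = 0xFB80 if in_ext else 0xFBC0
    return (base + (cp >> 15), (cp & 0x7FFF) | 0x8000)


def sorted_by(ext_ranges):
    """Sort the test code points by their implicit primary weights."""
    return sorted(TEST_SEQUENCE, key=lambda cp: implicit_primaries(cp, ext_ranges))


if __name__ == "__main__":
    print([f"{cp:X}" for cp in sorted_by(ASSIGNED_EXT)])
    # ['2B735', '2B81E', '2CEA2', '2EBE1', '2FFFE']
    print([f"{cp:X}" for cp in sorted_by(BUGGY_EXT)])
    # ['2CEA2', '2B735', '2B81E', '2EBE1', '2FFFE']
```

With the buggy table, U+2CEA2 gets a first primary of FB85 instead of FBC5 and jumps ahead of the other unassigned code points. Overriding just that code point to FBC0, as in the original workaround, would hide this symptom without fixing the range.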


--Ken
