Vadim,

> I have a problem with creating collation key for U+2047 (double question 
> mark).
> 
> Explicit collation keys for this symbol is absent in allkeys.txt.

allkeys.txt in the current version of the Unicode Collation Algorithm
is based on the Unicode *3.1* repertoire. This can be seen in
the references section in UTS #10, where the version is explicitly
listed as allkeys-3.1.1.txt.

U+2047 is a character added to Unicode Version *3.2*.

> 
> In UnicodeData.txt this symbol have compatibility decomposition map.
> 
> 2047: ... :<compat> 003F 003F: ...

True.

> 
> Based on this and as defined in UTR #10 Unicode Collation Algorithm this
> symbol must have these collation keys:
> 
> 003F [*024E.0020.0004]
> 003F [*024E.0020.0004]
> 
> But in CollationTest_NON_IGNORABLE.txt assumes that symbol have implicit 
> collation key [FBC0.0020.0002] [A047.0000.0000].

CollationTest_NON_IGNORABLE.txt is also based on the Unicode 3.1
repertoire. For a Unicode 3.1 implementation of collation,
U+2047 is a reserved code point.
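
To show where that implicit key comes from, here is a minimal Python
sketch of the implicit-weight derivation for a code point that has no
entry in the table. The formula and the FBC0 base follow the implicit
weights section of UTS #10 as published in later versions; treat both
as assumptions for the 3.1.1 data, although they do reproduce the value
quoted from the test file. The function names are hypothetical.

```python
def implicit_collation_elements(cp, base=0xFBC0):
    """Derive two collation elements for a code point missing from the table.

    Assumption: base 0xFBC0 is the one used for unassigned code points.
    """
    aaaa = base + (cp >> 15)        # high bits of the code point, offset by the base
    bbbb = (cp & 0x7FFF) | 0x8000   # low 15 bits, with the top bit forced on
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


def format_elements(elements):
    """Render collation elements in the bracketed style used in this thread."""
    return " ".join("[%04X.%04X.%04X]" % e for e in elements)


print(format_elements(implicit_collation_elements(0x2047)))
# prints: [FBC0.0020.0002] [A047.0000.0000]
```

So the test file is not assigning U+2047 anything special; it treats it
like any other code point outside the 3.1 repertoire.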

This situation, where the allkeys.txt table lags slightly behind
the ongoing repertoire additions to the Unicode Standard,
is a known problem we are working on.

The Unicode Technical Committee has mandated that the repertoire
for the allkeys.txt table be updated directly to the Unicode 4.0
repertoire, as soon after the release of Unicode 4.0 as
possible. We are trying to release the two more or less
simultaneously this time, but there may be a small delay, given
the scope of the upcoming Unicode 4.0 release.

In the meantime, if you need to collate characters that were
added in Unicode 3.2, then you need to handle them as
tailorings of the current allkeys.txt table.
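
One way such a tailoring could look is sketched below in Python. It is
only an illustration: the TAILORING table, the lookup function, and the
tuple layout are hypothetical, and the collation element for U+003F is
copied from the values quoted earlier in this thread rather than read
from allkeys.txt.

```python
# Collation element layout (an assumption for this sketch):
# (is_variable, primary, secondary, tertiary)
QUESTION_MARK_COMPAT = (True, 0x024E, 0x0020, 0x0004)

# Tailoring overlay: entries here take precedence over the base table.
# U+2047 is mapped to the elements of its decomposition, 003F 003F.
TAILORING = {
    0x2047: [QUESTION_MARK_COMPAT, QUESTION_MARK_COMPAT],
}


def lookup(cp, base_table, tailoring=TAILORING):
    """Return collation elements for cp, consulting the tailoring first.

    Returns None when neither table has an entry, in which case the
    caller would fall back to implicit weights.
    """
    if cp in tailoring:
        return tailoring[cp]
    return base_table.get(cp)


def format_elements(elements):
    """Render elements in allkeys.txt style, '*' marking variable ones."""
    return " ".join(
        "[%s%04X.%04X.%04X]" % ("*" if var else ".", p, s, t)
        for (var, p, s, t) in elements
    )


# An empty dict stands in for the parsed allkeys.txt table here.
print(format_elements(lookup(0x2047, {})))
# prints: [*024E.0020.0004] [*024E.0020.0004]
```

Keeping the additions in a separate overlay means they can simply be
dropped once the table itself covers the newer repertoire.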

--Ken

