Vadim,

> I have a problem with creating collation key for U+2047 (double question
> mark).
>
> Explicit collation keys for this symbol is absent in allkeys.txt.

allkeys.txt in the current version of the Unicode Collation Algorithm
is based on the Unicode *3.1* repertoire. This can be seen in the
references section in UTS #10, where the version is explicitly
listed as allkeys-3.1.1.txt.

U+2047 is a character added to Unicode Version *3.2*.

> In UnicodeData.txt this symbol have compatibility decomposition map.
>
> 2047: ... :<compat> 003F 003F: ...

True.

> Based on this and as defined in UTR #10 Unicode Collation Algoriphm this
> symbol must have these collation keys:
>
> 003F [*024E.0020.0004]
> 003F [*024E.0020.0004]
>
> But in CollationTest_NON_IGNORABLE.txt assumes that symbol have implicit
> collation key [FBC0.0020.0002] [A047.0000.0000].

CollationTest_NON_IGNORABLE.txt is also based on the Unicode 3.1
repertoire. For a Unicode 3.1 implementation of collation, U+2047
is a reserved code point.

This situation, where the allkeys.txt table is slightly out-of-synch
(behind) the ongoing repertoire additions to the Unicode Standard,
is a known problem we are working on. The Unicode Technical Committee
has mandated that the repertoire for the allkeys.txt table be
updated directly to the Unicode 4.0 repertoire, as soon after
the release of Unicode 4.0 as possible. We are trying to do this
more or less simultaneously this time, but there may be a small
delay, given the scope of the upcoming Unicode 4.0 release.

In the meantime, if you need to deal with character additions
for Unicode 3.2 for collation, then you need to handle them
in terms of tailorings from the current allkeys.txt table.

--Ken
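
For illustration only, here is a minimal Python sketch of how the implicit key quoted above arises for a code point that is missing from the table, and of how a tailoring could override it. It assumes the implicit-weight derivation described in UTS #10 with base FBC0 for code points that are not in the table. The names implicit_elements, collation_elements and TAILORING are hypothetical, and the 024E weights for U+003F are simply copied from the quoted message rather than checked against allkeys.txt. The variable-weighting flag (the "*") is not modeled.

```python
# Sketch only: hypothetical helper names, not taken from any real library.

# Assumed base for code points absent from the table (per UTS #10).
IMPLICIT_BASE = 0xFBC0


def implicit_elements(cp, base=IMPLICIT_BASE):
    """Derive the two implicit collation elements for a code point."""
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


# Hypothetical tailoring: treat U+2047 as two question marks.
# The weights for U+003F are the ones given in the quoted message.
QUESTION_MARK = (0x024E, 0x0020, 0x0004)
TAILORING = {0x2047: [QUESTION_MARK, QUESTION_MARK]}


def collation_elements(cp, table, tailoring=None):
    """Look up a code point: tailoring first, then table, then implicit."""
    if tailoring and cp in tailoring:
        return tailoring[cp]
    if cp in table:
        return table[cp]
    return implicit_elements(cp)


def fmt(elements):
    """Format elements in the bracketed style used in the test file."""
    return " ".join(
        "[" + ".".join("%04X" % w for w in element) + "]"
        for element in elements
    )


if __name__ == "__main__":
    empty_table = {}  # stands in for a table with no entry for U+2047
    print(fmt(collation_elements(0x2047, empty_table)))
    print(fmt(collation_elements(0x2047, empty_table, TAILORING)))
```

Without the tailoring, the lookup falls through to the implicit weights, which is what a table with no entry for U+2047 would produce. With the tailoring, the same lookup returns the two question-mark elements instead.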