-----BEGIN PGP SIGNED MESSAGE-----

Clauses D20 and D21 of the Unicode Standard (3.0 or 3.1) read:

# D20 Compatibility decomposition: the decomposition of a character that
#     results from recursively applying /both/ the compatibility /and/ the
#     canonical mappings found in the names list of /Section 14.1,
#     Character Names List/, and those described in /Section 3.11,
#     Conjoining Jamo Behavior/, until no characters can be further
#     decomposed, and then reordering nonspacing marks according to
#     /Section 3.10, Canonical Ordering Behavior/.
#
#   - A compatibility decomposition may remove formatting information.
#
# D21 Compatibility character: a character that has a compatibility
#     decomposition.
#
#   - Compatibility characters are included in the Unicode Standard to
#     represent distinctions in other base standards. They support
#     transmission and processing of legacy data. Their use is discouraged
#     other than for legacy data.
#   - Replacing a compatibility character by its decomposition may lose
#     round-trip convertibility with a base standard.

By definition D20, if a character has a canonical decomposition, then
it also has a compatibility decomposition. This is correct, because
NFKD includes all the decompositions that NFD does.
The problem is with D21: if every character that has a canonical
decomposition also has a compatibility decomposition, then every such
character is a compatibility character. Clearly that wasn't what was
intended, and it is inconsistent with the two bullet points under D21.

I think the correct definition of a compatibility character is a
character with a compatibility decomposition that differs from its
canonical decomposition (i.e. NFKC(c) != NFC(c)). Am I right?
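
To make the difference between the two readings concrete, here is a
rough sketch in Python. It assumes that the standard unicodedata module
(which tracks a later Unicode version than 3.0/3.1) is an acceptable
stand-in for the decompositions in question; the function names are my
own and are not taken from the standard.

```python
import unicodedata

def has_compat_decomposition(ch: str) -> bool:
    # Literal reading of D20/D21 (an interpretation, not the standard's
    # wording): NFKD changes the character in any way at all.
    return unicodedata.normalize("NFKD", ch) != ch

def is_compat_char_proposed(ch: str) -> bool:
    # Proposed reading: the compatibility decomposition differs from
    # the canonical decomposition.
    nfkd = unicodedata.normalize("NFKD", ch)
    nfd = unicodedata.normalize("NFD", ch)
    return nfkd != nfd

# U+00E9 has only a canonical decomposition, U+FB01 has a <compat>
# decomposition, and "A" has no decomposition.
for ch in ("\u00E9", "\uFB01", "A"):
    print(f"U+{ord(ch):04X}",
          has_compat_decomposition(ch),
          is_compat_char_proposed(ch))
```

Under the literal reading U+00E9 counts as a compatibility character;
under the proposed reading it does not, while U+FB01 counts under both.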


(Note that it wouldn't be correct to define a compatibility character
simply as a character that has a "<...> ..." entry in the decomposition
field of the UCD; a counterexample is U+03D3.)
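
The U+03D3 case can be checked with a short sketch, again assuming that
Python's unicodedata module reflects the relevant UCD fields (it uses a
later Unicode version than 3.0/3.1). The helper names are hypothetical.

```python
import unicodedata

def has_tagged_entry(ch: str) -> bool:
    # True if the character's own decomposition field in the UCD
    # starts with a "<...>" compatibility tag.
    return unicodedata.decomposition(ch).startswith("<")

def is_compat_char_proposed(ch: str) -> bool:
    # Proposed definition: full compatibility decomposition differs
    # from full canonical decomposition.
    nfkd = unicodedata.normalize("NFKD", ch)
    nfd = unicodedata.normalize("NFD", ch)
    return nfkd != nfd

ch = "\u03D3"
# The field for U+03D3 is a canonical mapping to U+03D2 U+0301, with no
# tag; it is U+03D2 that carries the <compat> mapping to U+03A5.
print(unicodedata.decomposition(ch))
print(unicodedata.decomposition("\u03D2"))
print(has_tagged_entry(ch), is_compat_char_proposed(ch))
```

The tag test looks only at the character's own entry, whereas the
decomposition in D20 is applied recursively, which is why the two
disagree for U+03D3.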

- -- 
David Hopwood <[EMAIL PROTECTED]>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO9kOszkCAxeYt5gVAQEvfAgAhPW+uauuxRArxCWPJgYBW54AvAdg3yxB
iATHjKED/4s+KkfMGP6kq3RzZpgD21MpeOacIG4+NWkgd8wHMRAvNWc2n+PEU+KJ
A3Ngf/vDV+JZxhDX09s6lSxagfkQDhxB/bzgGMzpyCUdJshgiBsnTd4C8/IXbzgR
KNi9XeZ+jEGYV+24S9stnMClmV/xMI9FR2QV2mA72Li5AgFR/DoRxSaeV4XiMw+3
RTJP5gVSQeUv1TsXD4X8J3z0YzxiFFzwPlIbG3o1BOcwjPrROmV0ULJQM1ufemGi
Q/VJrkvPPyxibcOAk8Vb6LtA+jyyoi9TAod3JcLWDsEiIq1bfbcBKw==
=tQg1
-----END PGP SIGNATURE-----
