Simon,

There have been two corrections to normalization since Unicode 3.0. One involved a Chinese (Han) compatibility character that was mapped to the wrong "normal" character by error. The other involved a Yiddish (Hebrew) compatibility character that should have had a compatibility mapping, but did not, also by error.
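As a quick illustration of what a compatibility mapping is and what NFKC does with it (using the fi ligature rather than either of the corrected characters, and Python's standard unicodedata module purely as a convenient way to query the Unicode Character Database):

    import unicodedata

    # U+FB01 LATIN SMALL LIGATURE FI carries a compatibility decomposition
    # to the two ordinary letters "f" and "i" in the Unicode Character Database.
    print(unicodedata.decomposition("\ufb01"))      # '<compat> 0066 0069'

    # NFKC applies that mapping, so the ligature and the letter pair become
    # indistinguishable after normalization.
    print(unicodedata.normalize("NFKC", "\ufb01"))  # 'fi'

    # The version of the character database this interpreter was built with;
    # a corrigendum only takes effect here once this moves forward.
    print(unicodedata.unidata_version)

Correcting either of the two characters above means that this kind of lookup returns a different answer before and after the fix, which is exactly why the changes drew attention.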
Both corrections were made to characters that are supposedly "very rare" in actual use, so that the real-world impact would be minimal. Neither one has anything to do with transcoding tables.

I know you are very concerned that Unicode has "broken its promise" by making changes to the normalization tables after claiming they would not do so. I think if the corrections had not been made, there would have been an equal but opposite reaction that Unicode was too stubborn to correct its own mistakes, and that NFKC was rendered "useless" because of these two incorrect mappings. The pages explaining the corrigenda include lengthy, detailed explanations of why the Technical Committee felt they were necessary and justified.

As someone already mentioned, one of the justifications given for the Yiddish change was that no normative references existed *yet* for the Unicode normalization tables (i.e. from IDN). This implies that once such normative references *do* exist, a similar decision to correct an error might not be made.

I imagine these were very difficult decisions for the UTC, who knew that someone would jump on the changes immediately as evidence that normalization is inherently unstable and Unicode is therefore "not secure."

It's true that we are relying on "enlightened statesmen" to make the right decisions and not, say, decide one day to add a compatibility mapping for U+00C6 LATIN CAPITAL LETTER AE that would break everything. The UTC has tried to assure us that such a thing will not happen, but in the end, all we can do is trust.

-Doug Ewell
 Fullerton, California

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, June 02, 2002 5:44 am
Subject: [idn] Re: :Re: Last Call: Preparation of Internationalized Strings

[Resending with different From: address.]

Patrik Fältström <[EMAIL PROTECTED]> writes:

> --On 2002-05-30 12.16 +0200 Simon Josefsson <[EMAIL PROTECTED]> wrote:
>
>> This is interesting -- has the Unicode consortium promised to always
>> update the CK normalization tables in a forward compatible way?
>
> Yes.

The reference for that statement seems to be (correct me if I'm wrong)
http://www.unicode.org/unicode/standard/policies.html:

,----
| Normalization. Once a character is encoded, its canonical combining
| class and decomposition mapping will not be changed in a way that will
| destabilize normalization.
`----

Which looks good. However, reading on:

,----
| The decomposition mapping may only be changed at all in the following
| _exceptional_ set of circumstances:
|
| + when there is a clear and evident mistake identified in the Unicode
|   Character Database (such as a typographic mistake), and
|
| + when that error constitutes a clear violation of the identity
|   stability policy (#4), and
|
| + when the correction of such an error does not violate constraints (a)-(d)
`----

So it appears as if the statement isn't strictly true?

A further security consideration of IDNA could be that whenever such modifications are made in the Unicode standard, they may be exploited, and it should be an operational consideration never to register domains, issue PKIX certificates for domains, create Kerberos realms, create PGP keys, etc., for IDNs that contain characters whose decomposition mapping has been changed by the Unicode consortium.

It seems as if a modification of this kind occurred between Unicode 3.0 and 3.1:
http://www.unicode.org/versions/corrigendum2.html.
The conclusion here is that this isn't a practical problem -- only one character changed normalization between 3.0 and 3.1 and none between 3.1 and 3.2 AFAIK. I am more worried about the transcoding problem.
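For concreteness, here is a rough sketch in Python of the registration-time check suggested above. The set of affected code points is illustrative only and would have to be maintained by hand from the published corrigenda (U+FB1D, HEBREW LETTER YOD WITH HIRIQ, appears to be the character addressed by Corrigendum #2); the function and variable names are mine, not anything defined by IDNA or stringprep.

    import unicodedata

    # Code points whose decomposition mapping was corrected after publication.
    # Illustrative only: maintain this by hand from the published corrigenda.
    CORRECTED_CODE_POINTS = {0xFB1D}  # HEBREW LETTER YOD WITH HIRIQ (Corrigendum #2)

    def contains_corrected_character(label: str) -> bool:
        """True if the candidate IDN label uses any character whose
        normalization behaviour has been changed by a corrigendum."""
        return any(ord(ch) in CORRECTED_CODE_POINTS for ch in label)

    def register(label: str) -> None:
        """Toy registration gate implementing the operational rule suggested
        above for domains, certificates, Kerberos realms, PGP keys, etc."""
        if contains_corrected_character(label):
            raise ValueError("label contains a character affected by a "
                             "normalization corrigendum: " + repr(label))
        # ... hand the NFKC-normalized label to the real registration process ...
        print("would register:", unicodedata.normalize("NFKC", label))

    register("example")          # fine
    # register("test\ufb1d")     # would raise ValueError

Whether such a check is worth the operational cost is a separate question, given how few characters have actually been affected so far.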