Corrigendum #9

Karl Williamson Fri, 30 May 2014 11:29:53 -0700

I'm having a problem with this
http://www.unicode.org/versions/corrigendum9.html

Some people now think it means that noncharacters are really no different from private-use characters, and should be treated very similarly if not identically.

It seems to me that they should be illegal in open interchange, or perhaps illegal in interchange without prior agreement.

Any system (process or group of related, cooperating processes) that uses noncharacters will want to not have any of the ones it uses present in its inputs. It will want to filter them out of those inputs, likely turning each into a REPLACEMENT CHARACTER. If it fails to do that, it leaves itself vulnerable to an attack by hackers, who can fool it into thinking the input data is different from what it really is.
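To make the filtering step concrete, here is a minimal sketch in Python of what such an input filter might look like. The function names are hypothetical, chosen only for illustration, and the sketch assumes the system wants to replace all 66 noncharacters (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes), not just the ones it uses internally.

```python
REPLACEMENT_CHARACTER = "\uFFFD"


def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of the 66 Unicode noncharacters."""
    # The contiguous run of 32 noncharacters, U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+nFFFE and U+nFFFF.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def replace_noncharacters(text: str) -> str:
    """Hypothetical input filter: turn each noncharacter into U+FFFD."""
    return "".join(
        REPLACEMENT_CHARACTER if is_noncharacter(ord(ch)) else ch
        for ch in text
    )
```

A system that uses only a few noncharacters as internal sentinels could narrow is_noncharacter to just those code points. Private-use characters such as U+E000 pass through this filter unchanged.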

Hence, a system that creates outputs containing noncharacters cannot be assured that any other system will accept those noncharacters.

Thus, I don't see how noncharacters can be considered to be valid in public interchange, given that the producers have to assume that the consumers will not accept them. Producers can assume that consumers will accept private-use characters, though they may not know their intent.

I think the text in 6.2 section 16.7 is good and does not need to be changed: "Noncharacters ... are forbidden for use in open interchange of Unicode text data"

Perhaps a bit better wording would be, "are forbidden for use in interchange of Unicode text data without prior agreement"

The only reason I can think of for your too-large (in my opinion) backing away from what TUS has said about noncharacters since their inception is to accommodate processes that conform to C7, "that purports to not modify the interpretation of a valid coded character sequence". But, I think there is a better way to do that than what Corrigendum #9 currently says.

I also am curious as to why the consecutive group of 32 noncharacters can't be split off into its own block instead of being part of an Arabic one. I'm unaware of any stability policy forbidding this. Another block is to be split, if I recall correctly, to accommodate the new Cherokee characters.

_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
