Re: Corrigendum #9

Mark Davis ☕️ Mon, 02 Jun 2014 23:59:21 -0700

On Mon, Jun 2, 2014 at 10:32 PM, David Starner <prosfil...@gmail.com> wrote:


> Why? It seems you're changing the rules
> ...
This isn't "are changing"; it is "has changed". The Corrigendum was issued
at the start of 2013, about 16 months ago, and applies to all relevant
earlier versions. It was the result of fairly extensive debate inside the
UTC; there hasn't been a single issue raised on this thread that wasn't
considered during the discussions there. And as far back as 2001, the UTC
made it clear that noncharacters *are* scalar values, and are to be
converted by UTF converters. For example, see
http://www.unicode.org/mail-arch/unicode-ml/y2001-m09/0149.html (by chance,
one day before 9/11).
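As an illustration of what "noncharacters are scalar values" means in practice, here is a minimal Python sketch. The helper names is_noncharacter and is_scalar_value are hypothetical, not from any library; the ranges they test (U+FDD0..U+FDEF plus the last two code points of every plane, and the surrogate block U+D800..U+DFFF) are the ones defined in the Unicode Standard. The loop checks that Python's built-in codecs round-trip noncharacters through UTF-8, UTF-16 and UTF-32 like any other scalar value.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters: U+FDD0..U+FDEF, plus the last two
    code points of each of the 17 planes (U+xxFFFE and U+xxFFFF)."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_scalar_value(cp: int) -> bool:
    """Scalar values are all code points except surrogates U+D800..U+DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


# Noncharacters are scalar values, so every UTF must be able to carry them.
for cp in (0xFDD0, 0xFFFE, 0xFFFF, 0x1FFFE, 0x10FFFF):
    assert is_noncharacter(cp) and is_scalar_value(cp)
    text = chr(cp)
    for codec in ("utf-8", "utf-16-le", "utf-32-le"):
        # Python's built-in codecs encode and decode them like any other character
        assert text.encode(codec).decode(codec) == text

print(chr(0xFFFF).encode("utf-8"))  # b'\xef\xbf\xbf'
```

A converter that refused or dropped these code points would be treating valid text as ill-formed, which is exactly what the 2001 clarification ruled out.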

> probably trigger serious bugs in some lamebrained utility.

There were already plenty of programs that passed the noncharacters
through; very few would filter them (and some would delete them, which is
horrible for security, since removing characters after validation can join
harmless-looking fragments into a string that the validation would have
rejected). Thinking that a utility would never encounter them in input text
was a pipe dream. If a utility or library is so fragile that it *breaks* on
input of any valid UTF sequence, then it *is* a "lamebrained" utility. A
good unit test for any production chain would be to check that there is no
crash on any input scalar value (and, for that matter, on any ill-formed
UTF text).
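The unit test suggested above could be sketched in Python roughly as follows; this is an illustration under stated assumptions, not a prescribed harness. The function under test is assumed to take UTF-8 bytes and return bytes; example_process is a hypothetical stand-in for a real utility, and the list of ill-formed sequences is a small, non-exhaustive sample. The harness embeds every scalar value (surrogates excluded, noncharacters included) between two ASCII letters, then feeds the ill-formed samples, and reports every input that raised an exception.

```python
from typing import Callable, Iterator, List

# A small, non-exhaustive sample of ill-formed UTF-8 (illustrative only)
ILL_FORMED_UTF8: List[bytes] = [
    b"\x80",              # lone continuation byte
    b"\xc0\xaf",          # overlong encoding of "/"
    b"\xe2\x82",          # truncated three-byte sequence
    b"\xed\xa0\x80",      # encoded surrogate U+D800
    b"\xf4\x90\x80\x80",  # would decode above U+10FFFF
    b"\xff",              # byte value never used in UTF-8
]


def all_scalar_values() -> Iterator[int]:
    """Every code point except the surrogates U+D800..U+DFFF."""
    yield from range(0xD800)
    yield from range(0xE000, 0x110000)


def harness_inputs() -> Iterator[bytes]:
    """Each scalar value (noncharacters included) embedded in ASCII text,
    followed by the ill-formed samples."""
    for cp in all_scalar_values():
        yield ("a" + chr(cp) + "b").encode("utf-8")
    yield from ILL_FORMED_UTF8


def find_crashes(process: Callable[[bytes], bytes]) -> List[bytes]:
    """Return every input on which `process` raised an exception."""
    failures = []
    for data in harness_inputs():
        try:
            process(data)
        except Exception:
            failures.append(data)
    return failures


def example_process(data: bytes) -> bytes:
    """Hypothetical utility under test: decodes leniently, substituting
    U+FFFD for ill-formed input rather than deleting it."""
    return data.decode("utf-8", errors="replace").upper().encode("utf-8")


if __name__ == "__main__":
    print(find_crashes(example_process))  # []
```

With this stand-in, find_crashes returns an empty list. A strict round trip such as lambda b: b.decode("utf-8").encode("utf-8") survives every scalar value, noncharacters included, but fails on all six ill-formed samples; whether that counts as a "crash" depends on how the real production chain is supposed to report bad input.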

_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
