On Mon, Jun 2, 2014 at 10:32 PM, David Starner <prosfil...@gmail.com> wrote:
> Why? It seems you're changing the rules
> ...
>

This isn't "are changing", it is "has changed". The Corrigendum was issued at the start of 2013, about 16 months ago, and is applicable to all relevant earlier versions. It was the result of fairly extensive debate inside the UTC; there hasn't been a single issue on this thread that wasn't considered during the discussions there. And as far back as 2001, the UTC made it clear that noncharacters *are* scalar values, and are to be converted by UTF converters. E.g., see http://www.unicode.org/mail-arch/unicode-ml/y2001-m09/0149.html (by chance, one day before 9/11).

> probably trigger serious bugs in some lamebrained utility.

There were already plenty of programs that passed the noncharacters through; very few would filter them (some would delete them, which is horrible for security). Thinking that a utility would never encounter them in input text was a pipe dream. If a utility or library is so fragile that it *breaks* on input of any valid UTF sequence, then it *is* a "lamebrained" utility. A good unit test for any production chain would be to check that there is no crash on any input scalar value (and, for that matter, on any ill-formed UTF text).
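For illustration, a unit test along those lines might look roughly like the following Python sketch. It assumes a hypothetical `process_text` function standing in for the utility under test (here it just decodes UTF-8, substituting U+FFFD for ill-formed input). The test checks that every scalar value, including the 66 noncharacters, round-trips through UTF-8, and that a handful of ill-formed byte sequences are handled without raising.

```python
# Sketch of a "no crash on any input" unit test.
# ASSUMPTION: process_text is a hypothetical stand-in for the utility
# under test; a real test would call your own code instead.

def process_text(data: bytes) -> str:
    """Hypothetical utility: decode UTF-8, replacing ill-formed
    sequences with U+FFFD instead of raising an exception."""
    return data.decode("utf-8", errors="replace")

def scalar_values():
    """Yield every Unicode scalar value: U+0000..U+10FFFF minus surrogates."""
    for cp in range(0x110000):
        if not 0xD800 <= cp <= 0xDFFF:
            yield cp

def is_noncharacter(cp: int) -> bool:
    """The 66 noncharacters: U+FDD0..U+FDEF plus U+xxFFFE/U+xxFFFF in each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# A few ill-formed UTF-8 byte sequences (illustrative, not exhaustive).
ILL_FORMED = [
    b"\xc0\xaf",          # overlong encoding of '/'
    b"\xed\xa0\x80",      # encoded surrogate U+D800
    b"\xf4\x90\x80\x80",  # would decode beyond U+10FFFF
    b"\xe2\x82",          # truncated three-byte sequence
    b"\xff",              # byte that never occurs in UTF-8
]

def run_checks() -> int:
    """Run all checks; return how many noncharacters round-tripped."""
    noncharacters_seen = 0
    for cp in scalar_values():
        encoded = chr(cp).encode("utf-8")
        # Every scalar value, noncharacters included, must pass through intact.
        assert process_text(encoded) == chr(cp), hex(cp)
        if is_noncharacter(cp):
            noncharacters_seen += 1
    for data in ILL_FORMED:
        # Ill-formed input must not crash; this stand-in maps it to U+FFFD.
        assert "\ufffd" in process_text(data), data
    return noncharacters_seen

if __name__ == "__main__":
    print(run_checks())  # prints 66
```

Note that noncharacters are not filtered or deleted here: they are passed through unchanged, which is the behavior the Corrigendum expects of UTF converters.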
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode