David Hopwood and Carl Brown graciously corrected me:

>> I don't agree that irregular UTF-8 sequences in general can only decode to
>> characters above 0xFFFF.
>
>  That's why I specifically referred to irregular sequences as defined by
>  Unicode 3.1 (i.e. UAX #27).

I stand corrected.  That's what I get for not having a copy of UAX #27 handy.

Non-shortest sequences, of course, used to be considered irregular (not 
invalid) in Unicode 3.0, before the Technical Committee wisely tightened up 
the definition of UTF-8.
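
For illustration only, here is a minimal Python sketch of what a "non-shortest" check looks like. The function names are made up for this example, and the decoder is deliberately lenient (it skips the shortest-form check and handles only 1- to 4-byte sequences); it is not taken from the standard or from UAX #27.

```python
def decode_lenient(seq: bytes) -> int:
    """Decode one UTF-8-style sequence (1 to 4 bytes) with no shortest-form check."""
    lead = seq[0]
    if lead < 0x80:
        length, cp = 1, lead
    elif 0xC0 <= lead <= 0xDF:
        length, cp = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:
        length, cp = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF7:
        length, cp = 4, lead & 0x07
    else:
        raise ValueError("not a lead byte")
    if len(seq) != length:
        raise ValueError("sequence length does not match lead byte")
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("bad trail byte")
        cp = (cp << 6) | (b & 0x3F)  # append the 6 payload bits
    return cp


def shortest_length(cp: int) -> int:
    """Number of bytes the shortest UTF-8 form of cp needs."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def is_non_shortest(seq: bytes) -> bool:
    """True if seq uses more bytes than its code point requires."""
    return len(seq) > shortest_length(decode_lenient(seq))
```

With this sketch, is_non_shortest(b"\xC0\xAF") is true, because the two bytes carry U+002F, which fits in one byte; is_non_shortest(b"/") is false.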

-Doug Ewell
 Fullerton, California
