Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Alastair Houghton via Unicode Wed, 17 May 2017 01:13:42 -0700

> On 16 May 2017, at 20:43, Richard Wordingham via Unicode 
> <[email protected]> wrote:
> 
> On Tue, 16 May 2017 11:36:39 -0700
> Markus Scherer via Unicode <[email protected]> wrote:
> 
>> Why do we care how we carve up an illegal sequence into subsequences?
>> Only for debugging and visual inspection. Maybe some process is using
>> illegal, overlong sequences to encode something special (à la Java
>> string serialization, "modified UTF-8"), and for that it might be
>> convenient too to treat overlong sequences as single errors.
> 
> I think that's not quite true.  If we are moving back and forth through
> a buffer containing corrupt text, we need to make sure that moving three
> characters forward and then three characters back leaves us where we
> started.  That requires internal consistency.


That’s very true.  But the proposed change doesn’t actually affect that; under
either way of carving up an ill-formed sequence, you can still correctly
identify boundaries in both directions.
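
As a rough illustration of the round-trip property, here is a minimal sketch in
Python.  It assumes one particular policy, chosen only because it is simple to
state: every well-formed UTF-8 sequence is one unit, and every byte that does
not begin a well-formed sequence is a one-byte error unit.  That is not the
policy in the proposal, and the names unit_len, next_boundary and prev_boundary
are hypothetical; the point is only that a fixed, locally decidable rule gives
the same boundaries in both directions.

```python
def unit_len(buf: bytes, i: int) -> int:
    """Length of the unit starting at byte offset i (assumed policy:
    a well-formed sequence is one unit, any other byte is a 1-byte error)."""
    for n in range(1, 5):
        if i + n > len(buf):
            break
        try:
            # Python's strict decoder rejects overlongs, surrogates and
            # truncated sequences, so the first n that decodes is the
            # length of a single well-formed sequence.
            buf[i:i + n].decode("utf-8")
            return n
        except UnicodeDecodeError:
            continue
    return 1  # ill-formed byte: treated as its own error unit


def next_boundary(buf: bytes, pos: int) -> int:
    """Move forward one unit from a boundary."""
    return pos + unit_len(buf, pos)


def prev_boundary(buf: bytes, pos: int) -> int:
    """Move back one unit from a boundary (pos must be > 0)."""
    # A multi-byte well-formed sequence ending exactly at pos is one unit.
    for k in (2, 3, 4):
        if pos - k >= 0 and unit_len(buf, pos - k) == k:
            return pos - k
    # Otherwise the previous unit is a single byte (ASCII or an error).
    return pos - 1


def boundaries_forward(buf: bytes) -> list:
    out, pos = [0], 0
    while pos < len(buf):
        pos = next_boundary(buf, pos)
        out.append(pos)
    return out


def boundaries_backward(buf: bytes) -> list:
    out, pos = [len(buf)], len(buf)
    while pos > 0:
        pos = prev_boundary(buf, pos)
        out.append(pos)
    return out[::-1]
```

Walking a buffer that mixes valid text with a truncated sequence and a stray
continuation byte, the forward and backward scans should visit the same
offsets, so three steps forward followed by three steps back ends where it
started.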

Kind regards,

Alastair.

--
http://alastairs-place.net
