> On 16 May 2017, at 20:43, Richard Wordingham via Unicode
> <[email protected]> wrote:
>
> On Tue, 16 May 2017 11:36:39 -0700
> Markus Scherer via Unicode <[email protected]> wrote:
>
>> Why do we care how we carve up an illegal sequence into subsequences?
>> Only for debugging and visual inspection. Maybe some process is using
>> illegal, overlong sequences to encode something special (à la Java
>> string serialization, "modified UTF-8"), and for that it might be
>> convenient too to treat overlong sequences as single errors.
>
> I think that's not quite true. If we are moving back and forth through
> a buffer containing corrupt text, we need to make sure that moving three
> characters forward and then three characters back leaves us where we
> started. That requires internal consistency.

That’s very true. But the proposed change doesn’t actually affect that; it’s still the case that you can correctly identify boundaries in both directions.

Kind regards,

Alastair.

--
http://alastairs-place.net
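
To make the round-trip property in this exchange concrete, here is a minimal Python sketch. It is an illustration only and rests on assumptions that are not in the thread: the helper names `next_boundary` and `prev_boundary` are hypothetical, and the error policy is deliberately the simplest one available (every byte that is not part of a well-formed UTF-8 sequence counts as its own one-byte error). That is neither the maximal-subpart practice nor the proposed overlong-as-single-error treatment being debated; the point is only that, whatever policy is chosen, the forward and backward scans have to agree on the same set of boundaries.

```python
def _is_one_char(chunk: bytes) -> bool:
    """True if chunk is exactly one well-formed UTF-8 encoded character."""
    try:
        return len(chunk.decode("utf-8")) == 1
    except UnicodeDecodeError:
        return False


def next_boundary(buf: bytes, i: int) -> int:
    """Boundary after position i (requires 0 <= i < len(buf)).

    Assumed policy: a well-formed sequence is one unit; any other
    byte is a one-byte error unit.
    """
    for k in range(1, 5):  # UTF-8 sequences are 1 to 4 bytes long
        if i + k <= len(buf) and _is_one_char(buf[i:i + k]):
            return i + k
    return i + 1  # ill-formed here: consume a single byte


def prev_boundary(buf: bytes, i: int) -> int:
    """Boundary before position i (requires 0 < i <= len(buf)).

    Mirrors next_boundary: step back over a well-formed sequence that
    ends at i if there is one, otherwise over a single byte.
    """
    for k in range(1, 5):
        if i - k >= 0 and _is_one_char(buf[i - k:i]):
            return i - k
    return i - 1


# Corrupt sample: 'a', an overlong pair C0 80, a valid euro sign,
# 'b', then a truncated three-byte sequence E2 82.
buf = b"a\xc0\x80\xe2\x82\xacb\xe2\x82"

pos = 0
for _ in range(3):
    pos = next_boundary(buf, pos)
for _ in range(3):
    pos = prev_boundary(buf, pos)
print(pos == 0)  # three forward, three back: compare with the start
```

Under this particular policy the two scans agree because a lead or ASCII byte can never sit in the interior of a well-formed sequence, so a backward step never lands inside a unit that the forward scan would have consumed whole. A different policy would need its own argument of the same kind.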

