Re: RE: Roundtripping in Unicode

2004-12-13 Thread Doug Ewell
Philippe VERDY wrote: > (In fact I also think that mapping invalid sequences to U+FFFD is also > an error, because U+FFFD is valid, and the presence of the encoding > error in the source is lost, and will not throw exceptions in further > processings of the remapped text, unless the application c

Re: RE: Roundtripping in Unicode

2004-12-13 Thread John Cowan
Doug Ewell scripsit: > "When faced with [an] ill-formed code unit sequence while transforming > or interpreting text, a conformant process must treat the first code > unit... as an illegally terminated code unit sequence -- for example, by > signaling an error, filtering the code unit out, or repr
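
The three reactions named in the quoted conformance text (signaling an error, filtering the code unit out, or substituting a replacement character) can be tried side by side. The following is only an illustrative sketch using Python's built-in UTF-8 decoder and its standard error handlers. The function name `decode_three_ways` and the sample bytes are assumptions made for this example and do not come from the thread.

```python
# Sample input: 0xFF can never occur in well-formed UTF-8.
RAW = b"abc\xffdef"


def decode_three_ways(data: bytes):
    """Return (error_signaled, filtered_text, replaced_text) for UTF-8 input."""
    # 1. Signal an error: the strict handler raises on ill-formed input.
    try:
        data.decode("utf-8", errors="strict")
        error_signaled = False
    except UnicodeDecodeError:
        error_signaled = True

    # 2. Filter the offending code unit out.
    filtered_text = data.decode("utf-8", errors="ignore")

    # 3. Represent it with U+FFFD REPLACEMENT CHARACTER.
    replaced_text = data.decode("utf-8", errors="replace")

    return error_signaled, filtered_text, replaced_text


signaled, filtered, replaced = decode_three_ways(RAW)
# Re-encoding the replaced text yields well-formed UTF-8 that differs from
# RAW, so the original ill-formed byte can no longer be recovered.
lossy = replaced.encode("utf-8") != RAW
```

With options 2 and 3 the result is valid text, so later processing steps receive no indication that the source was ill-formed, which is the concern raised about U+FFFD earlier in the thread.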

RE: RE: RE: Roundtripping in Unicode

2004-12-13 Thread Lars Kristan
Philippe VERDY wrote: > I don't think I miss the point. My suggested approach to > perform roundtrip conversions between UTF's while keeping all > invalid sequences as invalid (for the standard UTFs), is much > less risky t

Re: RE: Roundtripping in Unicode

2004-12-13 Thread Philippe VERDY
Lars Kristan wrote: > What I was talking about in the paragraph in question is what happens if you want to take unassigned codepoints and give them a new status. You don't need to do that. No Unicode application must assign semantics to unassigned codepoints. If a source sequence is invalid, and you

Re: RE: RE: Roundtripping in Unicode

2004-12-13 Thread Philippe VERDY
> From: "Lars Kristan" > Philippe VERDY wrote: > > If a source sequence is invalid, and you want to preserve it, > > then this sequence must remain invalid if you change its encoding. > > So there's no need for Unicode to assign valid code points > > for invalid source data. > Using invalid
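
One existing mechanism that keeps invalid input invalid while still allowing a byte-exact roundtrip is Python's `surrogateescape` error handler, which maps each undecodable byte to a lone low surrogate in the range U+DC80..U+DCFF. The sketch below shows that behaviour. It is offered as a comparison point, not as the scheme either poster proposed, and the names `roundtrip` and `is_still_invalid` are assumptions made for this example.

```python
def roundtrip(data: bytes) -> bytes:
    """Decode possibly ill-formed UTF-8 and re-encode it without loss."""
    # Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    # The same handler turns those lone surrogates back into raw bytes.
    return text.encode("utf-8", errors="surrogateescape")


def is_still_invalid(data: bytes) -> bool:
    """True if the decoded form is rejected by a strict UTF-8 encoder."""
    text = data.decode("utf-8", errors="surrogateescape")
    try:
        text.encode("utf-8")  # strict by default
        return False
    except UnicodeEncodeError:
        # A lone surrogate is not a Unicode scalar value, so the strict
        # encoder refuses it: the error in the source is not hidden.
        return True


sample = b"abc\xffdef"  # 0xFF never occurs in well-formed UTF-8
restored = roundtrip(sample)
flagged = is_still_invalid(sample)
```

No assigned code point is consumed by this approach, and the escaped data still fails strict validation after decoding, so an application that does not opt in to the handler sees an error rather than silently accepted text.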

RE: RE: Roundtripping in Unicode

2004-12-13 Thread Lars Kristan
Philippe VERDY wrote: > If a source sequence is invalid, and you want to preserve it, > then this sequence must remain invalid if you change its encoding. > So there's no need for Unicode to assign valid code points > for invalid s