On Tue, May 16, 2017 at 1:16 AM, Shawn Steele via Unicode
wrote:
> I’m not sure how the discussion of “which is better” relates to the
> discussion of ill-formed UTF-8 at all.
Clearly, the "which is better" issue is distracting from the
underlying issue. I'll clarify what I meant on that point an
On 05/15/2017 04:21 AM, Henri Sivonen via Unicode wrote:
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
The proposal is to make ICU's spec violation conforming. I think there
is both a technical and a political reason why the proposal is a bad idea.
Software designed with only UCS-2 support, and not real UTF-16, is still
used today.
For example, MySQL with its broken "UTF-8" encoding, which in fact encodes
supplementary characters as two separate 16-bit surrogate code units, each
one blindly encoded as a 3-byte sequence, which would be ill-formed.
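The failure mode described above can be sketched in a few lines of Python. This is an illustration of the general UCS-2-era mistake, not MySQL's actual code:

```python
# Illustration: a supplementary character split into UTF-16 surrogates,
# each surrogate then blindly serialized as a 3-byte UTF-8 sequence.
cp = 0x10400  # U+10400 DESERET CAPITAL LETTER LONG I

# Correct UTF-8: a single 4-byte sequence.
well_formed = chr(cp).encode("utf-8")
assert well_formed == b"\xf0\x90\x90\x80"

# UCS-2-style pipeline: first split into a surrogate pair...
high = 0xD800 + ((cp - 0x10000) >> 10)     # 0xD801
low = 0xDC00 + ((cp - 0x10000) & 0x3FF)    # 0xDC00
# ...then encode each surrogate as if it were an ordinary BMP scalar
# ("surrogatepass" lets Python emit the otherwise-forbidden bytes).
ill_formed = (chr(high).encode("utf-8", "surrogatepass")
              + chr(low).encode("utf-8", "surrogatepass"))
assert ill_formed == b"\xed\xa0\x81\xed\xb0\x80"  # six bytes, ill-formed
```

The six-byte form is exactly what a validating UTF-8 decoder must reject, since surrogate code points are never allowed in well-formed UTF-8.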
2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode :
> I think this political reason should be taken very seriously. There are
> already too many instances where ICU can be seen "driving" the development
> of property and algorithms.
>
> Those involved in the ICU project may not see the problem,
On Mon, 15 May 2017 21:38:26 +
David Starner via Unicode wrote:
> > and the fact is that handling surrogates (which is what proponents
> > of UTF-8 or UCS-4 usually focus on) is no more complicated than
> > handling combining characters, which you have to do anyway.
> Not necessarily; you ca
I’m not sure how the discussion of “which is better” relates to the discussion
of ill-formed UTF-8 at all.
And to the last, saying “you cannot process UTF-16 without handling surrogates”
seems to me to be the equivalent of saying “you cannot process UTF-8 without
handling lead & trail bytes”.
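The analogy can be made concrete: both encodings reassemble one supplementary code point from multiple code units. A minimal sketch with hypothetical helper names, assuming already-validated input:

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    # UTF-16: combine a high (D800-DBFF) and low (DC00-DFFF) surrogate.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def decode_utf8_4byte(b: bytes) -> int:
    # UTF-8: combine a 4-byte lead byte with its three trail bytes.
    return (((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12)
            | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F))

# Both paths recover U+1F600 from their respective code units.
assert decode_surrogate_pair(0xD83D, 0xDE00) == 0x1F600
assert decode_utf8_4byte(b"\xf0\x9f\x98\x80") == 0x1F600
```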
On Mon, May 15, 2017 at 8:41 AM Alastair Houghton via Unicode <
unicode@unicode.org> wrote:
> Yes, UTF-8 is more efficient for primarily ASCII text, but that is not the
> case for other situations
UTF-8 is clearly more efficient space-wise for text that includes more ASCII
characters than characters betw
On 5/15/2017 11:33 AM, Henri Sivonen via Unicode wrote:
ICU uses UTF-16 as its in-memory Unicode representation, so ICU isn't
representative of implementation concerns of implementations that use
UTF-8 as their in-memory Unicode representation.
Even though there are notable systems (Win32, Jav
>> Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
>> multiple errors there makes no sense.
>
> Changing a specification as fundamental as this is something that should not
> be undertaken lightly.
IMO, the only thing that can be agreed upon is that "something's bad
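The disputed behavior is easy to observe. Python 3's decoder happens to implement the existing (pre-proposal) recommendation, under which an overlong sequence like C0 AF counts as two errors, because C0 can never begin a well-formed sequence:

```python
# Existing recommendation (as implemented by Python 3's codec): replace
# each maximal subpart of an ill-formed subsequence with U+FFFD.
# C0 is never a valid lead byte, so overlong C0 AF yields TWO replacements...
assert b"\xc0\xaf".decode("utf-8", "replace") == "\ufffd\ufffd"

# ...while the truncated-but-otherwise-valid prefix F0 90 90 is a single
# maximal subpart and yields ONE replacement.
assert b"\xf0\x90\x90".decode("utf-8", "replace") == "\ufffd"
```

The proposal under discussion would instead treat the whole overlong pair as a single error (one U+FFFD), which is ICU's behavior.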
On Mon, May 15, 2017 at 6:37 PM, Alastair Houghton
wrote:
> On 15 May 2017, at 11:21, Henri Sivonen via Unicode
> wrote:
>>
>> In reference to:
>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>
>> I think Unicode should not adopt the proposed change.
>
> Disagree. An over-long UTF-8 sequence is clearly a single error.
On 15 May 2017, at 18:52, Asmus Freytag wrote:
>
> On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
>> On 15 May 2017, at 11:21, Henri Sivonen via Unicode
>> wrote:
>>> In reference to:
>>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>>
>>> I think Unicode should not adopt the proposed change.
On 5/15/2017 3:21 AM, Henri Sivonen via Unicode wrote:
Second, the political reason:
Now that ICU is a Unicode Consortium project, I think the Unicode
Consortium should be particularly sensitive to biases arising from being
both the source of the spec and the source of a popular
implementation. It
On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clearly a single error.
On Mon, 15 May 2017 16:14:23 +
Peter Constable via Unicode wrote:
> So, your helpful person was, indeed, helpful, giving you correct
> information: ZWJ sequences are not _characters_ and have no
> implications for ISO/IEC 10646.
Except in so far as the claimed ligature changes the meaning of
Emoji sequences are not _encoded_, per se, in either Unicode or ISO/IEC 10646.
The act of "encoding" in either of these coding standards is to assign an
encoded representation in the encoding method of the standards for a given
entity. In this case, that means to assign a code point.
Specifyin
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
>
> In reference to:
> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>
> I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
multiple errors there makes no sense.
I am concerned about emoji ZWJ sequences being encoded without going through
the ISO process and whether Unicode will therefore lose synchronization with
ISO/IEC 10646.
I have raised this by email and a very helpful person has advised me that
encoding emoji sequences does not mean that Unicode
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
The proposal is to make ICU's spec violation conforming. I think there
is both a technical and a political reason why the proposal is a bad
idea.
First, the technical