On Tue, May 16, 2017 at 1:16 AM, Shawn Steele via Unicode
wrote:
> I’m not sure how the discussion of “which is better” relates to the
> discussion of ill-formed UTF-8 at all.
Clearly, the "which is better" issue is distracting from the
underlying issue. I'll clarify what I meant on that point an
On 05/15/2017 04:21 AM, Henri Sivonen via Unicode wrote:
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
The proposal is to make ICU's spec violation conforming. I think there
is both a technical and a political reason why the proposal is a bad idea.
Software designed with only UCS-2 support, and not real UTF-16, is still
used today.
For example, MySQL with its broken "UTF-8" encoding, which in fact encodes
supplementary characters as two separate 16-bit surrogate code units, each
one blindly encoded as a 3-byte sequence, which would be ill-formed.
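The failure mode described above can be sketched in a few lines of Python. This is an illustration of the general UCS-2-era mistake, not MySQL's actual code:

```python
# Illustration: a supplementary character split into UTF-16 surrogates,
# each surrogate then blindly serialized as a 3-byte UTF-8 sequence.
cp = 0x10400  # U+10400 DESERET CAPITAL LETTER LONG I

# Correct UTF-8: a single 4-byte sequence.
well_formed = chr(cp).encode("utf-8")
assert well_formed == b"\xf0\x90\x90\x80"

# UCS-2-style pipeline: first split into a surrogate pair...
high = 0xD800 + ((cp - 0x10000) >> 10)     # 0xD801
low = 0xDC00 + ((cp - 0x10000) & 0x3FF)    # 0xDC00
# ...then encode each surrogate as if it were an ordinary BMP scalar
# ("surrogatepass" lets Python emit the otherwise-forbidden bytes).
ill_formed = (chr(high).encode("utf-8", "surrogatepass")
              + chr(low).encode("utf-8", "surrogatepass"))
assert ill_formed == b"\xed\xa0\x81\xed\xb0\x80"  # six bytes, ill-formed
```

The six-byte form is exactly what a validating UTF-8 decoder must reject, since surrogate code points are never allowed in well-formed UTF-8.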
2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode :
> I think this political reason should be taken very seriously. There are
> already too many instances where ICU can be seen "driving" the development
> of property and algorithms.
>
> Those involved in the ICU project may not see the problem,
On Mon, 15 May 2017 21:38:26 +
David Starner via Unicode wrote:
> > and the fact is that handling surrogates (which is what proponents
> > of UTF-8 or UCS-4 usually focus on) is no more complicated than
> > handling combining characters, which you have to do anyway.
> Not necessarily; you ca
I’m not sure how the discussion of “which is better” relates to the discussion
of ill-formed UTF-8 at all.
And to the last, saying “you cannot process UTF-16 without handling surrogates”
seems to me to be the equivalent of saying “you cannot process UTF-8 without
handling lead & trail bytes”.
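The analogy can be made concrete: both encodings reassemble one supplementary code point from multiple code units. A minimal sketch with hypothetical helper names, assuming already-validated input:

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    # UTF-16: combine a high (D800-DBFF) and low (DC00-DFFF) surrogate.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def decode_utf8_4byte(b: bytes) -> int:
    # UTF-8: combine a 4-byte lead byte with its three trail bytes.
    return (((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12)
            | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F))

# Both paths recover U+1F600 from their respective code units.
assert decode_surrogate_pair(0xD83D, 0xDE00) == 0x1F600
assert decode_utf8_4byte(b"\xf0\x9f\x98\x80") == 0x1F600
```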
On Mon, May 15, 2017 at 8:41 AM Alastair Houghton via Unicode <
unicode@unicode.org> wrote:
> Yes, UTF-8 is more efficient for primarily ASCII text, but that is not the
> case for other situations
UTF-8 is clearly more efficient space-wise for text that includes more ASCII
characters than characters betw
On 5/15/2017 11:33 AM, Henri Sivonen via Unicode wrote:
ICU uses UTF-16 as its in-memory Unicode representation, so ICU isn't
representative of implementation concerns of implementations that use
UTF-8 as their in-memory Unicode representation.
Even though there are notable systems (Win32, Jav
>> Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
>> multiple errors there makes no sense.
>
> Changing a specification as fundamental as this is something that should not
> be undertaken lightly.
IMO, the only thing that can be agreed upon is that "something's bad
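The disputed behavior is easy to observe. Python 3's decoder happens to implement the existing (pre-proposal) recommendation, under which an overlong sequence like C0 AF counts as two errors, because C0 can never begin a well-formed sequence:

```python
# Existing recommendation (as implemented by Python 3's codec): replace
# each maximal subpart of an ill-formed subsequence with U+FFFD.
# C0 is never a valid lead byte, so overlong C0 AF yields TWO replacements...
assert b"\xc0\xaf".decode("utf-8", "replace") == "\ufffd\ufffd"

# ...while the truncated-but-otherwise-valid prefix F0 90 90 is a single
# maximal subpart and yields ONE replacement.
assert b"\xf0\x90\x90".decode("utf-8", "replace") == "\ufffd"
```

The proposal under discussion would instead treat the whole overlong pair as a single error (one U+FFFD), which is ICU's behavior.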
On Mon, May 15, 2017 at 6:37 PM, Alastair Houghton
wrote:
> On 15 May 2017, at 11:21, Henri Sivonen via Unicode
> wrote:
>>
>> In reference to:
>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>
>> I think Unicode should not adopt the proposed change.
>
> Disagree. An over-long UTF-8 sequence is clearly a single error.
On 15 May 2017, at 18:52, Asmus Freytag wrote:
>
> On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
>> On 15 May 2017, at 11:21, Henri Sivonen via Unicode
>> wrote:
>>> In reference to:
>>> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>>>
>>> I think Unicode should not adopt the proposed change.
On 5/15/2017 3:21 AM, Henri Sivonen via Unicode wrote:
Second, the political reason:
Now that ICU is a Unicode Consortium project, I think the Unicode
Consortium should be particularly sensitive to biases arising from being
both the source of the spec and the source of a popular
implementation. It
On 5/15/2017 8:37 AM, Alastair Houghton via Unicode wrote:
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clearly a single error.
On Mon, 15 May 2017 16:14:23 +
Peter Constable via Unicode wrote:
> So, your helpful person was, indeed, helpful, giving you correct
> information: ZWJ sequences are not _characters_ and have no
> implications for ISO/IEC 10646.
Except in so far as the claimed ligature changes the meaning of
Emoji sequences are not _encoded_, per se, in either Unicode or ISO/IEC 10646.
The act of "encoding" in either of these coding standards is to assign an
encoded representation in the encoding method of the standards for a given
entity. In this case, that means to assign a code point.
Specifyin
On 15 May 2017, at 11:21, Henri Sivonen via Unicode wrote:
>
> In reference to:
> http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
>
> I think Unicode should not adopt the proposed change.
Disagree. An over-long UTF-8 sequence is clearly a single error. Emitting
multiple errors there makes no sense.
I am concerned about emoji ZWJ sequences being encoded without going through
the ISO process and whether Unicode will therefore lose synchronization with
ISO/IEC 10646.
I have raised this by email and a very helpful person has advised me that
encoding emoji sequences does not mean that Unicode
In reference to:
http://www.unicode.org/L2/L2017/17168-utf-8-recommend.pdf
I think Unicode should not adopt the proposed change.
The proposal is to make ICU's spec violation conforming. I think there
is both a technical and a political reason why the proposal is a bad
idea.
First, the technical