On Wed, 2007-10-03 at 14:15 +0200, Stephane Bortzmeyer wrote:
> On Wed, Oct 03, 2007 at 12:01:50AM +0200,
> Twan van Laarhoven <[EMAIL PROTECTED]> wrote
> a message of 24 lines which said:
>
> > Lots of people wrote:
> > > I want a UTF-8 bikeshed!
> > > No, I want a UTF-16 bikeshed!
>
> Person
> > What the heck does it matter what encoding the library uses
> > internally?
>
> +1 It can even use a non-standard encoding scheme if it wants.
Sounds good to me. I think one of my initial questions was whether the
encoding should be visible in the type of UnicodeString or
not. My gut fee
On Wed, Oct 03, 2007 at 12:01:50AM +0200,
Twan van Laarhoven <[EMAIL PROTECTED]> wrote
a message of 24 lines which said:
> Lots of people wrote:
> > I want a UTF-8 bikeshed!
> > No, I want a UTF-16 bikeshed!
Personally, I want a UTF-32 bikeshed. UTF-16 is as lousy as UTF-8
(for both of them,
On Tue, 2007-10-02 at 21:45 -0400, Brandon S. Allbery KF8NH wrote:
> > Due to the additional complexity of handling UTF-8 -- EVEN IF the
> > actual text processed happens all to be US-ASCII -- will UTF-8
> > perhaps be less efficient than UTF-16, or only as fast?
> UTF8 will be very slightly
On Tue, 2007-10-02 at 14:32 -0700, Stefan O'Rear wrote:
> UTF-8 supports CJK languages too. The only question is efficiency, and
> I believe CJK is still a relatively uncommon case compared to English
> and other Latin-alphabet languages. (That said, I live in a country all
> of whose dominant l
On Oct 2, 2007, at 21:12 , Isaac Dupree wrote:
Stefan O'Rear wrote:
On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote:
I do not believe that anyone was seriously advocating multiple blessed
encodings. The main question is *which* encoding to bless. 99+% of
text I encounter is
Stefan O'Rear wrote:
On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote:
I do not believe that anyone was seriously advocating multiple blessed
encodings. The main question is *which* encoding to bless. 99+% of
text I encounter is in US-ASCII, so I would favor UTF-8. Why is UTF-16
b
On Wed, 2007-10-03 at 00:01 +0200, Twan van Laarhoven wrote:
> Lots of people wrote:
> > I want a UTF-8 bikeshed!
> > No, I want a UTF-16 bikeshed!
>
> What the heck does it matter what encoding the library uses internally?
+1
jcc
___
Haskell-Cafe
Lots of people wrote:
> I want a UTF-8 bikeshed!
> No, I want a UTF-16 bikeshed!
What the heck does it matter what encoding the library uses internally?
I expect the interface to be something like (from my own CompactString
library):
> fromByteString :: Encoding -> ByteString -> UnicodeString
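To make the point concrete, here is a minimal sketch of an interface in which the encoding shows up only at the boundary and never in the type of the string itself. All names here (Encoding, UnicodeString, fromBytes, toString) are hypothetical and are not the CompactString API; a plain [Word8] stands in for ByteString so that the sketch needs nothing beyond base, and the internal representation is just a list of Char, which a real library would replace with something packed.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical: the external encodings the boundary functions understand.
data Encoding = UTF8 | UTF16BE deriving (Eq, Show)

-- Hypothetical: the representation is private, so it could change freely.
newtype UnicodeString = UnicodeString [Char] deriving (Eq, Show)

toString :: UnicodeString -> String
toString (UnicodeString cs) = cs

-- Decode external bytes; Nothing signals malformed input.
fromBytes :: Encoding -> [Word8] -> Maybe UnicodeString
fromBytes enc bytes = UnicodeString <$> decode bytes
  where
    decode = case enc of
      UTF8    -> decodeUtf8
      UTF16BE -> decodeUtf16BE

-- Simplified UTF-8 decoder (assumption: overlong forms are not rejected).
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b:bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral b .&. 0x07)
  | otherwise          = Nothing
  where
    multi :: Int -> Int -> Maybe String
    multi n lead =
      let (cont, rest) = splitAt n bs
          isCont c     = c .&. 0xC0 == 0x80
          cp           = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) lead cont
          isSurrogate  = cp >= 0xD800 && cp <= 0xDFFF
      in if length cont == n && all isCont cont && cp <= 0x10FFFF && not isSurrogate
           then (chr cp :) <$> decodeUtf8 rest
           else Nothing

-- UTF-16 (big-endian) decoder: pair up bytes, then join surrogate pairs.
decodeUtf16BE :: [Word8] -> Maybe String
decodeUtf16BE bytes = toUnits bytes >>= fromUnits
  where
    toUnits :: [Word8] -> Maybe [Int]
    toUnits (hi:lo:rest) = ((fromIntegral hi `shiftL` 8 .|. fromIntegral lo) :) <$> toUnits rest
    toUnits []           = Just []
    toUnits _            = Nothing  -- odd number of bytes

    fromUnits :: [Int] -> Maybe String
    fromUnits [] = Just []
    fromUnits (u:us)
      | u < 0xD800 || u > 0xDFFF = (chr u :) <$> fromUnits us
      | u <= 0xDBFF, (l:rest) <- us, l >= 0xDC00, l <= 0xDFFF
          = (chr (0x10000 + ((u - 0xD800) `shiftL` 10) + (l - 0xDC00)) :) <$> fromUnits rest
      | otherwise = Nothing
```

Callers only ever see UnicodeString, so the same text decoded from UTF-8 or from UTF-16 compares equal, and the library would be free to switch its internal encoding later.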
On Oct 2, 2007, at 3:01 PM, Twan van Laarhoven wrote:
Lots of people wrote:
> I want a UTF-8 bikeshed!
> No, I want a UTF-16 bikeshed!
What the heck does it matter what encoding the library uses
internally? I expect the interface to be something like (from my own
CompactString library):
> f
On Oct 2, 2007, at 8:44 AM, Jonathan Cast wrote:
I would like to, again, strongly argue against sacrificing compatibility
with Linux/BSD/etc. for the sake of compatibility with OS X or Windows.
FFI bindings have to convert data formats in any case; Haskell shouldn't
gratuitously break Linux
On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote:
> > I do not believe that anyone was seriously advocating multiple blessed
> > encodings. The main question is *which* encoding to bless. 99+% of
> > text I encounter is in US-ASCII, so I would favor UTF-8. Why is UTF-16
> > better fo
> I do not believe that anyone was seriously advocating multiple blessed
> encodings. The main question is *which* encoding to bless. 99+% of
> text I encounter is in US-ASCII, so I would favor UTF-8. Why is UTF-16
> better for me?
All software I write professionally has to support 40 languages
On Tue, Oct 02, 2007 at 08:02:30AM -0700, Deborah Goldsmith wrote:
> UTF-16 is the type used in all the APIs. Everything else is considered an
> encoding conversion.
>
> CoreFoundation uses UTF-16 internally except when the string fits entirely
> in a single-byte legacy encoding like MacRoman or
On Tue, 2007-10-02 at 22:05 +0400, Miguel Mitrofanov wrote:
> > I would like to, again, strongly argue against sacrificing compatibility
> > with Linux/BSD/etc. for the sake of compatibility with OS X or Windows.
>
> Ehm? I used to think MacOS is a sort of BSD...
Cocoa, then.
jcc
I would like to, again, strongly argue against sacrificing compatibility
with Linux/BSD/etc. for the sake of compatibility with OS X or Windows.
Ehm? I used to think MacOS is a sort of BSD...
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
On Tue, 2007-10-02 at 08:02 -0700, Deborah Goldsmith wrote:
> On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
> > Deborah Goldsmith wrote:
> >
> >> UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and
> >> is what appears in the APIs for all of them. UTF-16 is also what's
> >>
On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
Deborah Goldsmith wrote:
UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and
is what appears in the APIs for all of them. UTF-16 is also what's
stored in the volume catalog on Mac disks. UTF-8 is only used in BSD
APIs for backward
Deborah Goldsmith wrote:
> UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and
> is what appears in the APIs for all of them. UTF-16 is also what's
> stored in the volume catalog on Mac disks. UTF-8 is only used in BSD
> APIs for backward compatibility. It's also used in plain
Sorry for the long delay, work has been really busy...
On Sep 27, 2007, at 12:25 PM, Aaron Denney wrote:
On 2007-09-27, Aaron Denney <[EMAIL PROTECTED]> wrote:
Well, not so much. As Duncan mentioned, it's a matter of what the most
common case is. UTF-16 is effectively fixed-width for the major
On 2007-09-27, Duncan Coutts <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes:
>> On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
>> > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
>> >>> UTF-16 has no advantage over UTF-8 in this respect, because
On 2007-09-27, Aaron Denney <[EMAIL PROTECTED]> wrote:
> On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
>> On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
pairs and combining characters.
>>>
>>>
In message <[EMAIL PROTECTED]> Tony Finch
<[EMAIL PROTECTED]> writes:
> On Thu, 27 Sep 2007, Ross Paterson wrote:
> >
> > Combining characters are not an issue here, just the surrogate pairs,
> > because we're discussing representations of sequences of Chars (Unicode
> > code points).
>
> I dislik
On Thu, 27 Sep 2007, Ross Paterson wrote:
>
> Combining characters are not an issue here, just the surrogate pairs,
> because we're discussing representations of sequences of Chars (Unicode
> code points).
I dislike referring to unicode code points as "characters" because that
tends to imply a lot
> Well, if you never heard anyone complaining about [Char] and never had
> any problem with its slowness, you're probably not in a field where
> the efficiency of a Unicode library is really a concern, that's for
> sure. (I know that the _main_ problem with [Char] wasn't random
> access, but you m
2007/9/27, Duncan Coutts <[EMAIL PROTECTED]>:
> > Infrequent, but they exist, which means you can't seek x/2 bytes ahead
> > to seek x characters ahead. All such seeking must be linear for both
> > UTF-16 *and* UTF-8.
>
> And in [Char] for all these years, yet I don't hear people complaining. Most
In message <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes:
> On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
> > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
> >>> UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
> >>> pairs and combining characters.
>
On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote:
> On Thu, Sep 27, 2007 at 07:26:07AM +, Aaron Denney wrote:
>> On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote:
>> > Combining characters are not an issue here, just the surrogate pairs,
>> > because we're discussing representations
On Thu, Sep 27, 2007 at 06:39:24AM +, Aaron Denney wrote:
> On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
> > Well, not so much. As Duncan mentioned, it's a matter of what the most
> > common case is. UTF-16 is effectively fixed-width for the majority of
> > text in the majori
On Thu, Sep 27, 2007 at 07:26:07AM +, Aaron Denney wrote:
> On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote:
> > Combining characters are not an issue here, just the surrogate pairs,
> > because we're discussing representations of sequences of Chars (Unicode
> > code points).
>
> You'll
On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote:
> Combining characters are not an issue here, just the surrogate pairs,
> because we're discussing representations of sequences of Chars (Unicode
> code points).
You'll never want to combine combining characters or vice-versa? Never
want to
On Wed, Sep 26, 2007 at 11:25:30AM +0100, Tony Finch wrote:
> On Wed, 26 Sep 2007, Aaron Denney wrote:
> > It's true that time-wise there are definite issues in finding character
> > boundaries.
>
> UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
> pairs and combining char
On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
> On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
>>> UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
>>> pairs and combining characters.
>>
>> Good point.
>
> Well, not so much. As Duncan mentioned, it's a
On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
pairs and combining characters.
Good point.
Well, not so much. As Duncan mentioned, it's a matter of what the most
common case is. UTF-16 is effectively fixed-width f
On 2007-09-26, Tony Finch <[EMAIL PROTECTED]> wrote:
> On Wed, 26 Sep 2007, Aaron Denney wrote:
>>
>> It's true that time-wise there are definite issues in finding character
>> boundaries.
>
> UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
> pairs and combining characters.
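The surrogate-pair half of that claim can be illustrated with a small sketch. The function names below (utf16Units, utf8Bytes, offsetOf) are made up for illustration and belong to no existing library. The sketch computes how many storage units each code point occupies in either encoding, and shows that finding the offset of the i-th code point means summing the widths of everything before it, which is a linear scan in both cases.

```haskell
import Data.Char (ord)

-- Number of 16-bit code units one code point occupies in UTF-16.
utf16Units :: Char -> Int
utf16Units c
  | ord c >= 0x10000 = 2  -- outside the BMP: needs a surrogate pair
  | otherwise        = 1

-- Number of bytes one code point occupies in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Offset, in storage units, of the i-th code point (0-based).
-- Every preceding code point has to be inspected, so this is O(i)
-- whichever width function is plugged in.
offsetOf :: (Char -> Int) -> Int -> String -> Int
offsetOf width i = sum . map width . take i
```

For text that stays inside the BMP, offsetOf utf16Units i equals i, which is the "effectively fixed-width" common case; a single character such as U+1D11E breaks that shortcut. Combining characters are a separate matter: "e\x301" is two code points in any encoding, so they affect the distance between code points and user-perceived characters rather than the choice between UTF-8 and UTF-16.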
On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote:
> On 9/26/07, Aaron Denney <[EMAIL PROTECTED]> wrote:
>> On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote:
>> > If UTF-16 is what's used by everyone else (how about Java? Python?) I
>> > think that's a strong reason to use it. I don't know
On Wed, 26 Sep 2007, Aaron Denney wrote:
>
> It's true that time-wise there are definite issues in finding character
> boundaries.
UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
pairs and combining characters. Code points, characters, and glyphs are
all different things,
On 9/26/07, Aaron Denney <[EMAIL PROTECTED]> wrote:
> On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote:
> > If UTF-16 is what's used by everyone else (how about Java? Python?) I
> > think that's a strong reason to use it. I don't know Unicode well
> > enough to say otherwise.
>
> The internal
On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote:
> If UTF-16 is what's used by everyone else (how about Java? Python?) I
> think that's a strong reason to use it. I don't know Unicode well
> enough to say otherwise.
The internal representations don't matter except in the case of making
FFI l
On 2007-09-26, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
> From an implementation point of view, UTF-16 is the most efficient
> representation for processing Unicode.
This depends on the characteristics of the text being processed.
Spacewise, English stays 1 byte/char in UTF-8. Most Europea