Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-03 Thread Jonathan Cast
On Wed, 2007-10-03 at 14:15 +0200, Stephane Bortzmeyer wrote: > On Wed, Oct 03, 2007 at 12:01:50AM +0200, > Twan van Laarhoven <[EMAIL PROTECTED]> wrote > a message of 24 lines which said: > > > Lots of people wrote: > > > I want a UTF-8 bikeshed! > > > No, I want a UTF-16 bikeshed! > > Person

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-03 Thread Johan Tibell
> > What the heck does it matter what encoding the library uses > > internally? > > +1 It can even use a non-standard encoding scheme if it wants. Sounds good to me. I think one of my initial questions was whether the encoding should be visible in the type of UnicodeString or not. My gut fee

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-03 Thread Stephane Bortzmeyer
On Wed, Oct 03, 2007 at 12:01:50AM +0200, Twan van Laarhoven <[EMAIL PROTECTED]> wrote a message of 24 lines which said: > Lots of people wrote: > > I want a UTF-8 bikeshed! > > No, I want a UTF-16 bikeshed! Personally, I want a UTF-32 bikeshed. UTF-16 is as lousy as UTF-8 (for both of them,

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Ketil Malde
On Tue, 2007-10-02 at 21:45 -0400, Brandon S. Allbery KF8NH wrote: > > Due to the additional complexity of handling UTF-8 -- EVEN IF the > > actual text processed happens all to be US-ASCII -- will UTF-8 > > perhaps be less efficient than UTF-16, or only as fast? > UTF8 will be very slightly

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Ketil Malde
On Tue, 2007-10-02 at 14:32 -0700, Stefan O'Rear wrote: > UTF-8 supports CJK languages too. The only question is efficiency, and > I believe CJK is still a relatively uncommon case compared to English > and other Latin-alphabet languages. (That said, I live in a country all > of whose dominant l

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Brandon S. Allbery KF8NH
On Oct 2, 2007, at 21:12 , Isaac Dupree wrote: Stefan O'Rear wrote: On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote: I do not believe that anyone was seriously advocating multiple blessed encodings. The main question is *which* encoding to bless. 99+ % of text I encounter is

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Isaac Dupree
Stefan O'Rear wrote: On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote: I do not believe that anyone was seriously advocating multiple blessed encodings. The main question is *which* encoding to bless. 99+% of text I encounter is in US-ASCII, so I would favor UTF-8. Why is UTF-16 b

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Jonathan Cast
On Wed, 2007-10-03 at 00:01 +0200, Twan van Laarhoven wrote: > Lots of people wrote: > > I want a UTF-8 bikeshed! > > No, I want a UTF-16 bikeshed! > > What the heck does it matter what encoding the library uses internally? +1 jcc

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Twan van Laarhoven
Lots of people wrote: > I want a UTF-8 bikeshed! > No, I want a UTF-16 bikeshed! What the heck does it matter what encoding the library uses internally? I expect the interface to be something like (from my own CompactString library): > fromByteString :: Encoding -> ByteString -> UnicodeString
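
To make the point about hiding the internal encoding concrete, here is a minimal sketch built around the `fromByteString :: Encoding -> ByteString -> UnicodeString` signature quoted above. Everything else is an assumption made for illustration: the two-constructor `Encoding` type, the `[Char]`-backed representation, the U+FFFD and `?` substitution policies, and the `toByteString` and `toString` helpers. None of it is taken from the CompactString library.

```haskell
import qualified Data.ByteString as B
import Data.Char (chr, ord)

-- Hypothetical set of external encodings; a real library would offer more.
data Encoding = ASCII | Latin1
  deriving (Eq, Show)

-- The representation is an implementation detail. A real module would
-- export the type but not the constructor, so callers could not tell
-- whether the bytes inside are UTF-8, UTF-16 or anything else.
newtype UnicodeString = UnicodeString String
  deriving (Eq, Show)

-- Decode bytes in the given external encoding.
-- Bytes that are invalid in ASCII become U+FFFD (an assumed policy).
fromByteString :: Encoding -> B.ByteString -> UnicodeString
fromByteString enc = UnicodeString . map decode . B.unpack
  where
    decode w = case enc of
      ASCII | w >= 0x80 -> '\xFFFD'
      _                 -> chr (fromIntegral w)

-- Hypothetical inverse: encode to the given external encoding.
-- Code points the encoding cannot represent become '?' (an assumed policy).
toByteString :: Encoding -> UnicodeString -> B.ByteString
toByteString enc (UnicodeString s) = B.pack (map encode s)
  where
    limit = case enc of
      ASCII  -> 0x80
      Latin1 -> 0x100
    encode c
      | ord c < limit = fromIntegral (ord c)
      | otherwise     = 0x3F

-- Hypothetical escape hatch back to an ordinary String.
toString :: UnicodeString -> String
toString (UnicodeString s) = s
```

Callers only ever name an encoding at the boundary, so the representation behind `UnicodeString` could be swapped without touching them.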

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Deborah Goldsmith
On Oct 2, 2007, at 3:01 PM, Twan van Laarhoven wrote: Lots of people wrote: > I want a UTF-8 bikeshed! > No, I want a UTF-16 bikeshed! What the heck does it matter what encoding the library uses internally? I expect the interface to be something like (from my own CompactString library): > f

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Deborah Goldsmith
On Oct 2, 2007, at 8:44 AM, Jonathan Cast wrote: I would like to, again, strongly argue against sacrificing compatibility with Linux/BSD/etc. for the sake of compatibility with OS X or Windows. FFI bindings have to convert data formats in any case; Haskell shouldn't gratuitously break Linux

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Stefan O'Rear
On Tue, Oct 02, 2007 at 11:05:38PM +0200, Johan Tibell wrote: > > I do not believe that anyone was seriously advocating multiple blessed > > encodings. The main question is *which* encoding to bless. 99+% of > > text I encounter is in US-ASCII, so I would favor UTF-8. Why is UTF-16 > > better fo

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Johan Tibell
> I do not believe that anyone was seriously advocating multiple blessed > encodings. The main question is *which* encoding to bless. 99+% of > text I encounter is in US-ASCII, so I would favor UTF-8. Why is UTF-16 > better for me? All software I write professionally has to support 40 languages

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Stefan O'Rear
On Tue, Oct 02, 2007 at 08:02:30AM -0700, Deborah Goldsmith wrote: > UTF-16 is the type used in all the APIs. Everything else is considered an > encoding conversion. > > CoreFoundation uses UTF-16 internally except when the string fits entirely > in a single-byte legacy encoding like MacRoman or

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Jonathan Cast
On Tue, 2007-10-02 at 22:05 +0400, Miguel Mitrofanov wrote: > > I would like to, again, strongly argue against sacrificing > > compatibility > > with Linux/BSD/etc. for the sake of compatibility with OS X or > > Windows. > > Ehm? I used to think MacOS is a sort of BSD... Cocoa, then. jcc

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Miguel Mitrofanov
> I would like to, again, strongly argue against sacrificing compatibility with Linux/BSD/etc. for the sake of compatibility with OS X or Windows. Ehm? I used to think MacOS is a sort of BSD...

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Jonathan Cast
On Tue, 2007-10-02 at 08:02 -0700, Deborah Goldsmith wrote: > On Oct 2, 2007, at 5:11 AM, ChrisK wrote: > > Deborah Goldsmith wrote: > > > >> UTF-16 is the native encoding used for Cocoa, Java, ICU, and > >> Carbon, and > >> is what appears in the APIs for all of them. UTF-16 is also what's > >>

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Deborah Goldsmith
On Oct 2, 2007, at 5:11 AM, ChrisK wrote: Deborah Goldsmith wrote: UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and is what appears in the APIs for all of them. UTF-16 is also what's stored in the volume catalog on Mac disks. UTF-8 is only used in BSD APIs for backward

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread ChrisK
Deborah Goldsmith wrote: > UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and > is what appears in the APIs for all of them. UTF-16 is also what's > stored in the volume catalog on Mac disks. UTF-8 is only used in BSD > APIs for backward compatibility. It's also used in plain

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-01 Thread Deborah Goldsmith
Sorry for the long delay, work has been really busy... On Sep 27, 2007, at 12:25 PM, Aaron Denney wrote: On 2007-09-27, Aaron Denney <[EMAIL PROTECTED]> wrote: Well, not so much. As Duncan mentioned, it's a matter of what the most common case is. UTF-16 is effectively fixed-width for the major

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Aaron Denney
On 2007-09-27, Duncan Coutts <[EMAIL PROTECTED]> wrote: > In message <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes: >> On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote: >> > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote: >> >>> UTF-16 has no advantage over UTF-8 in this respect, because

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Aaron Denney
On 2007-09-27, Aaron Denney <[EMAIL PROTECTED]> wrote: > On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote: >> On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote: UTF-16 has no advantage over UTF-8 in this respect, because of surrogate pairs and combining characters. >>> >>>

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Duncan Coutts
In message <[EMAIL PROTECTED]> Tony Finch <[EMAIL PROTECTED]> writes: > On Thu, 27 Sep 2007, Ross Paterson wrote: > > > > Combining characters are not an issue here, just the surrogate pairs, > > because we're discussing representations of sequences of Chars (Unicode > > code points). > > I dislik

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Tony Finch
On Thu, 27 Sep 2007, Ross Paterson wrote: > > Combining characters are not an issue here, just the surrogate pairs, > because we're discussing representations of sequences of Chars (Unicode > code points). I dislike referring to unicode code points as "characters" because that tends to imply a lot

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Johan Tibell
> Well, if you never heard anyone complaining about [Char] and never had > any problem with its slowness, you're probably not in a field where > the efficiency of a Unicode library is really a concern, that's for > sure. (I know that the _main_ problem with [Char] wasn't random > access, but you m

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Chaddaï Fouché
2007/9/27, Duncan Coutts <[EMAIL PROTECTED]>: > > Infrequent, but they exist, which means you can't seek x/2 bytes ahead > > to seek x characters ahead. All such seeking must be linear for both > > UTF-16 *and* UTF-8. > > And in [Char] for all these years, yet I don't hear people complaining. Most
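
The quoted claim that seeking x characters ahead cannot be done by jumping x/2 bytes can be illustrated with a short sketch. It assumes the text is held as a plain list of UTF-16 code units and is well formed; the function names are hypothetical and not from any library discussed in the thread.

```haskell
import Data.Word (Word16)

-- First half of a surrogate pair (U+D800..U+DBFF).
isHighSurrogate :: Word16 -> Bool
isHighSurrogate w = w >= 0xD800 && w <= 0xDBFF

-- Second half of a surrogate pair (U+DC00..U+DFFF).
isLowSurrogate :: Word16 -> Bool
isLowSurrogate w = w >= 0xDC00 && w <= 0xDFFF

-- Code-unit offset of the n-th code point (0-based), or Nothing if the
-- input holds fewer than n+1 code points. Every code unit before the
-- target has to be inspected, so the cost is linear in n.
codeUnitOffset :: Int -> [Word16] -> Maybe Int
codeUnitOffset n units
  | n < 0     = Nothing
  | otherwise = go n 0 units
  where
    go 0 off (_:_) = Just off
    go _ _   []    = Nothing
    go k off (hi:lo:rest)
      | isHighSurrogate hi && isLowSurrogate lo = go (k - 1) (off + 2) rest
    go k off (_:rest) = go (k - 1) (off + 1) rest
```

For the code units `[0x61, 0xD834, 0xDD1E, 0x62]` (the letter a, U+1D11E, the letter b), code point 2 starts at code unit 3 rather than 2, which is exactly the case a fixed-width jump gets wrong.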

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Duncan Coutts
In message <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes: > On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote: > > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote: > >>> UTF-16 has no advantage over UTF-8 in this respect, because of > >>> surrogate > >>> pairs and combining characters. >

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Aaron Denney
On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote: > On Thu, Sep 27, 2007 at 07:26:07AM +, Aaron Denney wrote: >> On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote: >> > Combining characters are not an issue here, just the surrogate pairs, >> > because we're discussing representations

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Ross Paterson
On Thu, Sep 27, 2007 at 06:39:24AM +, Aaron Denney wrote: > On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote: > > Well, not so much. As Duncan mentioned, it's a matter of what the most > > common case is. UTF-16 is effectively fixed-width for the majority of > > text in the majori

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Ross Paterson
On Thu, Sep 27, 2007 at 07:26:07AM +, Aaron Denney wrote: > On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote: > > Combining characters are not an issue here, just the surrogate pairs, > > because we're discussing representations of sequences of Chars (Unicode > > code points). > > You'll

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-27 Thread Aaron Denney
On 2007-09-27, Ross Paterson <[EMAIL PROTECTED]> wrote: > Combining characters are not an issue here, just the surrogate pairs, > because we're discussing representations of sequences of Chars (Unicode > code points). You'll never want to combine combining characters or vice-versa? Never want to

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Ross Paterson
On Wed, Sep 26, 2007 at 11:25:30AM +0100, Tony Finch wrote: > On Wed, 26 Sep 2007, Aaron Denney wrote: > > It's true that time-wise there are definite issues in finding character > > boundaries. > > UTF-16 has no advantage over UTF-8 in this respect, because of surrogate > pairs and combining char

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Aaron Denney
On 2007-09-27, Deborah Goldsmith <[EMAIL PROTECTED]> wrote: > On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote: >>> UTF-16 has no advantage over UTF-8 in this respect, because of >>> surrogate >>> pairs and combining characters. >> >> Good point. > > Well, not so much. As Duncan mentioned, it's a

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Deborah Goldsmith
On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote: UTF-16 has no advantage over UTF-8 in this respect, because of surrogate pairs and combining characters. Good point. Well, not so much. As Duncan mentioned, it's a matter of what the most common case is. UTF-16 is effectively fixed-width f

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Aaron Denney
On 2007-09-26, Tony Finch <[EMAIL PROTECTED]> wrote: > On Wed, 26 Sep 2007, Aaron Denney wrote: >> >> It's true that time-wise there are definite issues in finding character >> boundaries. > > UTF-16 has no advantage over UTF-8 in this respect, because of surrogate > pairs and combining characters.

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Aaron Denney
On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote: > On 9/26/07, Aaron Denney <[EMAIL PROTECTED]> wrote: >> On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote: >> > If UTF-16 is what's used by everyone else (how about Java? Python?) I >> > think that's a strong reason to use it. I don't know

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Tony Finch
On Wed, 26 Sep 2007, Aaron Denney wrote: > > It's true that time-wise there are definite issues in finding character > boundaries. UTF-16 has no advantage over UTF-8 in this respect, because of surrogate pairs and combining characters. Code points, characters, and glyphs are all different things,
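
The distinction between code points and characters can be seen with nothing more than `Data.Char` from base. The sketch below groups each base code point with the combining marks that follow it; this is only a rough approximation of user-perceived characters (it is not the full Unicode grapheme cluster algorithm), and the function names are hypothetical.

```haskell
import Data.Char (GeneralCategory(..), generalCategory)

-- True for code points in the Unicode mark categories Mn, Mc and Me.
isCombiningMark :: Char -> Bool
isCombiningMark c = case generalCategory c of
  NonSpacingMark       -> True
  SpacingCombiningMark -> True
  EnclosingMark        -> True
  _                    -> False

-- Group every code point with the combining marks that directly follow it.
clusters :: String -> [String]
clusters []     = []
clusters (c:cs) = (c : marks) : clusters rest
  where
    (marks, rest) = span isCombiningMark cs
```

With the input `"e\x0301"` (e followed by COMBINING ACUTE ACCENT), `length` reports two code points while `clusters` returns a single group, regardless of whether the underlying storage is UTF-8, UTF-16 or UTF-32.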

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Johan Tibell
On 9/26/07, Aaron Denney <[EMAIL PROTECTED]> wrote: > On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote: > > If UTF-16 is what's used by everyone else (how about Java? Python?) I > > think that's a strong reason to use it. I don't know Unicode well > > enough to say otherwise. > > The internal

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Aaron Denney
On 2007-09-26, Johan Tibell <[EMAIL PROTECTED]> wrote: > If UTF-16 is what's used by everyone else (how about Java? Python?) I > think that's a strong reason to use it. I don't know Unicode well > enough to say otherwise. The internal representations don't matter except in the case of making FFI l

[Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-25 Thread Aaron Denney
On 2007-09-26, Deborah Goldsmith <[EMAIL PROTECTED]> wrote: > From an implementation point of view, UTF-16 is the most efficient > representation for processing Unicode. This depends on the characteristics of the text being processed. Spacewise, English stays 1 byte/char in UTF-8. Most Europea
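
The space argument can be checked with a little arithmetic. The sketch below computes how many bytes UTF-8 and UTF-16 need per code point, following the standard encoding ranges; the function names are hypothetical, and lone surrogate code points (which Haskell's `Char` can hold but neither encoding can represent) are not treated specially.

```haskell
import Data.Char (ord)

-- Bytes UTF-8 needs for one code point.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Bytes UTF-16 needs for one code point: one 16-bit unit inside the
-- Basic Multilingual Plane, a surrogate pair outside it.
utf16Bytes :: Char -> Int
utf16Bytes c
  | ord c < 0x10000 = 2
  | otherwise       = 4

-- Total encoded size of a string under a per-code-point cost function.
encodedSize :: (Char -> Int) -> String -> Int
encodedSize cost = sum . map cost
```

Under these rules `"hello"` takes 5 bytes in UTF-8 and 10 in UTF-16, while the two CJK code points `"\x4E2D\x6587"` take 6 bytes in UTF-8 and 4 in UTF-16, so which encoding is smaller depends on the text being processed.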