Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Johan Tibell
On Tue, Aug 17, 2010 at 1:05 PM, Bulat Ziganshin wrote: > Hello Tako, > > Tuesday, August 17, 2010, 3:03:20 PM, you wrote: > > > Unless a Char in Haskell is 32 bits (or at least more than 16 bits) > > it can NOT encode all Unicode points. > > it's 32 bit > Like Bulat said it's 32 bit. It's *defin

Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Johan Tibell
On Tue, Aug 17, 2010 at 12:39 PM, Bulat Ziganshin wrote: > Hello Tom, > > Tuesday, August 17, 2010, 2:09:09 PM, you wrote: > > > In the first iteration of the Text package, UTF-16 was chosen because > > it had a nice balance of arithmetic overhead and space. The > > arithmetic for UTF-8 started

Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Bulat Ziganshin
Hello Tako, Tuesday, August 17, 2010, 3:03:20 PM, you wrote: > Unless a Char in Haskell is 32 bits (or at least more than 16 bits) > it can NOT encode all Unicode points. it's 32 bit -- Best regards, Bulat mailto:bulat.zigans...@gmail.com
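
A minimal sketch of the point above, using only `base`: it inspects the range of `Char` rather than its storage width, so it shows that `Char` covers code points well beyond 16 bits but says nothing about how a given compiler lays it out in memory. The names `maxCodePoint` and `charCount` are illustrative, not from the thread.

```haskell
import Data.Char (chr, ord)

-- Largest code point a Haskell Char can hold.
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- Number of distinct Char values (the surrogate range is included).
charCount :: Int
charCount = maxCodePoint + 1

main :: IO ()
main = do
  print maxCodePoint           -- 1114111, i.e. 0x10FFFF
  print (charCount > 2 ^ 16)   -- True: does not fit in 16 bits
  print (ord (chr 0x1F600))    -- a code point above U+FFFF round-trips
```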

Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tom Harper
2010/8/17 Bulat Ziganshin : > Hello Tom, > i don't understand what you mean. do you support all 2^20 codepoints > in Data.Text package? Bulat, Yes, its internal representation is UTF-16, which is capable of encoding *any* valid Unicode codepoint. -- Tom
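
To illustrate how UTF-16 reaches code points above U+FFFF, here is a rough sketch of surrogate-pair encoding using only `base`. It follows the standard UTF-16 scheme and is not taken from the Data.Text source; `encodeUtf16` is a hypothetical helper name, and the sketch assumes its input is not itself a lone surrogate.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode one Char as a list of UTF-16 code units.
-- Assumption: the Char is not a lone surrogate (U+D800..U+DFFF).
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]   -- fits in a single 16-bit unit
  | otherwise   = [hi, lo]           -- needs a surrogate pair
  where
    n  = ord c
    m  = n - 0x10000                 -- 20-bit offset above the BMP
    hi = fromIntegral (0xD800 + (m `shiftR` 10))  -- top 10 bits
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))    -- bottom 10 bits

main :: IO ()
main = do
  print (encodeUtf16 'A')         -- one code unit
  print (encodeUtf16 '\x1F600')   -- two code units (a surrogate pair)
```

The 20-bit offset is where a figure like 2^20 comes from: the pair mechanism adds 2^20 code points on top of the 2^16 reachable with a single unit.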

Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Bulat Ziganshin
Hello Tom, Tuesday, August 17, 2010, 2:09:09 PM, you wrote: > In the first iteration of the Text package, UTF-16 was chosen because > it had a nice balance of arithmetic overhead and space. The > arithmetic for UTF-8 started to have serious performance impacts in > situations where the entire do

Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Johan Tibell
Hi Bulat, On Tue, Aug 17, 2010 at 10:34 AM, Bulat Ziganshin wrote: > > It's not clear to me that using UTF-16 internally does make > > Data.Text noticeably slower. > > not slower but require 2x more memory. speed is the same since > Unicode contains 2^20 codepoints > Yes, in theory a program c

Re: Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Tako Schotanus
On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin wrote: > Hello Johan, > > Tuesday, August 17, 2010, 12:20:37 PM, you wrote: > > > I agree, Data.Text is great. Unfortunately, its internal use of UTF-16 > > makes it inefficient for many purposes. > > > It's not clear to me that using UTF-16 intern

Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-17 Thread Bulat Ziganshin
Hello Johan, Tuesday, August 17, 2010, 12:20:37 PM, you wrote: > I agree, Data.Text is great.  Unfortunately, its internal use of UTF-16 > makes it inefficient for many purposes. > It's not clear to me that using UTF-16 internally does make > Data.Text noticeably slower. not slower but requir
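
A small sketch of the memory arithmetic behind this exchange, assuming the comparison is between UTF-16 and UTF-8 payload sizes (per-string overhead ignored). The helpers `utf8Bytes`, `utf16Bytes` and `sizes` are hypothetical names introduced here; they count encoded bytes per code point and do not call any Data.Text API.

```haskell
import Data.Char (ord)

-- Bytes needed to encode one code point in UTF-8.
utf8Bytes :: Char -> Int
utf8Bytes c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Bytes needed to encode one code point in UTF-16.
utf16Bytes :: Char -> Int
utf16Bytes c = if ord c < 0x10000 then 2 else 4

-- (UTF-8 size, UTF-16 size) of a whole string, in bytes.
sizes :: String -> (Int, Int)
sizes s = (sum (map utf8Bytes s), sum (map utf16Bytes s))

main :: IO ()
main = do
  print (sizes "hello")                 -- ASCII: UTF-16 is twice as large
  print (sizes (replicate 3 '\x4E2D'))  -- CJK: UTF-16 is smaller
```

So the "2x" figure applies to ASCII-heavy data; for text dominated by code points in U+0800..U+FFFF the ratio goes the other way.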

Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-15 Thread Bulat Ziganshin
Hello Daniel, Sunday, August 15, 2010, 10:39:24 PM, you wrote: > That's great. If that performance difference is a show stopper, one > shouldn't go higher-level than C anyway :) *all* speed measurements that find Haskell to be as fast as C were broken. Let's see: D:\testing>read MsOffice.arc MsOff

Re[2]: [Haskell-cafe] Re: String vs ByteString

2010-08-15 Thread Bulat Ziganshin
Hello Bryan, Sunday, August 15, 2010, 10:04:01 PM, you wrote: > shared on Friday, and boiled it down to a simple test case: how long does it > take to read a 31MB file? > GNU wc -m: there are even slower ways to do it if you need :) if your data aren't cached, then speed is limited by HDD. if