On Wed, Aug 18, 2010 at 11:58 PM, Johan Tibell wrote:
> As for blaze I'm not sure exactly how it deals with UTF-8 input. I tried to
> browse through the repo but couldn't find that input ByteStrings are actually
> validated anywhere. If they're not, it's a bit generous to say that it deals
> with UTF-8.
On Wed, Aug 18, 2010 at 7:12 PM, Michael Snoyman wrote:
> On Wed, Aug 18, 2010 at 6:24 PM, Johan Tibell wrote:
>
>>
>>
> Sorry, I thought I'd sent these out. While working on optimizing Hamlet I
> started playing around with the BigTable benchmark. I wrote two blog posts
> on the topic:
>
> http://www.snoyman.com/blog/entry/bigtable-benchmarks/
On Wed, Aug 18, 2010 at 10:12 AM, Michael Snoyman wrote:
> While working on optimizing Hamlet I started playing around with the
> BigTable benchmark. I wrote two blog posts on the topic:
>
> http://www.snoyman.com/blog/entry/bigtable-benchmarks/
> http://www.snoyman.com/blog/entry/optimizing-ham
On Wed, Aug 18, 2010 at 6:24 PM, Johan Tibell wrote:
> Hi Michael,
>
>
> On Wed, Aug 18, 2010 at 4:04 PM, Michael Snoyman wrote:
>
>> Here's my response to the two points:
>>
>> * I haven't written a patch showing that Data.Text would be faster using
>> UTF-8 because that would require fulfilling the second point (I'll get to in a second).
On Wed, Aug 18, 2010 at 4:12 AM, wren ng thornton wrote:
> There was a study recently on this. They found that there are four main
> parts of the Internet:
>
> * a densely connected core, where from any site you can get to any other
> * an "in cone", from which you can reach the core (but not oth
Hi Michael,
On Wed, Aug 18, 2010 at 4:04 PM, Michael Snoyman wrote:
> Here's my response to the two points:
>
> * I haven't written a patch showing that Data.Text would be faster using
> UTF-8 because that would require fulfilling the second point (I'll get to in
> a second). I *have* shown where
On 18 August 2010 15:04, Michael Snoyman wrote:
> For me, the whole point of this discussion was to
> determine whether we should attempt porting to UTF-8, which as I understand
> it would be a rather large undertaking.
And the answer to that is, yes but only if we have good reason to
believe it
On Wed, Aug 18, 2010 at 2:39 PM, Johan Tibell wrote:
> On Wed, Aug 18, 2010 at 2:12 AM, John Meacham wrote:
>
>>
>> That said, there is never a reason to use UTF-16, it is a vestigial
>> remnant from the brief period when it was thought 16 bits would be
>> enough for the unicode standard, any defense of it nowadays is after
>> the fact justification
Johan Tibell writes:
> Text continues to be UTF-16 today because
>
> * no one has written a benchmark that shows that UTF-8 would be faster
> *for use in Data.Text*, and
> * no one has written a patch that converts Text to use UTF-8 internally.
>
> I'm quite frustrated by this whole discussion.
On Wed, Aug 18, 2010 at 2:12 AM, John Meacham wrote:
>
> That said, there is never a reason to use UTF-16, it is a vestigial
> remnant from the brief period when it was thought 16 bits would be
> enough for the unicode standard, any defense of it nowadays is after the
> fact justification for h
Alright, here are the results for the first three in the list (please forgive
me for being lazy - I am a Haskell programmer, after all):
ifeng.com:
UTF8: 299949
UTF16: 566610
dzh.mop.com:
GBK: 1866
UTF8: 1891
UTF16: 3684
www.csdn.net:
UTF8: 122870
UTF16: 217420
Seems like UTF8 is a consistent winner.
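For anyone who wants to reproduce these numbers, here is a minimal sketch
using the text package itself (the input path is hypothetical):

    import qualified Data.ByteString as B
    import qualified Data.Text.Encoding as TE
    import qualified Data.Text.IO as TIO

    -- Encode the same document both ways and compare the byte counts.
    main :: IO ()
    main = do
      t <- TIO.readFile "page.html"   -- hypothetical saved page
      putStrLn $ "UTF8:  " ++ show (B.length (TE.encodeUtf8 t))
      putStrLn $ "UTF16: " ++ show (B.length (TE.encodeUtf16LE t))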
Jinjing Wang wrote:
John Millikin wrote:
The reason many Japanese and Chinese users reject UTF-8 isn't due to
space constraints (UTF-8 and UTF-16 are roughly equal), it's because
they reject Unicode itself.
+1.
This is the thing Unicode advocates don't want to admit. Until Unicode has
code points for _all_ Chinese characters
John Millikin writes:
> The reason many Japanese and Chinese users reject UTF-8 isn't due to
> space constraints (UTF-8 and UTF-16 are roughly equal), it's because
> they reject Unicode itself.
Probably because they don't think it's complicated enough¹?
> Shift-JIS and the various Chinese encodings
More typical Chinese web sites:
www.ifeng.com (web site like nytimes)
dzh.mop.com (community for fun)
www.csdn.net (web site for IT)
www.sohu.com (web site like yahoo)
www.sina.com (web site like yahoo)
-- Andrew
On Wed, Aug 18, 2010
> John Millikin wrote:
>>
>> The reason many Japanese and Chinese users reject UTF-8 isn't due to
>> space constraints (UTF-8 and UTF-16 are roughly equal), it's because
>> they reject Unicode itself.
>
> +1.
>
> This is the thing Unicode advocates don't want to admit. Until Unicode has
> code points for _all_ Chinese characters
Well, I'm not certain if it counts as a typical Chinese website, but here
are the stats;
UTF8: 64,198
UTF16: 113,160
And just for fun, after gzipping:
UTF8: 17,708
UTF16: 19,367
On Wed, Aug 18, 2010 at 2:59 AM, anderson leo wrote:
> Hi michael, here is a web site http://zh.wikipedia.org/zh-cn/
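The gzip comparison can be sketched the same way, assuming the zlib package
(file name hypothetical); compression narrows the gap because the extra bytes
UTF-16 spends are highly redundant:

    import qualified Codec.Compression.GZip as GZip
    import qualified Data.ByteString.Lazy as BL
    import qualified Data.Text.Lazy.Encoding as TLE
    import qualified Data.Text.Lazy.IO as TLIO

    -- Compare compressed sizes of the two encodings of one document.
    main :: IO ()
    main = do
      t <- TLIO.readFile "page.html"  -- hypothetical saved page
      let size enc = BL.length (GZip.compress (enc t))
      putStrLn $ "UTF8:  " ++ show (size TLE.encodeUtf8)
      putStrLn $ "UTF16: " ++ show (size TLE.encodeUtf16LE)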
Ivan Lazar Miljenovic wrote:
On 18 August 2010 12:12, wren ng thornton wrote:
Johan Tibell wrote:
To my knowledge the data we have about prevalence of encoding on the web is
accurate. We crawl all pages we can get our hands on, by starting at some
set of seeds and then following all the links.
On Aug 17, 2010, at 11:51 PM, Ketil Malde wrote:
> Yitzchak Gale writes:
>
>> I don't think the genome is typical text.
>
> I think the typical *large* collection of text is text-encoded data, and
> not, for lack of a better word, literature. Genomics data is just an
> example.
I have a coll
On 18 August 2010 12:12, wren ng thornton wrote:
> Johan Tibell wrote:
>>
>> To my knowledge the data we have about prevalence of encoding on the web is
>> accurate. We crawl all pages we can get our hands on, by starting at some
>> set of seeds and then following all the links. You cannot be sure that
>> you've reached all web sites as there might be cliques
Johan Tibell wrote:
To my knowledge the data we have about prevalence of encoding on the web is
accurate. We crawl all pages we can get our hands on, by starting at some
set of seeds and then following all the links. You cannot be sure that
you've reached all web sites as there might be cliques i
John Millikin wrote:
The reason many Japanese and Chinese users reject UTF-8 isn't due to
space constraints (UTF-8 and UTF-16 are roughly equal), it's because
they reject Unicode itself.
+1.
This is the thing Unicode advocates don't want to admit. Until Unicode
has code points for _all_ Chinese characters
Michael Snoyman wrote:
On Tue, Aug 17, 2010 at 2:20 PM, Ivan Lazar Miljenovic <
ivan.miljeno...@gmail.com> wrote:
Michael Snoyman writes:
I don't think *anyone* is asserting that UTF-16 is a common encoding
for files anywhere,
*ahem*
http://en.wikipedia.org/wiki/UTF-16/UCS-2#Use_in_major_operating_systems_and_environments
On Tue, Aug 17, 2010 at 12:30, Donn Cave wrote:
> If Haskell had the development resources to make something like this
> work, would it actually take the form of a Haskell-level type like
> that - data Text = (Encoding, ByteString)? I mean, I know that's
> just a very clear and convenient way to
Bulat Ziganshin wrote:
Johan wrote:
So it's not clear to me that using UTF-16 makes the program
noticeably slower or use more memory on a real program.
it's a clear misunderstanding. of course, not every program holds much
text data in memory. but some do, and here you will double memory
usage
On Tue, Aug 17, 2010 at 03:21:32PM +0200, Daniel Peebles wrote:
> Sounds to me like we need a lazy Data.Text variation that allows UTF-8 and
> UTF-16 "segments" in it list of strict text elements :) Then big chunks of
> western text will be encoded efficiently, and same with CJK! Not sure what
> to do about strict Data.Text though :)
Hi michael, here is a web site http://zh.wikipedia.org/zh-cn/. It is the
Chinese-language Wikipedia.
-Andrew
On Tue, Aug 17, 2010 at 7:00 PM, Michael Snoyman wrote:
>
>
> On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale wrote:
>
>> Ketil Malde wrote:
>> > I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
>> > 3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
>> > RAM, UTF-16 will be slower than UTF-8...
On Tue, Aug 17, 2010 at 9:30 PM, Donn Cave wrote:
> Quoth John Millikin ,
>
> > Ruby, which has an enormous Japanese userbase, solved the problem by
> > essentially defining Text = (Encoding, ByteString), and then
> > re-implementing text logic for each encoding. This allows very
> > efficient operation with every possible encoding, at the cost of
Quoth John Millikin ,
> Ruby, which has an enormous Japanese userbase, solved the problem by
> essentially defining Text = (Encoding, ByteString), and then
> re-implementing text logic for each encoding. This allows very
> efficient operation with every possible encoding, at the cost of
> increase
On Tue, Aug 17, 2010 at 6:19 PM, John Millikin wrote:
> Ruby, which has an enormous Japanese userbase, solved the problem by
> essentially defining Text = (Encoding, ByteString), and then
> re-implementing text logic for each encoding. This allows very
> efficient operation with every possible encoding, at the cost of
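For illustration only, a rough Haskell rendering of that Ruby-style design
(names hypothetical; this is not how the text package works):

    import Data.Bits ((.&.))
    import qualified Data.ByteString as B

    -- Raw bytes tagged with their encoding; every operation
    -- dispatches on the tag.
    data Encoding = ASCII | UTF8 | UTF16LE | ShiftJIS | GBK deriving Show
    data TaggedText = TaggedText Encoding B.ByteString

    -- For example, character count needs one case per encoding:
    charCount :: TaggedText -> Int
    charCount (TaggedText ASCII bs) = B.length bs
    charCount (TaggedText UTF8  bs) =
      -- count bytes that are not UTF-8 continuation bytes (10xxxxxx)
      B.length (B.filter (\w -> w .&. 0xC0 /= 0x80) bs)
    charCount _ = error "one more case per supported encoding"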
On Tue, Aug 17, 2010 at 06:12, Michael Snoyman wrote:
> I'm not talking about API changes here; the topic at hand is the internal
> representation of the stream of characters used by the text package. That is
> currently UTF-16; I would argue for switching to UTF-8.
The Data.Text.Foreign module is par
(Actually, this seems more like a job for a type class.)
2010/8/17 Gábor Lehel :
> Someone mentioned earlier that IMHO all of this messing around with
> encodings and conversions should be handled transparently, and I guess
> you could do something like have the internal representation be along
>
Someone mentioned earlier that IMHO all of this messing around with
encodings and conversions should be handled transparently, and I guess
you could do something like have the internal representation be along
the lines of Either UTF8 UTF16 (or perhaps even more encodings), and
then implement every
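A tiny sketch of that idea (hypothetical names; the Either could just as well
be a class with one instance per encoding, as suggested above):

    import qualified Data.ByteString as B

    newtype UTF8  = UTF8  B.ByteString
    newtype UTF16 = UTF16 B.ByteString

    -- Construct with whichever representation is smaller for the data,
    -- then dispatch on the constructor inside every operation.
    newtype MixedText = MixedText (Either UTF8 UTF16)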
Felipe Lessa writes:
[-snip- I've already spent too much time on the other stuff :-]
> And what do you think about creating a real SeqData data type
> with two bases per byte? In terms of processing speed I guess
> there will be a small penalty, but if you need to have large
> quantities of bases
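A sketch of the packing arithmetic, assuming one base per 4-bit nibble so that
ambiguity codes like 'N' still fit (two bases per byte, as proposed):

    import Data.Bits (shiftL, (.|.))
    import Data.Word (Word8)

    -- Map a nucleotide to a 4-bit code; 15 is a catch-all for
    -- IUPAC ambiguity codes.
    baseCode :: Char -> Word8
    baseCode 'A' = 0
    baseCode 'C' = 1
    baseCode 'G' = 2
    baseCode 'T' = 3
    baseCode _   = 15

    -- Pack two bases into one byte: high nibble, then low nibble.
    packPair :: Char -> Char -> Word8
    packPair x y = (baseCode x `shiftL` 4) .|. baseCode y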
Sounds to me like we need a lazy Data.Text variation that allows UTF-8 and
UTF-16 "segments" in it list of strict text elements :) Then big chunks of
western text will be encoded efficiently, and same with CJK! Not sure what
to do about strict Data.Text though :)
On Tue, Aug 17, 2010 at 1:40 PM, K
On Tue, Aug 17, 2010 at 3:23 PM, Yitzchak Gale wrote:
> Michael Snoyman wrote:
> > Regarding the data: you haven't actually quoted any
> > statistics about the prevalence of CJK data
>
> True, I haven't seen any - except for Google, which
> I don't believe is accurate. I would like to see some
>
On Tue, Aug 17, 2010 at 2:23 PM, Yitzchak Gale wrote:
> Michael Snoyman wrote:
> > Regarding the data: you haven't actually quoted any
> > statistics about the prevalence of CJK data
>
> True, I haven't seen any - except for Google, which
> I don't believe is accurate. I would like to see some
>
> "Johan" == Johan Tibell writes:
Johan> On Tue, Aug 17, 2010 at 1:36 PM, Tako Schotanus
wrote:
Johan> Yeah, I tried looking it up but I couldn't find the
Johan> technical definition for Char, but in the end I found that
Johan> "maxBound" was "0x10FFFF", making it basically 24 bits :)
Hello, Ketil Malde!
On Tue, Aug 17, 2010 at 8:02 AM, Ketil Malde wrote:
> Ivan Lazar Miljenovic writes:
>
>> Seeing as how the genome just uses 4 base "letters",
>
> Yes, the bulk of the data is not really "text" at all, but each sequence
> (it's fragmented due to the molecular division into chr
On Tue, Aug 17, 2010 at 1:36 PM, Tako Schotanus wrote:
> Yeah, I tried looking it up but I couldn't find the technical definition for
> Char, but in the end I found that "maxBound" was "0x10FFFF", making it
> basically 24 bits :)
>
I think that's enough to represent all the assigned Unicode code points.
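That is easy to confirm in GHCi; maxBound :: Char is U+10FFFF:

    ghci> import Data.Char (ord)
    ghci> ord (maxBound :: Char)
    1114111
    ghci> 0x10FFFF
    1114111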
Ketil Malde wrote:
> I'd point out that it seems at least as unfair to optimize for CJK at
> the cost of Western languages.
Quite true.
> [...speculative calculation from which we conclude that]
> a given document translated
> between Chinese and English should occupy roughly the same space in
>
Colin Paul Adams writes:
>> Char is not an encoding, right?
>
> Ivan> No, but in GHC at least it corresponds to a Unicode codepoint.
>
> I don't think this is right, or shouldn't be right, anyway.. Surely it
> stands for a character. Unicode codepoints include non-characters such
> as the surrogates.
Michael Snoyman wrote:
> Regarding the data: you haven't actually quoted any
> statistics about the prevalence of CJK data
True, I haven't seen any - except for Google, which
I don't believe is accurate. I would like to see some
good unbiased data.
Right now we just have our intuitions based on a
On Aug 17, 1:55 pm, Tako Schotanus wrote:
> I'll repeat here that in my opinion a Text package should be good at
> handling text, human text, from whatever country. If I need to handle large
> streams of ASCII I'll use something else.
I would mostly agree.
However, a key use case for Text in
On Tue, Aug 17, 2010 at 13:40, Ketil Malde wrote:
> Michael Snoyman writes:
>
> > As far as space usage, you are correct that CJK data will take up more
> > memory in UTF-8 than UTF-16.
>
> With the danger of sounding ... alphabetist? as well as belaboring a
> point I agree is irrelevant (the storage format):
Yitzchak Gale writes:
> I don't think the genome is typical text.
I think the typical *large* collection of text is text-encoded data, and
not, for lack of a better word, literature. Genomics data is just an
example.
-k
--
If I haven't seen further, it is by standing in the footprints of giants.
> "Ivan" == Ivan Lazar Miljenovic writes:
> Char is not an encoding, right?
Ivan> No, but in GHC at least it corresponds to a Unicode codepoint.
I don't think this is right, or shouldn't be right, anyway.. Surely it
stands for a character. Unicode codepoints include non-characters such
as the surrogates.
Michael Snoyman writes:
> As far as space usage, you are correct that CJK data will take up more
> memory in UTF-8 than UTF-16.
With the danger of sounding ... alphabetist? as well as belaboring a
point I agree is irrelevant (the storage format):
I'd point out that it seems at least as unfair
On Tue, Aug 17, 2010 at 13:29, Ketil Malde wrote:
> Tako Schotanus writes:
>
> >> Just like Char is capable of encoding any valid Unicode codepoint.
>
> > Unless a Char in Haskell is 32 bits (or at least more than 16 bits) it
> > can NOT encode all Unicode points.
>
> And since it can encode (
On Tue, Aug 17, 2010 at 2:20 PM, Ivan Lazar Miljenovic <
ivan.miljeno...@gmail.com> wrote:
> Michael Snoyman writes:
>
> > I don't think *anyone* is asserting that UTF-16 is a common encoding
> > for files anywhere,
>
> *ahem*
> http://en.wikipedia.org/wiki/UTF-16/UCS-2#Use_in_major_operating_sys
On Tue, Aug 17, 2010 at 13:00, Michael Snoyman wrote:
>
>
> On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale wrote:
>
>> Ketil Malde wrote:
>> > I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
>> > 3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
>> > R
Michael Snoyman writes:
> I don't think *anyone* is asserting that UTF-16 is a common encoding
> for files anywhere,
*ahem*
http://en.wikipedia.org/wiki/UTF-16/UCS-2#Use_in_major_operating_systems_and_environments
--
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com
Tako Schotanus writes:
> On Tue, Aug 17, 2010 at 12:54, Ivan Lazar Miljenovic <
> ivan.miljeno...@gmail.com> wrote:
>
>> Tom Harper writes:
>>
>> > 2010/8/17 Bulat Ziganshin :
>> >> Hello Tom,
>> >
>> >
>> >
>> >> i don't understand what you mean. do you support all 2^20 codepoints
>> >> in Data.Text package?
On Tue, Aug 17, 2010 at 1:05 PM, Bulat Ziganshin
wrote:
> Hello Tako,
>
> Tuesday, August 17, 2010, 3:03:20 PM, you wrote:
>
> > Unless a Char in Haskell is 32 bits (or at least more than 16 bits)
> > it can NOT encode all Unicode points.
>
> it's 32 bit
>
Like Bulat said it's 32 bit. It's *defin
On Tue, Aug 17, 2010 at 12:39 PM, Bulat Ziganshin wrote:
> Hello Tom,
>
> Tuesday, August 17, 2010, 2:09:09 PM, you wrote:
>
> > In the first iteration of the Text package, UTF-16 was chosen because
> > it had a nice balance of arithmetic overhead and space. The
> > arithmetic for UTF-8 started
Hi Ketil,
On Tue, Aug 17, 2010 at 12:09 PM, Ketil Malde wrote:
> Johan Tibell writes:
>
> > It's not clear to me that using UTF-16 internally does make Data.Text
> > noticeably slower.
>
> I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
> 3Gbyte file (the Human genome, s
Hello Tako,
Tuesday, August 17, 2010, 3:03:20 PM, you wrote:
> Unless a Char in Haskell is 32 bits (or at least more than 16 bits)
> it can NOT encode all Unicode points.
it's 32 bit
--
Best regards,
Bulat    mailto:bulat.zigans...@gmail.com
Ivan Lazar Miljenovic writes:
> Seeing as how the genome just uses 4 base "letters",
Yes, the bulk of the data is not really "text" at all, but each sequence
(it's fragmented due to the molecular division into chromosomes, and
due to incompleteness) also has a textual header. Generally, the
On Tue, Aug 17, 2010 at 12:54, Ivan Lazar Miljenovic <
ivan.miljeno...@gmail.com> wrote:
> Tom Harper writes:
>
> > 2010/8/17 Bulat Ziganshin :
> >> Hello Tom,
> >
> >
> >
> >> i don't understand what you mean. do you support all 2^20 codepoints
> >> in Data.Text package?
> >
> > Bulat,
> >
> >
On Tue, Aug 17, 2010 at 1:50 PM, Yitzchak Gale wrote:
> Ketil Malde wrote:
> > I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
> > 3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
> > RAM, UTF-16 will be slower than UTF-8...
>
> I don't think the genom
Miguel Mitrofanov writes:
> Ivan Lazar Miljenovic wrote:
>> Tom Harper writes:
>>
>>> 2010/8/17 Bulat Ziganshin :
Hello Tom,
>>>
>>>
i don't understand what you mean. do you support all 2^20 codepoints
in Data.Text package?
>>> Bulat,
>>>
>>> Yes, its internal representation is UTF-16, which is capable of
>>> encoding *any* valid Unicode codepoint.
Ivan Lazar Miljenovic wrote:
Tom Harper writes:
2010/8/17 Bulat Ziganshin :
Hello Tom,
i don't understand what you mean. do you support all 2^20 codepoints
in Data.Text package?
Bulat,
Yes, its internal representation is UTF-16, which is capable of
encoding *any* valid Unicode codepoint.
Tom Harper writes:
> 2010/8/17 Bulat Ziganshin :
>> Hello Tom,
>
>
>
>> i don't understand what you mean. do you support all 2^20 codepoints
>> in Data.Text package?
>
> Bulat,
>
> Yes, its internal representation is UTF-16, which is capable of
> encoding *any* valid Unicode codepoint.
Just like Char is capable of encoding any valid Unicode codepoint.
Ketil Malde wrote:
> I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
> 3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
> RAM, UTF-16 will be slower than UTF-8...
I don't think the genome is typical text. And
I doubt that is true if that text is in a CJK language.
2010/8/17 Bulat Ziganshin :
> Hello Tom,
> i don't understand what you mean. do you support all 2^20 codepoints
> in Data.Text package?
Bulat,
Yes, its internal representation is UTF-16, which is capable of
encoding *any* valid Unicode codepoint.
-- Tom
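The mechanism that lets a 16-bit unit encoding cover all of Unicode: code
points above U+FFFF are split into two surrogate code units. A sketch of that
encoding step:

    import Data.Bits (shiftR, (.&.))
    import Data.Char (ord)
    import Data.Word (Word16)

    -- Encode one Char as UTF-16 code units; anything above U+FFFF
    -- becomes a high/low surrogate pair.
    toUtf16 :: Char -> [Word16]
    toUtf16 c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where n = ord c
            m = n - 0x10000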
Hello Tom,
Tuesday, August 17, 2010, 2:09:09 PM, you wrote:
> In the first iteration of the Text package, UTF-16 was chosen because
> it had a nice balance of arithmetic overhead and space. The
> arithmetic for UTF-8 started to have serious performance impacts in
situations where the entire document
> "Ketil" == Ketil Malde writes:
Ketil> Johan Tibell writes:
>> It's not clear to me that using UTF-16 internally does make
>> Data.Text noticeably slower.
Ketil> I think that *IF* we are aiming for a single, grand, unified
Ketil> text library to Rule Them All, it needs
Ketil Malde writes:
> Johan Tibell writes:
>
>> It's not clear to me that using UTF-16 internally does make Data.Text
>> noticeably slower.
>
> I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
> 3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
> RAM,
Johan Tibell writes:
> It's not clear to me that using UTF-16 internally does make Data.Text
> noticeably slower.
I haven't benchmarked it, but I'm fairly sure that, if you try to fit a
3Gbyte file (the Human genome, say¹), into a computer with 4Gbytes of
RAM, UTF-16 will be slower than UTF-8.
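The back-of-envelope arithmetic behind that: genome text is essentially all
ASCII, so it stays one byte per character in UTF-8 but doubles in UTF-16:

    3 GB of ASCII characters -> ~3 GB in memory as UTF-8  (1 byte/char)
                             -> ~6 GB in memory as UTF-16 (2 bytes/char)

Only the former has a chance of fitting in 4 GB of RAM without swapping.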
> I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
> makes it inefficient for many purposes.
In the first iteration of the Text package, UTF-16 was chosen because
it had a nice balance of arithmetic overhead and space. The
arithmetic for UTF-8 started to have serious performance impacts in
situations where the entire document
Hello Tako,
Tuesday, August 17, 2010, 12:46:35 PM, you wrote:
> not slower but requires 2x more memory. speed is the same since
> Unicode contains 2^20 codepoints
> This is not entirely correct because it all depends on your data.
of course i mean ascii chars
--
Best regards,
Bulat
Hello Johan,
Tuesday, August 17, 2010, 1:06:30 PM, you wrote:
> So it's not clear to me that using UTF-16 makes the program
> noticeably slower or use more memory on a real program.
it's a clear misunderstanding. of course, not every program holds much
text data in memory. but some do, and here
Hi Bulat,
On Tue, Aug 17, 2010 at 10:34 AM, Bulat Ziganshin wrote:
> > It's not clear to me that using UTF-16 internally does make
> > Data.Text noticeably slower.
>
> not slower but requires 2x more memory. speed is the same since
> Unicode contains 2^20 codepoints
>
Yes, in theory a program c
On Tue, Aug 17, 2010 at 10:34, Bulat Ziganshin wrote:
> Hello Johan,
>
> Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
>
> > I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
> > makes it inefficient for many purposes.
>
> > It's not clear to me that using UTF-16 intern
Hello Johan,
Tuesday, August 17, 2010, 12:20:37 PM, you wrote:
> I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
> makes it inefficient for many purposes.
> It's not clear to me that using UTF-16 internally does make
> Data.Text noticeably slower.
not slower but requires 2x more memory. speed is the same since
Unicode contains 2^20 codepoints
On Tue, Aug 17, 2010 at 9:08 AM, Ketil Malde wrote:
> Benedikt Huber writes:
>
> > Despite all this, I think the performance of the text
> > package is very promising, and I hope it will improve further!
>
> I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
> makes it inef
On Tue, Aug 17, 2010 at 10:08 AM, Ketil Malde wrote:
> Benedikt Huber writes:
>
> > Despite all this, I think the performance of the text
> > package is very promising, and I hope it will improve further!
>
> I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
> makes it ine
Benedikt Huber writes:
> Despite all this, I think the performance of the text
> package is very promising, and I hope it will improve further!
I agree, Data.Text is great. Unfortunately, its internal use of UTF-16
makes it inefficient for many purposes.
A large fraction - probably most - text
On 16.08.10 14:44, Daniel Fischer wrote:
Hi Bulat,
On Monday 16 August 2010 07:35:44, Bulat Ziganshin wrote:
Hello Daniel,
Sunday, August 15, 2010, 10:39:24 PM, you wrote:
That's great. If that performance difference is a show stopper, one
shouldn't go higher-level than C anyway :)
*all* speed measurements that find Haskell is as fast as C, were broken.
Hi Bulat,
On Monday 16 August 2010 07:35:44, Bulat Ziganshin wrote:
> Hello Daniel,
>
> Sunday, August 15, 2010, 10:39:24 PM, you wrote:
> > That's great. If that performance difference is a show stopper, one
> > shouldn't go higher-level than C anyway :)
>
> *all* speed measurements that find Has
Hello Daniel,
Sunday, August 15, 2010, 10:39:24 PM, you wrote:
> That's great. If that performance difference is a show stopper, one
> shouldn't go higher-level than C anyway :)
*all* speed measurements that find Haskell is as fast as C, were
broken. Let's see:
D:\testing>read MsOffice.arc
MsOff
Hello Bryan,
Sunday, August 15, 2010, 10:04:01 PM, you wrote:
> shared on Friday, and boiled it down to a simple test case: how long does it
> take to read a 31MB file?
> GNU wc -m:
there are even slower ways to do it if you need :)
if your data aren't cached, then speed is limited by HDD. if
Gregory Collins writes:
> Ivan Lazar Miljenovic writes:
>
>> Don Stewart writes:
>>
>>> * Pay attention to Haskell Cafe announcements
>>> * Follow the Reddit Haskell news.
>>> * Read the quarterly reports on Hackage
>>> * Follow Planet Haskell
>>
>> And yet there are still many packages that fall under the radar
wren:
> Bryan O'Sullivan wrote:
>> As a case in point, I took the string search benchmark that Daniel shared
>> on Friday, and boiled it down to a simple test case: how long does it take
>> to read a 31MB file?
>>
>> GNU wc -m:
>>
>>- en_US.UTF-8: 0.701s
>>
>> text 0.7.1.0:
>>
>>- lazy text: 1.959s
Bryan O'Sullivan wrote:
As a case in point, I took the string search benchmark that Daniel shared
on Friday, and boiled it down to a simple test case: how long does it take
to read a 31MB file?
GNU wc -m:
- en_US.UTF-8: 0.701s
text 0.7.1.0:
- lazy text: 1.959s
- strict text: 3.527s
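For comparison, a minimal analogue of `wc -m` using the lazy text API might
look like this (file name hypothetical; not necessarily Bryan's exact
harness):

    import qualified Data.Text.Lazy as TL
    import qualified Data.Text.Lazy.IO as TLIO

    -- Decode the file and count characters, mirroring `wc -m`.
    main :: IO ()
    main = TLIO.readFile "big.txt" >>= print . TL.length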
Ivan Lazar Miljenovic writes:
> Don Stewart writes:
>
>> * Pay attention to Haskell Cafe announcements
>> * Follow the Reddit Haskell news.
>> * Read the quarterly reports on Hackage
>> * Follow Planet Haskell
>
> And yet there are still many packages that fall under the radar wi
On Sunday 15 August 2010 20:53:32, Bryan O'Sullivan wrote:
> On Sun, Aug 15, 2010 at 11:39 AM, Daniel Fischer
>
> wrote:
> > Out of curiosity, what kind of speed-up did your Friday fix bring to
> > the searching/replacing functions?
>
> Quite a bit!
>
> text 0.7.1.0 and 0.7.2.1:
>
>- 1.056s
> darcs HEAD:
>- 0.158s
Quoth Andrew Coppin ,
...
> And if you fail to recognise what a grave mistake placing performance
> before correctness is, you end up with things like buffer overflow
> exploits, SQL injection attacks, the Y2K bug, programs that can't handle
> files larger than 2GB or that don't understand Unicode, and so forth.
On Sun, Aug 15, 2010 at 11:39 AM, Daniel Fischer
wrote:
> Out of curiosity, what kind of speed-up did your Friday fix bring to the
> searching/replacing functions?
>
Quite a bit!
text 0.7.1.0 and 0.7.2.1:
- 1.056s
darcs HEAD:
- 0.158s
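For context, the functions being timed are Data.Text's searching/replacing
family; a minimal use of replace (values hypothetical):

    import qualified Data.Text as T

    -- T.replace needle replacement haystack
    example :: T.Text
    example = T.replace (T.pack "Friday") (T.pack "Monday")
                        (T.pack "shared on Friday")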
On 8/15/10 14:34 , Andrew Coppin wrote:
> Brandon S Allbery KF8NH wrote:
>> (Remember that Unix is itself a practical example of a research platform
>> "avoiding success at any cost" gone horribly wrong.)
>
> I haven't used Erlang myself, but I've heard it described in a similar way.
On Sunday 15 August 2010 20:04:01, Bryan O'Sullivan wrote:
> On Sat, Aug 14, 2010 at 6:05 PM, Bryan O'Sullivan
wrote:
> >- If it's not good enough, and the fault lies in a library you
> > chose, report a bug and provide a test case.
> >
> As a case in point, I took the string search benchmark that Daniel shared
> on Friday, and boiled it down to a simple test case: how long does it take
> to read a 31MB file?
Brandon S Allbery KF8NH wrote:
(Remember that Unix is itself a practical example of a research platform
"avoiding success at any cost" gone horribly wrong.)
I haven't used Erlang myself, but I've heard it described in a similar
way. (I don't know how true that actually is...)
On Sat, Aug 14, 2010 at 6:05 PM, Bryan O'Sullivan wrote:
>
>- If it's not good enough, and the fault lies in a library you chose,
>report a bug and provide a test case.
>
> As a case in point, I took the string search benchmark that Daniel shared
on Friday, and boiled it down to a simple test case: how long does it take
to read a 31MB file?
On 8/15/10 13:53 , Andrew Coppin wrote:
> injection attacks, the Y2K bug, programs that can't handle files larger than
> 2GB or that don't understand Unicode, and so forth. All things that could
> have been almost trivially avoided if everybody wasn't
Donn Cave wrote:
Quoth Bill Atkins ,
No, not really. Linked lists are very easy to deal with recursively and
Strings automatically work with any already-defined list functions.
Yes, they're great - a terrible mistake, for a practical programming
language, but if you fail to recognize the attraction, you miss
Donn Cave wrote:
I wonder how many ByteString users are `working with bytes', in the
sense you apparently mean where the bytes are not text characters.
My impression is that in practice, there is a sizeable contingent
out here using ByteString.Char8 and relatively few applications for
the Word8 type.
On 8/15/10 11:25 , Bill Atkins wrote:
> No, not really. Linked lists are very easy to deal with recursively and
> Strings automatically work with any already-defined list functions.
>
> On Sun, Aug 15, 2010 at 11:17 AM, Brandon S Allbery KF8NH
> mail
On Sun, Aug 15, 2010 at 12:50 PM, Donn Cave wrote:
> I wonder how many ByteString users are `working with bytes', in the
> sense you apparently mean where the bytes are not text characters.
> My impression is that in practice, there is a sizeable contingent
> out here using ByteString.Char8 and relatively few applications for
> the Word8 type.
Quoth Bill Atkins ,
> No, not really. Linked lists are very easy to deal with recursively and
> Strings automatically work with any already-defined list functions.
Yes, they're great - a terrible mistake, for a practical programming
language, but if you fail to recognize the attraction, you miss
Quoth "Bryan O'Sullivan" ,
> On Sat, Aug 14, 2010 at 10:07 PM, Donn Cave wrote:
...
>> ByteString will continue to be the obvious choice
>> for big data loads.
>
> Don't confuse "I have big data" with "I need bytes". If you are working with
> bytes, use bytestring. If you are working with text, use text.
No, not really. Linked lists are very easy to deal with recursively and
Strings automatically work with any already-defined list functions.
On Sun, Aug 15, 2010 at 11:17 AM, Brandon S Allbery KF8NH <
allb...@ece.cmu.edu> wrote:
> More to the point, there's nothing elegant about [Char] --- it's so