Re: unicode() vs. s.decode()

2009-08-09 Thread Steven D'Aprano
On Sat, 08 Aug 2009 19:00:11 +0200, Thorsten Kampe wrote: >> I was running it one million times to mitigate influences on the timing >> by other background processes which is a common technique when >> benchmarking. > > Err, no. That is what "repeat" is for and it defaults to 3 ("This means > tha
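Steven's point is that `timeit` keeps two knobs apart: `number`, the loop count inside one trial, and `repeat`, the count of independent trials. Taking the best trial is what filters out background noise. Below is a minimal Python 3 sketch of the thread's micro-benchmark. It makes two assumptions. First, Python 3 has no `unicode()` builtin, so `str(bytes, encoding)` stands in for it. Second, the sample text is encoded to bytes first, because bytes literals cannot contain non-ASCII characters. The default `repeat` was 3 in the Python 2 era and changed to 5 in Python 3.7.

```python
import timeit

# Python 3 has no unicode() builtin; str(bytes, encoding) plays its role here.
SETUP = "data = 'äöüÄÖÜß'.encode('utf-8')"

def best_of(stmt, number=100_000, repeat=5):
    """Run `stmt` `number` times per trial for `repeat` trials; return best per-call seconds."""
    totals = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    # The fastest trial is the one least disturbed by other processes.
    return min(totals) / number

if __name__ == "__main__":
    for stmt in ("str(data, 'utf-8')", "data.decode('utf-8')"):
        print(f"{stmt:24} {best_of(stmt) * 1e9:8.1f} ns per call")
```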

Re: unicode() vs. s.decode()

2009-08-09 Thread Jeroen Ruigrok van der Werven
-On [20090808 20:07], Thorsten Kampe (thors...@thorstenkampe.de) wrote: >In real life people won't even notice whether an application takes one or >two minutes to complete. I think you are quite wrong here. I have worked with optical engineers who needed to calculate grating numbers for their len

Re: unicode() vs. s.decode()

2009-08-08 Thread Michael Ströder
Michael Fötsch wrote: > If speed is your primary concern, this will give you even better > performance than unicode(): > > decoder = codecs.lookup("utf-8").decode > for i in xrange(100): > decoder("äöüÄÖÜß")[0] Hmm, that could be interesting. I will give it a try. > However, there'
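Michael Fötsch's suggestion looks the codec up once and keeps a reference to its decode function. This skips the encoding-name lookup on every call. The same `codecs.lookup(...).decode` call still exists in Python 3. It takes bytes and returns a `(text, bytes_consumed)` tuple, which is why his snippet indexes `[0]`. Here is a minimal Python 3 sketch; the variable names are illustrative.

```python
import codecs

# Look the codec up once and keep a reference to its decode function.
utf8_decode = codecs.lookup("utf-8").decode

data = "äöüÄÖÜß".encode("utf-8")    # 7 characters, 2 UTF-8 bytes each
text, consumed = utf8_decode(data)  # returns (str, number of bytes consumed)

print(text == "äöüÄÖÜß", consumed)  # True 14
```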

Re: unicode() vs. s.decode()

2009-08-08 Thread Thorsten Kampe
* Michael Ströder (Fri, 07 Aug 2009 03:25:03 +0200) > Thorsten Kampe wrote: > > * Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200) > > timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000) > >> 17.23644495010376 > > timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000) > >> 72.08

Re: unicode() vs. s.decode()

2009-08-08 Thread garabik-news-2005-05
Thorsten Kampe wrote: > lines". That *is* *exactly* nothing. > > Another guy claims he gets times between 2.9 and 6.2 seconds when > running decode/unicode in various manifestations over "18 million over a sample of 60 words (sorry for not being able to explain myself clearly enough so th

Re: unicode() vs. s.decode()

2009-08-08 Thread garabik-news-2005-05
Thorsten Kampe wrote: > * garabik-news-2005...@kassiopeia.juls.savba.sk (Fri, 7 Aug 2009 > 17:41:38 +0000 (UTC)) >> Thorsten Kampe wrote: >> > If you increase the number of loops to one million or one billion or >> > whatever even the slightest completely negligible difference will >> > occur. T

Re: unicode() vs. s.decode()

2009-08-08 Thread Thorsten Kampe
* alex23 (Fri, 7 Aug 2009 06:53:22 -0700 (PDT)) > Thorsten Kampe wrote: > > Bollocks. No one will even notice whether a code sequence runs 2.7 or > > 5.7 seconds. That's completely artificial benchmarking. > > But that's not what you first claimed: > > > I don't think any measurable speed increa

Re: unicode() vs. s.decode()

2009-08-08 Thread Thorsten Kampe
* Michael Ströder (Sat, 08 Aug 2009 15:09:23 +0200) > Thorsten Kampe wrote: > > * Steven D'Aprano (08 Aug 2009 03:29:43 GMT) > >> But why assume that the program takes 8 minutes to run? Perhaps it takes > >> 8 seconds to run, and 6 seconds of that is the decoding. Then halving > >> that reduces t

Re: unicode() vs. s.decode()

2009-08-08 Thread Michael Fötsch
Michael Ströder wrote: > >>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000) > 17.23644495010376 > >>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000) > 72.087096929550171 > > That is significant! So the winner is: > > unicode('äöüÄÖÜß','utf-8') Which proves that benchmark r

Re: unicode() vs. s.decode()

2009-08-08 Thread Michael Ströder
Thorsten Kampe wrote: > * Steven D'Aprano (08 Aug 2009 03:29:43 GMT) >> But why assume that the program takes 8 minutes to run? Perhaps it takes >> 8 seconds to run, and 6 seconds of that is the decoding. Then halving >> that reduces the total runtime from 8 seconds to 5, which is a noticeable >
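Steven's 8-second example is the usual "only the part you speed up gets faster" arithmetic. The time spent outside decoding stays fixed; only the decoding share shrinks. The helper name `new_runtime` below is illustrative and does not come from the thread.

```python
def new_runtime(total, part, speedup):
    """Total runtime after making `part` seconds of `total` run `speedup` times faster."""
    return (total - part) + part / speedup

# Steven's numbers: 8 s total, 6 s of it decoding, decoding made twice as fast.
print(new_runtime(8, 6, 2))  # 5.0
```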

Re: unicode() vs. s.decode()

2009-08-08 Thread Thorsten Kampe
* garabik-news-2005...@kassiopeia.juls.savba.sk (Fri, 7 Aug 2009 17:41:38 +0000 (UTC)) > Thorsten Kampe wrote: > > If you increase the number of loops to one million or one billion or > > whatever even the slightest completely negligible difference will > > occur. The same thing will happen if yo

Re: unicode() vs. s.decode()

2009-08-08 Thread Thorsten Kampe
* alex23 (Fri, 7 Aug 2009 10:45:29 -0700 (PDT)) > garabik-news-2005...@kassiopeia.juls.savba.sk wrote: > > I am not sure I understood that. Must be my English :-) > > I just parsed it as "blah blah blah I won't admit I'm wrong" and > didn't miss anything substantive. Alex, there are still a numbe

Re: unicode() vs. s.decode()

2009-08-08 Thread Thorsten Kampe
* Steven D'Aprano (08 Aug 2009 03:29:43 GMT) > On Fri, 07 Aug 2009 17:13:07 +0200, Thorsten Kampe wrote: > > One guy claims he has times between 2.7 and 5.7 seconds when > > benchmarking more or less randomly generated "one million different > > lines". That *is* *exactly* nothing. > > We agree th

Re: unicode() vs. s.decode()

2009-08-07 Thread Steven D'Aprano
On Fri, 07 Aug 2009 17:13:07 +0200, Thorsten Kampe wrote: > One guy claims he has times between 2.7 and 5.7 seconds when > benchmarking more or less randomly generated "one million different > lines". That *is* *exactly* nothing. We agree that in the grand scheme of things, a difference of 2.7 s

Re: unicode() vs. s.decode()

2009-08-07 Thread Steven D'Aprano
On Fri, 07 Aug 2009 12:00:42 +0200, Thorsten Kampe wrote: > Bollocks. No one will even notice whether a code sequence runs 2.7 or > 5.7 seconds. That's completely artificial benchmarking. You think users won't notice a doubling of execution time? Well, that explains some of the apps I'm forced t

Re: unicode() vs. s.decode()

2009-08-07 Thread alex23
garabik-news-2005...@kassiopeia.juls.savba.sk wrote: > I am not sure I understood that. Must be my English :-) I just parsed it as "blah blah blah I won't admit I'm wrong" and didn't miss anything substantive. -- http://mail.python.org/mailman/listinfo/python-list

Re: unicode() vs. s.decode()

2009-08-07 Thread alex23
Thorsten Kampe wrote: > Bollocks. No one will even notice whether a code sequence runs 2.7 or > 5.7 seconds. That's completely artificial benchmarking. But that's not what you first claimed: > I don't think any measurable speed increase will be > noticeable between those two. But please, keep c

Re: unicode() vs. s.decode()

2009-08-07 Thread garabik-news-2005-05
Thorsten Kampe wrote: > * Steven D'Aprano (06 Aug 2009 19:17:30 GMT) >> What if you're writing a loop which takes one million different lines of >> text and decodes them once each? >> >> >>> setup = 'L = ["abc"*(n%100) for n in xrange(100)]' >> >>> t1 = timeit.Timer('for line in L: line.deco
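Below is a Python 3 sketch of Steven's "many different lines, each decoded once" benchmark. The list is built in `setup`, so only the decoding loop is timed. The list size `N` is an assumption, kept small so the sketch runs quickly; raise it to approach the thread's one million lines. As before, `str(b, 'utf-8')` stands in for Python 2's `unicode()`.

```python
import timeit

N = 100_000  # assumed size; the thread talks about one million lines
SETUP = f'L = [b"abc" * (n % 100) for n in range({N})]'

t_method = timeit.Timer("for line in L: line.decode('utf-8')", setup=SETUP)
t_builtin = timeit.Timer("for line in L: str(line, 'utf-8')", setup=SETUP)

if __name__ == "__main__":
    # One pass over the whole list per trial; keep the best of 5 trials.
    print("bytes.decode:", min(t_method.repeat(repeat=5, number=1)))
    print("str(b, enc): ", min(t_builtin.repeat(repeat=5, number=1)))
```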

Re: unicode() vs. s.decode()

2009-08-07 Thread Thorsten Kampe
* Steven D'Aprano (06 Aug 2009 19:17:30 GMT) > On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote: > > > That is significant! So the winner is: > > > > > > unicode('äöüÄÖÜß','utf-8') > > > > Unless you are planning to write a loop that decodes "äöüÄÖÜß" one > > million times, these benchmar

Re: unicode() vs. s.decode()

2009-08-07 Thread Steven D'Aprano
On Fri, 07 Aug 2009 08:04:51 +0100, Mark Lawrence wrote: > I believe that the comment "these benchmarks are meaningless" refers to > the length of the strings being used in the tests. Surely something > involving thousands or millions of characters is more meaningful? Or to > go the other way, yo
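Mark's objection is that a 7-character string mostly measures fixed per-call overhead, not decoding. One way to check is to time the same two calls at several input lengths. This is a Python 3 sketch under stated assumptions: `str(b, ...)` stands in for `unicode()`, and the lengths and loop counts are arbitrary choices made to keep the run short.

```python
import timeit

def per_call(stmt, length, number=1_000):
    """Best per-call time (seconds) for `stmt` on UTF-8 bytes of `length` characters."""
    setup = f"data = ('äöü' * ({length} // 3 + 1))[:{length}].encode('utf-8')"
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=3)) / number

if __name__ == "__main__":
    # Short inputs are expected to be dominated by fixed per-call costs
    # (name/attribute lookup, codec dispatch); long ones by the decoding itself.
    for length in (7, 1_000, 10_000):
        a = per_call("str(data, 'utf-8')", length)
        b = per_call("data.decode('utf-8')", length)
        print(f"{length:>6} chars: str() {a:.2e} s, .decode() {b:.2e} s")
```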

Re: unicode() vs. s.decode()

2009-08-07 Thread Mark Lawrence
Michael Ströder wrote: Thorsten Kampe wrote: * Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200) timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000) 17.23644495010376 timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000) 72.087096929550171 That is significant! So the winner is:

Re: unicode() vs. s.decode()

2009-08-06 Thread John Machin
Jason Tackaberry urandom.ca> writes: > On Thu, 2009-08-06 at 01:31 +, John Machin wrote: > > Suggested further avenues of investigation: > > > > (1) Try the timing again with "cp1252" and "utf8" and "utf_8" > > > > (2) grep "utf-8" /Objects/unicodeobject.c > > Very pedagogical of you. :)
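Machin's hint is that the C implementation may special-case particular encoding spellings. If so, the name you pass could matter more than `unicode()` versus `.decode()`. The Python 3 sketch below does not show any such fast path, because its details differ between versions. It only confirms that the UTF-8 spellings resolve to one codec, then times each spelling so you can compare them on your own interpreter. The sample bytes are ASCII so that every listed encoding can decode them.

```python
import codecs
import timeit

SPELLINGS = ("utf-8", "utf8", "utf_8", "cp1252")
data = "abc".encode("ascii")  # ASCII bytes are valid in every encoding tested here

def codec_names():
    """Map each spelling to the canonical codec name Python resolves it to."""
    return {s: codecs.lookup(s).name for s in SPELLINGS}

def time_spellings(number=100_000):
    """Best-of-3 time for decoding `data` with each spelling."""
    return {
        s: min(timeit.repeat(lambda s=s: data.decode(s), number=number, repeat=3))
        for s in SPELLINGS
    }

if __name__ == "__main__":
    print(codec_names())  # the three UTF-8 spellings share one codec
    for name, secs in time_spellings().items():
        print(f"{name:7} {secs:.3f} s")
```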

Re: unicode() vs. s.decode()

2009-08-06 Thread Michael Ströder
Thorsten Kampe wrote: > * Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200) > timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000) >> 17.23644495010376 > timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000) >> 72.087096929550171 >> >> That is significant! So the winner is: >> >>

Re: unicode() vs. s.decode()

2009-08-06 Thread Steven D'Aprano
On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote: > > That is significant! So the winner is: > > > > unicode('äöüÄÖÜß','utf-8') > > Unless you are planning to write a loop that decodes "äöüÄÖÜß" one > million times, these benchmarks are meaningless. What if you're writing a loop which t

Re: unicode() vs. s.decode()

2009-08-06 Thread Thorsten Kampe
* Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200) > Thorsten Kampe wrote: > > * Michael Ströder (Wed, 05 Aug 2009 16:43:09 +0200) > > I don't think any measurable speed increase will be noticeable > > between those two. > > Well, seems not to be true. Try yourself. I did (my console has UTF-8 as

Re: unicode() vs. s.decode()

2009-08-06 Thread Michael Ströder
Thorsten Kampe wrote: > * Michael Ströder (Wed, 05 Aug 2009 16:43:09 +0200) >> These both expressions are equivalent but which is faster or should be >> used for any reason? >> >> u = unicode(s,'utf-8') >> >> u = s.decode('utf-8') # looks nicer > > "decode" was added in Python 2.2 for the sake of

Re: unicode() vs. s.decode()

2009-08-06 Thread Thorsten Kampe
* Michael Ströder (Wed, 05 Aug 2009 16:43:09 +0200) > These both expressions are equivalent but which is faster or should be > used for any reason? > > u = unicode(s,'utf-8') > > u = s.decode('utf-8') # looks nicer "decode" was added in Python 2.2 for the sake of symmetry to encode(). It's esse
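The symmetry Thorsten mentions carries over to Python 3. There, `unicode(s, enc)` and `s.decode(enc)` correspond to `str(b, enc)` and `b.decode(enc)`, and `decode()` mirrors `str.encode()`. The sketch below shows the round trip. It also shows that the constructor and method forms accept the same `(encoding, errors)` arguments; the sample bytes are made up for illustration.

```python
s = "äöüÄÖÜß"
b = s.encode("utf-8")            # str -> bytes
round_trip = b.decode("utf-8")   # bytes -> str, the mirror image of encode()

# Constructor form and method form take the same (encoding, errors) arguments.
bad = b"abc\xff"                 # 0xff is never valid in UTF-8
via_str = str(bad, "utf-8", "ignore")
via_method = bad.decode("utf-8", "ignore")

print(round_trip == s, via_str, via_method)  # True abc abc
```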

Re: unicode() vs. s.decode()

2009-08-06 Thread Jason Tackaberry
On Thu, 2009-08-06 at 01:31 +, John Machin wrote: > Faster by an enormous margin; attributing this to the cost of attribute lookup > seems implausible. Ok, fair point. I don't think the time difference fully registered when I composed that message. Testing a global access (LOAD_GLOBAL) versu

Re: unicode() vs. s.decode()

2009-08-05 Thread John Machin
Jason Tackaberry urandom.ca> writes: > On Wed, 2009-08-05 at 16:43 +0200, Michael Ströder wrote: > > These both expressions are equivalent but which is faster or should be used > > for any reason? > > u = unicode(s,'utf-8') > > u = s.decode('utf-8') # looks nicer > > It is sometimes non-obvious w

Re: unicode() vs. s.decode()

2009-08-05 Thread 1x7y2z9
unicode() has a LOAD_GLOBAL, which s.decode() does not. Is it generally the case that LOAD_ATTR is slower than LOAD_GLOBAL, and is that what led to your intuition that the former would probably be slower? Or some other intuition? Of course, the results from timeit are a different thing - I ask about the intuiti
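The opcode difference under discussion can be inspected directly with the `dis` module. Below is a Python 3 sketch in which `str` again stands in for `unicode`. Inside a function, the builtin is fetched with `LOAD_GLOBAL`. The method is fetched from the argument with an attribute/method load: `LOAD_METHOD` on 3.7 to 3.11, `LOAD_ATTR` on 3.12 and later. As Machin points out, one extra lookup is unlikely to explain a fourfold timing gap by itself.

```python
import dis

def via_builtin(data):
    return str(data, "utf-8")   # Python 3 stand-in for unicode(s, 'utf-8')

def via_method(data):
    return data.decode("utf-8")

def opnames(func):
    """List the bytecode operation names CPython compiled for `func`."""
    return [ins.opname for ins in dis.get_instructions(func)]

if __name__ == "__main__":
    print(opnames(via_builtin))  # includes LOAD_GLOBAL for `str`
    print(opnames(via_method))   # an attribute/method load for `decode`, no LOAD_GLOBAL
```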

Re: unicode() vs. s.decode()

2009-08-05 Thread Jason Tackaberry
On Wed, 2009-08-05 at 16:43 +0200, Michael Ströder wrote: > These both expressions are equivalent but which is faster or should be used > for any reason? > > u = unicode(s,'utf-8') > > u = s.decode('utf-8') # looks nicer It is sometimes non-obvious which constructs are faster than others in Pytho

unicode() vs. s.decode()

2009-08-05 Thread Michael Ströder
Hi!

These two expressions are equivalent, but which one is faster, or which should be used for any other reason?

  u = unicode(s,'utf-8')

  u = s.decode('utf-8') # looks nicer

Ciao, Michael.