Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Matt Joiner
On Fri, Nov 25, 2011 at 5:41 PM, Eli Bendersky eli...@gmail.com wrote: Eli, the use pattern I was referring to is when you read in chunks, and append to a running buffer. Presumably if you know in advance the size of the data, you can readinto directly to a region of a bytearray. Thereby

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Antoine Pitrou
On Fri, 25 Nov 2011 08:38:48 +0200 Eli Bendersky eli...@gmail.com wrote: Just to be clear, there were two separate issues raised here. One is the speed regression of readinto() from 2.7 to 3.2, and the other is the relative slowness of justread() in 3.3. Regarding the second, I'm not sure

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Antoine Pitrou
On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner anacro...@gmail.com wrote: It's Python 3.2. I tried it for larger files and got some interesting results. readinto() for 10MB files, reading 10MB all at once: readinto/2.7 100 loops, best of 3: 8.6 msec per loop readinto/3.2 10 loops, best

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Matt Joiner
You can see in the tests on the largest buffer size tested, 8192, that the naive read actually outperforms readinto(). It's possibly by extrapolating into significantly larger buffer sizes that readinto() gets left behind. It's also reasonable to assume that this wasn't tested thoroughly. On Fri,

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Matt Joiner
On Fri, Nov 25, 2011 at 10:04 PM, Antoine Pitrou solip...@pitrou.net wrote: On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner anacro...@gmail.com wrote: It's Python 3.2. I tried it for larger files and got some interesting results. readinto() for 10MB files, reading 10MB all at once:

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Eli Bendersky
However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few % slower). That said, I now observe with Python 3.3 the same speed as with 2.7, including the readinto() speedup - so it

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Matt Joiner
I was under the impression this is already in 3.3? On Nov 25, 2011 10:58 PM, Eli Bendersky eli...@gmail.com wrote: However, the original question remains - on the 100MB file also, although in 2.7 readinto is 35% faster than readandcopy(), on 3.2 it's about the same speed (even a few %

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Eli Bendersky
On Fri, Nov 25, 2011 at 14:02, Matt Joiner anacro...@gmail.com wrote: I was under the impression this is already in 3.3? Sure, but 3.3 wasn't released yet. Eli P.S. Top-posting again ;-)

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Antoine Pitrou
On Fri, 25 Nov 2011 22:37:49 +1100 Matt Joiner anacro...@gmail.com wrote: On Fri, Nov 25, 2011 at 10:04 PM, Antoine Pitrou solip...@pitrou.net wrote: On Fri, 25 Nov 2011 20:34:21 +1100 Matt Joiner anacro...@gmail.com wrote: It's Python 3.2. I tried it for larger files and got some

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Paul Moore
On 25 November 2011 11:37, Matt Joiner anacro...@gmail.com wrote: No wtf here, the read() loop is quadratic since you're building a new, larger, bytes object every iteration.  Python 2 has a fragile optimization for concatenation of strings, which can avoid the quadratic behaviour on some
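A minimal sketch of the point about the quadratic read() loop, not taken from the thread itself: the function names and the 8192-byte chunk size are illustrative assumptions. Appending to an immutable bytes object may copy everything accumulated so far on each iteration, whereas collecting chunks in a list (or extending a bytearray in place) avoids the repeated copying.

```python
import io


def read_all_concat(f, chunk_size=8192):
    """Accumulate with bytes +=; each step may build a new, larger object."""
    data = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        data += chunk  # potentially copies all previously read data
    return data


def read_all_join(f, chunk_size=8192):
    """Collect chunks in a list and join once at the end."""
    chunks = []
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def read_all_bytearray(f, chunk_size=8192):
    """Extend a mutable bytearray in place (amortised growth)."""
    buf = bytearray()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk
    return buf
```

All three return the same data; only the cost of accumulation differs, and how large that difference is depends on the interpreter version and the file size.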

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Amaury Forgeot d'Arc
2011/11/25 Paul Moore p.f.mo...@gmail.com The optimisation mentioned was an attempt (by mutating an existing string when the runtime determined that it was safe to do so) to hide the consequences of this fact from end-users who didn't fully understand the issues. It was relatively effective,

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Paul Moore
On 25 November 2011 15:07, Amaury Forgeot d'Arc amaur...@gmail.com wrote: 2011/11/25 Paul Moore p.f.mo...@gmail.com It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on it (and it's especially unwise to base

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-25 Thread Michael Foord
On 25/11/2011 15:48, Paul Moore wrote: On 25 November 2011 15:07, Amaury Forgeot d'Arc amaur...@gmail.com wrote: 2011/11/25 Paul Moore p.f.mo...@gmail.com It would be nice to have the optimisation back if it's easy enough to do so, for quick-and-dirty code, but it is not a good idea to rely on

[Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Eli Bendersky
Hi there, I was doing some experiments with the buffer interface of bytearray today, for the purpose of quickly reading a file's contents into a bytearray which I can then modify. I decided to do some benchmarking and ran into surprising results. Here are the functions I was timing: def

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Antoine Pitrou
On Thu, 24 Nov 2011 20:15:25 +0200 Eli Bendersky eli...@gmail.com wrote: Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to avoid the redundant copy. Is there a real performance regression here, is

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Eli Bendersky
On Thu, Nov 24, 2011 at 20:29, Antoine Pitrou solip...@pitrou.net wrote: On Thu, 24 Nov 2011 20:15:25 +0200 Eli Bendersky eli...@gmail.com wrote: Oops, readinto takes the same time as copying. This is a real shame, because readinto in conjunction with the buffer interface was supposed to

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Matt Joiner
What if you broke up the read and built the final string object up? I always assumed this is where the real gain was with read_into. On Nov 25, 2011 5:55 AM, Eli Bendersky eli...@gmail.com wrote: On Thu, Nov 24, 2011 at 20:29, Antoine Pitrou solip...@pitrou.net wrote: On Thu, 24 Nov 2011

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Eli Bendersky
On Fri, Nov 25, 2011 at 00:02, Matt Joiner anacro...@gmail.com wrote: What if you broke up the read and built the final string object up. I always assumed this is where the real gain was with read_into. Matt, I'm not sure what you mean by this - can you suggest the code? Also, I'd be happy to

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Antoine Pitrou
On Thu, 24 Nov 2011 20:53:30 +0200 Eli Bendersky eli...@gmail.com wrote: Sure. Updated the default branch just now and built: $1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()' 1000 loops, best of 3: 1.14 msec per loop $1 -m timeit -s'import fileread_bytearray'

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Terry Reedy
On 11/24/2011 5:02 PM, Matt Joiner wrote: What if you broke up the read and built the final string object up. I always assumed this is where the real gain was with read_into. If a pure read takes twice as long in 3.3 as in 3.2, that is a concern regardless of whether there is a better way.

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Matt Joiner
Eli, Example coming shortly, the differences are quite significant. On Fri, Nov 25, 2011 at 9:41 AM, Eli Bendersky eli...@gmail.com wrote: On Fri, Nov 25, 2011 at 00:02, Matt Joiner anacro...@gmail.com wrote: What if you broke up the read and built the final string object up. I always

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Matt Joiner
It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until now. I've created a question on SO that describes my confusion: http://stackoverflow.com/q/8263899/149482 Also I saw some comments on top-posting am I

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Antoine Pitrou
On Fri, 25 Nov 2011 12:02:17 +1100 Matt Joiner anacro...@gmail.com wrote: It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until now. I've created a question on SO that describes my confusion:

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Matt Joiner
On Fri, Nov 25, 2011 at 12:07 PM, Antoine Pitrou solip...@pitrou.net wrote: On Fri, 25 Nov 2011 12:02:17 +1100 Matt Joiner anacro...@gmail.com wrote: It's my impression that the readinto method does not fully support the buffer interface I was expecting. I've never had cause to use it until

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Eli Bendersky
On Thu, 24 Nov 2011 20:53:30 +0200 Eli Bendersky eli...@gmail.com wrote: Sure. Updated the default branch just now and built: $1 -m timeit -s'import fileread_bytearray' 'fileread_bytearray.justread()' 1000 loops, best of 3: 1.14 msec per loop $1 -m timeit -s'import

Re: [Python-Dev] file.readinto performance regression in Python 3.2 vs. 2.7?

2011-11-24 Thread Eli Bendersky
Eli, the use pattern I was referring to is when you read in chunks, and append to a running buffer. Presumably if you know in advance the size of the data, you can readinto directly to a region of a bytearray. Thereby avoiding having to allocate a temporary buffer for the read, and
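
The pattern described here, reading directly into a region of a preallocated bytearray, could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the helper name read_into_preallocated and the 8192-byte chunk size are hypothetical. Slicing a memoryview of the bytearray gives readinto() a writable window onto the target region without allocating a temporary bytes object per chunk.

```python
import io


def read_into_preallocated(f, size, chunk_size=8192):
    """Fill a bytearray of known size by calling readinto() on slices of it.

    Returns (buf, n) where n is the number of bytes actually read.
    """
    buf = bytearray(size)
    view = memoryview(buf)  # slices of a memoryview do not copy
    pos = 0
    while pos < size:
        # The slice is clamped to the end of the buffer automatically.
        n = f.readinto(view[pos:pos + chunk_size])
        if not n:  # 0 at EOF; None if a non-blocking stream has no data
            break
        pos += n
    return buf, pos
```

If fewer than `size` bytes were available, the caller can trim the unused tail with `del buf[n:]` once the function has returned.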