On Fri, Dec 16, 2011 at 3:30 AM, Gaëtan de Menten <gdemen...@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 19:52, Jon Nelson <jnel...@jamponi.net> wrote:
>> On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer
>> <mike...@zzzcomputing.com> wrote:
>>> On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:
>>>> Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
>>>> with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.
>>>> I did some performance testing. Selecting 75 million rows (a straight
>>>> up SELECT colA from tableA) from a 5GB table yielded some interesting
>>>> results.
>>>> psycopg2 averaged between 485,000 and 585,000 rows per second.
>>>> Using COPY (via psycopg2) the average was right around 585,000.
>>>> sqlalchemy averaged between 160,000 and 190,000 rows per second.
>>>> That's a pretty big difference.
> Weird, IIRC, SA was much closer to raw psycopg2 (without using
> COPY), in the range of SA adding a 50% overhead, not a 200% overhead.
>>>> I briefly looked into what the cause could be, but I didn't see
>>>> anything jump out at me (except RowProxy, maybe).
>>>> Thoughts?
>>> Performance tests like this are fraught with complicating details (such as,
>>> did you fully fetch each column in each row in both cases? Did you have
>>> equivalent unicode and numeric conversions in place in both tests?). In
>>> this case psycopg2 is written in pure C and SQLAlchemy's result proxy only
>>> partially (did you use the C extensions?). You'd use the Python
>>> profiling module to get a clear picture for what difference there is in
>>> effort. But using any kind of abstraction layer, especially one written
>>> in Python, will always add latency versus a pure C program.
>> I pretty much did this:
>> for row in rows:
>>  count += 1
> That test is probably flawed, as you don't fetch actual values. You
> should try to access individual elements (either by iterating over the
> row, or indexing it one way or another -- the speed difference can
> vary quite a bit depending on that). You might get even worse results
> with a proper test though ;-).

Revised to use:

for row in rows:
  dict(row) # throw away result
  count += 1

SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K - 580K, or
psycopg2 COPY @ 620K).
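
To put those throughput figures in per-row terms, here is a small
back-of-the-envelope sketch. It only rearranges the numbers quoted above;
the helper name usec_per_row is made up for illustration.

```python
# Convert rows/second into microseconds spent per row.
# usec_per_row is a hypothetical helper, not part of any library.
def usec_per_row(rows_per_sec):
    return 1_000_000 / rows_per_sec

# Figures from this message: SQLAlchemy with dict(row) vs. plain psycopg2.
sa_fast = usec_per_row(120_000)   # best SQLAlchemy case
sa_slow = usec_per_row(115_000)   # worst SQLAlchemy case
pg_fast = usec_per_row(580_000)   # best psycopg2 case
pg_slow = usec_per_row(480_000)   # worst psycopg2 case

# Extra time per row attributable to the extra layer, as a range.
extra_min = sa_fast - pg_slow
extra_max = sa_slow - pg_fast

# Slowdown factor, as a range.
ratio_min = 480_000 / 120_000
ratio_max = 580_000 / 115_000

print(f"extra cost per row: {extra_min:.2f} to {extra_max:.2f} usec")
print(f"slowdown factor:    {ratio_min:.1f}x to {ratio_max:.1f}x")
```

That works out to roughly 6 to 7 microseconds of extra work per row, or a
slowdown of about 4x to 5x, on these particular numbers.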

I suspect the issue is that I'm only selecting one column, so the
per-row overhead is exaggerated.
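
One rough way to check that hunch is to run the same loop against tables of
different widths and compare rows/s with values/s. The sketch below is only
an illustration of the method: it uses the standard-library sqlite3 module
with sqlite3.Row as a stand-in for a row wrapper, not SQLAlchemy or
psycopg2, and the function names, table name and row counts are all
assumptions.

```python
import sqlite3
import time

def build_table(n_rows, n_cols):
    """Create an in-memory table t with n_cols integer columns."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows become dict-convertible wrappers
    cols = ", ".join(f"c{i} INTEGER" for i in range(n_cols))
    conn.execute(f"CREATE TABLE t ({cols})")
    placeholders = ", ".join("?" * n_cols)
    conn.executemany(
        f"INSERT INTO t VALUES ({placeholders})",
        ((i,) * n_cols for i in range(n_rows)),
    )
    return conn

def measure(conn, n_cols):
    """Fetch every row, touch every value, return (rows, values, seconds)."""
    col_list = ", ".join(f"c{i}" for i in range(n_cols))
    count = 0
    values = 0
    start = time.perf_counter()
    for row in conn.execute(f"SELECT {col_list} FROM t"):
        d = dict(row)      # force access to every column, then discard
        values += len(d)
        count += 1
    elapsed = time.perf_counter() - start
    return count, values, elapsed

if __name__ == "__main__":
    for n_cols in (1, 4, 8):
        conn = build_table(50_000, n_cols)
        count, values, elapsed = measure(conn, n_cols)
        print(f"{n_cols} col(s): {count / elapsed:,.0f} rows/s, "
              f"{values / elapsed:,.0f} values/s")
        conn.close()
```

If the fixed per-row cost dominates, rows/s should drop only slowly as
columns are added while values/s climbs. The absolute numbers from sqlite3
say nothing about the SQLAlchemy or psycopg2 figures above.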

Thanks for the responses.

Strange things are afoot at the Circle K.
