On Fri, Dec 16, 2011 at 3:30 AM, Gaëtan de Menten <gdemen...@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 19:52, Jon Nelson <jnel...@jamponi.net> wrote:
>> On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer
>> <mike...@zzzcomputing.com> wrote:
>>>
>>> On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:
>>>
>>>> Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
>>>> with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.
>>>>
>>>> I did some performance testing. Selecting 75 million rows (a straight
>>>> up SELECT colA from tableA) from a 5GB table yielded some interesting
>>>> results.
>>>> psycopg2 averaged between 485,000 and 585,000 rows per second.
>>>> Using COPY (via psycopg2) the average was right around 585,000.
>>>> sqlalchemy averaged between 160,000 and 190,000 rows per second.
>>>> That's a pretty big difference.
>
> Weird, IIRC, SA was much closer than raw psycopg2 (without using
> COPY), in the range of SA adding a 50% overhead, not a 200% overhead.
>
>>>> I briefly looked into what the cause could be, but I didn't see
>>>> anything jump out at me (except RowProxy, maybe).
>>>> Thoughts?
>>>
>>> Performance tests like this are fraught with complicating details (such as, 
>>> did you fully fetch each column in each row in both cases?  Did you have 
>>> equivalent unicode and numeric conversions in place in both tests ? ).   In 
>>> this case psycopg2 is written in pure C and SQLAlchemy's result proxy only 
>>> partially (did you use the C extensions ?).    You'd use the Python 
>>> profiling module to get a clear picture for what difference there is in 
>>> effort.   But using any kind of abstraction layer, especially one written 
>>> in Python, will always add latency versus a pure C program.
>>
>> I pretty much did this:
>> for row in rows:
>>  count += 1
>
> That test is probably flawed, as you don't fetch actual values. You
> should try to access individual elements (either by iterating over the
> row, or indexing it one way or another -- the speed difference can
> vary quite a bit depending on that). You might get even worse results
> with a proper test though ;-).

Revised to use:

for row in rows:
  dict(row) # throw away result
  count += 1

SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K - 580K, or
psycopg2 COPY @ 620K).

I suspect the issue is that I'm only selecting one column, so the
per-row overhead is exaggerated.

Thanks for the responses.

-- 
Strange things are afoot at the Circle K.
Jon

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to