On Fri, Dec 16, 2011 at 3:30 AM, Gaëtan de Menten <gdemen...@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 19:52, Jon Nelson <jnel...@jamponi.net> wrote:
>> On Thu, Dec 15, 2011 at 12:01 PM, Michael Bayer
>> <mike...@zzzcomputing.com> wrote:
>>>
>>> On Dec 15, 2011, at 12:51 PM, Jon Nelson wrote:
>>>
>>>> Up front, I'm not using the ORM at all, and I'm using SQLAlchemy 0.7.4
>>>> with psycopg2 2.4.3 on PostgreSQL 8.4.10 on Linux x86_64.
>>>>
>>>> I did some performance testing. Selecting 75 million rows (a
>>>> straight-up SELECT colA FROM tableA) from a 5GB table yielded some
>>>> interesting results:
>>>> psycopg2 averaged between 485,000 and 585,000 rows per second.
>>>> Using COPY (via psycopg2), the average was right around 585,000.
>>>> SQLAlchemy averaged between 160,000 and 190,000 rows per second.
>>>> That's a pretty big difference.
>
> Weird. IIRC, SA was much closer to raw psycopg2 (without using COPY),
> in the range of SA adding a 50% overhead, not a 200% overhead.
>
>>>> I briefly looked into what the cause could be, but I didn't see
>>>> anything jump out at me (except RowProxy, maybe).
>>>> Thoughts?
>>>
>>> Performance tests like this are fraught with complicating details
>>> (such as: did you fully fetch each column in each row in both cases?
>>> Did you have equivalent unicode and numeric conversions in place in
>>> both tests?). In this case psycopg2 is written in pure C and
>>> SQLAlchemy's result proxy only partially (did you use the C
>>> extensions?). You'd use the Python profiling module to get a clear
>>> picture of what difference there is in effort. But using any kind of
>>> abstraction layer, especially one written in Python, will always add
>>> latency versus a pure C program.
>>
>> I pretty much did this:
>>
>> for row in rows:
>>     count += 1
>
> That test is probably flawed, as you don't fetch actual values.
> You should try to access individual elements (either by iterating over
> the row, or indexing it one way or another -- the speed difference can
> vary quite a bit depending on that). You might get even worse results
> with a proper test, though ;-).
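The access patterns Gaëtan mentions can be sketched self-contained with the stdlib sqlite3 driver standing in for psycopg2 and SQLAlchemy's RowProxy; the column names and in-memory table here are illustrative assumptions, not part of the original test:

```python
# Sketch of the row-access patterns mentioned above. sqlite3.Row is a
# stand-in: like RowProxy, it supports positional and keyed access.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # enable index *and* name access on rows

row = conn.execute("SELECT 1 AS colA, 2 AS colB").fetchone()

by_index = row[0]        # positional indexing
by_name = row["colB"]    # keyed access by column name
as_list = list(row)      # iterating over the row touches every value

print(by_index, by_name, as_list)
```

Each style forces the driver to actually materialize the column values, which is what the bare `count += 1` loop never did.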
Revised to use:

for row in rows:
    dict(row)  # throw away result
    count += 1

SQLAlchemy: 115,000 to 120,000 rows/s (vs. psycopg2 @ 480K-580K, or
psycopg2 COPY @ 620K).

I suspect the issue is that I'm only selecting one column, so the
per-row overhead is exaggerated.

Thanks for the responses.

--
Strange things are afoot at the Circle K.
Jon
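For reference, the two loops compared in this thread (bare counting vs. `dict(row)`) can be reproduced self-contained with the stdlib sqlite3 driver; the table name, row count, and resulting timings are illustrative only, not the 75-million-row PostgreSQL test from the thread:

```python
# Self-contained sketch of the two benchmark loops from this thread,
# with sqlite3 standing in for psycopg2/PostgreSQL (illustrative only).
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE tableA (colA INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?)",
                 ((i,) for i in range(100_000)))

# Flawed test: rows are fetched but no column value is ever accessed.
start = time.perf_counter()
count = 0
for row in conn.execute("SELECT colA FROM tableA"):
    count += 1
bare = time.perf_counter() - start

# Revised test: dict(row) forces every column value to be materialized.
start = time.perf_counter()
count2 = 0
for row in conn.execute("SELECT colA FROM tableA"):
    dict(row)  # throw away result
    count2 += 1
converted = time.perf_counter() - start

print(f"bare:      {count / bare:,.0f} rows/s")
print(f"dict(row): {count2 / converted:,.0f} rows/s")
```

The gap between the two rates is the per-row conversion cost the thread is discussing; with only one column selected, that fixed per-row overhead dominates the total time.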