On Nov 5, 8:40 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Nov 5, 2009, at 8:46 PM, Jon Nelson wrote:
>
> > I recently ran into an issue today where, batched inside a
> > transaction, I was able to achieve no more than about 9,000
> > inserts/second to a PostgreSQL database (same machine running the
> > test).
> >
> > With everything exactly the same, I was able to achieve over 50,000
> > inserts/s to SQLite.
> >
> > Now, I won't argue the relative merits of each database, but this is
> > a big problem for PostgreSQL. I believe I have determined that the
> > psycopg2 module is to blame, and that the substantial portion of the
> > time was being spent in IPC/RPC. Basically, every single insert in
> > this test is identical except for the values (same table and
> > columns), but psycopg2 (or possibly SQLAlchemy) was performing an
> > individual INSERT for every single row. I was *not* using the ORM.
> >
> > The code was something like this:
> >
> >     row_values = build_a_bunch_of_dictionaries()
> >     ins = table.insert()
> >     t = conn.begin()
> >     conn.execute(ins, row_values)
> >     t.commit()
> >
> > where row_values is (of course) a list of dictionaries.
> >
> > What can be done here to improve the speed of bulk inserts? For
> > PostgreSQL to get walloped by a factor of 5 in this area is a big
> > bummer.
>
> It depends on the source of the speed problem. If your table has
> types which do utf-8 encoding on each value, for example, that takes
> up a lot of time. The sqlite backend doesn't have this requirement,
> but the PG one in 0.5 currently does.
>
> We've done some work on this in 0.6 to reduce it - we now use
> psycopg2's UNICODE extension, so that we expect result rows to come
> back as unicode objects already. In response to this question I just
> made the same change for bind parameters, so that they won't be
> encoded into utf-8 on the way in - feel free to try r6484 of trunk.
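For context, the UNICODE extension Michael mentions is registered like
this; a minimal standalone sketch outside of SQLAlchemy (the DSN and
query are invented for illustration):

    import psycopg2
    import psycopg2.extensions

    # Ask psycopg2 to hand text columns back as Python unicode objects,
    # so the caller (here, SQLAlchemy) no longer has to decode utf-8
    # for every value in every result row.
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
    psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

    conn = psycopg2.connect("dbname=test")  # invented DSN
    cur = conn.cursor()
    cur.execute("SELECT 'hello'::text")
    print(type(cur.fetchone()[0]))  # unicode, with no decode step needed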
I gave that a try and did receive a mild speed boost - from ~9,000
inserts/s to 9,500 +/- 200. However, 9,500 is still substantially lower
than 50,000. In this case (pathological), *all* of the values are
strings, and in fact the table doesn't even have a primary key.

> Also psycopg2 is a very fast, native DBAPI so I doubt there's any
> bottleneck there.

Granted, I'm using SA on /top/ of sqlite3 and psycopg2 (2.0.12), but
when the only thing that changes is the dburi...

Before I even posted, I resorted to strace, which immediately confirmed
my suspicion: when using psycopg2 I don't see one big fat INSERT with
lots of binds, I see one INSERT per set of bind params, and it's this
that is ultimately killing the performance. This is easy to observe
because, as I'm sure you know, the communication between the test
program and postgresql takes place across a socket (unix domain or
tcp/ip). For every single set of bind params, the result is essentially
one sendto (INSERT INTO ....) and rt_sigprocmask, a poll, and then a
recvfrom and rt_sigprocmask pair. Profiling at the C level shows that
sendto accounts for *35%* of the total runtime and recvfrom a healthy
15%. It's this enormous per-statement overhead that's killing the
performance.

--
Jon
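The per-statement round trips described above can be reproduced (and
sidestepped) at the psycopg2 level, independent of SQLAlchemy. A
minimal sketch, with an invented temp table and DSN, comparing
executemany's one-INSERT-per-row behaviour against a single hand-built
multi-row INSERT (much later psycopg2 releases, well after the 2.0.12
in this thread, package the latter up as psycopg2.extras.execute_values):

    import psycopg2

    conn = psycopg2.connect("dbname=test")  # invented DSN
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE t (a text, b text)")  # invented table

    rows = [("a%d" % i, "b%d" % i) for i in range(10000)]

    # One round trip per row: executemany simply loops over the
    # parameter sets, so strace shows a sendto/recvfrom pair per row.
    cur.executemany("INSERT INTO t (a, b) VALUES (%s, %s)", rows)

    # One round trip total: mogrify() quotes each row client-side and
    # the whole batch goes out as a single multi-row INSERT statement.
    values = b",".join(cur.mogrify("(%s,%s)", row) for row in rows)
    cur.execute(b"INSERT INTO t (a, b) VALUES " + values)

    conn.commit()

Under strace, the second form collapses the thousands of
sendto/recvfrom pairs into a handful of large writes.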