On Nov 5, 8:40 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Nov 5, 2009, at 8:46 PM, Jon Nelson wrote:
>
> > I ran into an issue today where, with inserts batched inside a
> > transaction, I was able to achieve no more than about 9000
> > inserts/second to a postgresql database (same machine running the
> > test).
>
> > With everything exactly the same, I was able to achieve over 50,000
> > inserts/s to sqlite.
>
> > Now, I won't argue the relative merits of each database, but this is a
> > big problem for postgresql. I believe I have determined that the
> > psycopg2 module is to blame, and that a substantial portion of the
> > time was being spent in IPC/RPC. Basically, every single insert in
> > this test is identical except for the values (same table and columns),
> > but psycopg2 (or possibly SQLAlchemy) was performing an individual
> > INSERT for every single row. I was *not* using the ORM.
>
> > The code was something like this:
>
> > row_values = build_a_bunch_of_dictionaries()
> > ins = table.insert()
> > t = conn.begin()
> > conn.execute(ins, row_values)
> > t.commit()
>
> > where row_values is (of course) a list of dictionaries.
>
> > What can be done here to improve the speed of bulk inserts? For
> > postgresql to get walloped by a factor of 5 in this area is a big
> > bummer.
>
> it depends on the source of the speed problem.   if your table has
> types which do utf-8 encoding on each value, for example, that takes
> up a lot of time.  the sqlite backend doesn't have this requirement
> but the PG one in 0.5 currently does.
>
> we've done some work on this in 0.6 to reduce this - we now use
> psycopg2's UNICODE extension, so that we expect result rows to come
> back as unicode objects already.   In response to this question I just
> made the same change for bind parameters so that they won't be encoded
> into utf-8 on the way in, so feel free to try r6484 of trunk.

I gave that a try and did receive a mild speed boost - from ~9000
inserts/s to 9500 +/- 200.
However, 9500 is still substantially lower than 50,000.  In this
(pathological) case, *all* of the values are strings, and in fact the
table doesn't even have a primary key.
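
For reference, registering psycopg2's UNICODE extension at the DBAPI
level (which, if I understand the change correctly, is roughly what 0.6
now does for us) looks something like this; the connection details are
placeholders:

import psycopg2
import psycopg2.extensions

# Have psycopg2 hand text columns back as unicode objects itself, so
# nothing above it has to encode/decode each value in Python.
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

conn = psycopg2.connect("dbname=test")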

> Also psycopg2 is a very fast, native DBAPI so I doubt there's any
> bottleneck there.

Granted, I'm using SA on /top/ of sqlite3 and psycopg2 (2.0.12), but
when the only thing that changes is the dburi...
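
To be clear, both runs use the identical Core code from the top of the
thread; the only difference is the engine URL, along the lines of (the
URLs stand in for my actual setup):

from sqlalchemy import create_engine

engine = create_engine('sqlite:///test.db')               # ~50,000 inserts/s
#engine = create_engine('postgresql://localhost/test')    # ~9,500 inserts/s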

Before I even posted, I resorted to strace, which immediately confirmed
my suspicion: when using psycopg2 I don't see one big fat INSERT with
lots of binds, I see one INSERT per set of bind params, and it's this
that is ultimately killing the performance. It's easy to observe with
strace: as I'm sure you know, the communication between the test program
and postgresql takes place across a socket (unix domain or tcp/ip). For
every single set of bind params, the result is essentially one sendto
(INSERT INTO ....) plus rt_sigprocmask, a poll, and then a recvfrom and
rt_sigprocmask pair.  Profiling at the C level shows that sendto
accounts for *35%* of the total runtime and recvfrom a healthy 15%.
It's this per-statement overhead, paid for every single set of bind
params, that's killing the performance.
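
For what it's worth, collapsing the whole batch into a single multi-row
INSERT at the raw psycopg2 level should make the per-row round trips go
away.  A rough sketch, bypassing SA entirely (table and column names are
made up):

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

rows = [("a", "b"), ("c", "d")]   # all-string rows, as in my test

# mogrify() renders each row's values client-side; joining the fragments
# produces one INSERT statement, i.e. one sendto/recvfrom pair for the
# whole batch instead of one per row.
values = ",".join(cur.mogrify("(%s, %s)", row) for row in rows)
cur.execute("INSERT INTO my_table (col1, col2) VALUES " + values)

conn.commit()

Presumably something along these lines could be pushed down into the
executemany path, but I haven't looked at what that would take.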

--
Jon
