On 6/17/15 4:05 AM, Soeren Medard wrote:
Hi,
I have had an issue with bulk operations arising from the following use case: I was looping an iterator of mappings, which are either already in the DB (inserts) or not (updates). As I did not want to loop over it twice and wanted to do only on transaction, I used greenlets do split the iterator and concurrently run `bulk_insert_mappings` and `bulk_update_mappings` on the same session object.

that won't work, because it means you are running multiple greenlets against a single database connection, and that is not thread/greenlet safe assuming you are using a greenlet-monkeypatched DBAPI. you need to synchronize the work of each greenlet, or use individual connections.

additionally, as the stack trace you've passed on indicates, the bulk methods on Session still make use of a subtransaction internally, which is stateful. So that's the specific error you're seeing, multiple greenlets are competing for the state of the session.transaction object which is a linked list of transaction nests. It absolutely is not threadsafe (which means greenlets also). The Session doesn't have much need to support concurrent threads/greenlets doing things on it without locking because the DBAPI connections it refers to are virtually never safe in to use this way in any case.



So, questions related:
1) Is there indeed a better way to run both bulk operations while looping over an iterator only once?
2) Should this be considered a bug?

Versions used:
|
SQLAlchemy==1.0.5
greenlet==0.4.7
psycopg2==2.6.1
|
and Postgres 9.4.1, Python 3.4.3

So the async support for psycopg2 is ultimately using a single Postgresql Connection with async=1, this is documented at http://initd.org/psycopg/docs/advanced.html#asynchronous-support. We can see here that this use is not supported: " Two cursors can’t execute concurrent queries on the same asynchronous connection."

So your options are to either synchronize the work of the multiple greenlets, which will pretty much eliminate any point to doing it that way, or to use a connection/session per greenlet. At the end of the day you're communicating on a TCP socket where an INSERT/UPDATE string is sent along the wire and a response is being waited for, and multiple statements cannot be simultaneously multiplexed on a single connection.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to