Re: [sqlalchemy] Practical ways for dealing with concurrency problems?

Michael Bayer Wed, 10 Nov 2010 06:52:44 -0800

Hi Russell -

There's a lot going on here, so I'll point you to a few things.



1. the "nested sets" example, which also has the need to do "update table set 
x=x+1" in order to make room for new entries, thats in the distro at 
examples/nested_sets/nested_sets.py .    That example uses mapper extensions 
and stuff for a very specific use case which is probably more elaborate than 
what you have here, the main thing you're looking at there are the UPDATE 
statements.    You definitely want to be using a single UPDATE to increment all 
the subsequent rows, however.

2. with_lockmode() is a gateway to "SELECT..FOR UPDATE".   If you are selecting 
the rows first before updating them, this is a way to "lock" them, and it does 
in effect do "read" locking against other similar transactions, since they 
would be calling "SELECT..FOR UPDATE" as well and therefore would be locked 
out.   PG's docs have the best explanation: 
http://www.postgresql.org/docs/8.1/static/sql-select.html#SQL-FOR-UPDATE-SHARE

3. you need to be mindful of transaction isolation mode, and the implications 
thereof.   READ_COMMITTED and SERIALIZABLE are common.   Take a look at MySQL's 
and PG's docs on this to get a feel for them. SQLA allows this to be set via a 
parameter to create_engine().

4. when you use the ORM and the Session, you're in a transaction.   If you want 
to work with straight SQL statements in that transaction, you just use the 
execute() method.  
http://www.sqlalchemy.org/docs/orm/session.html#using-sql-expressions-with-sessions

hope this helps you get started.



On Nov 9, 2010, at 11:51 PM, Russell Warren wrote:

> I've got several questions about dealing with concurrency issues in
> SQLAlchemy.  To try and help with the asking, I'll use a small example
> similar to a situation I've got in an SQLAlchemy application.  In this
> application, there is a table that tracks objects and their order/
> position, something like below...
> 
> Name   Pos
> ----   ---
> ObjA    1
> ObjB    2
> ObjC    3
> 
> And this can change to the following with a new object inserted
> "before" (position-wise) ObjB...
> 
> Name   Pos
> ----   ---
> ObjA    1
> ObjB    3
> ObjC    4
> NewObj  2
> 
> Some sample code is whipped up to do this is at the bottom of this
> post.
> 
> Worth noting is that my application is currently database-agnostic
> (SQLAlchemy enabling this is a great feature), and I'd like to keep it
> that way if at all possible.
> 
> My questions revolve around concurrency issues because the insert
> operation can potentially update all rows in the table.  It seems that
> this is very vulnerable to concurrency problems since each transaction
> will only update its own rows that existed at the start of the
> transaction.  If a parallel transaction is started doing a similar
> insertion, a record can be potentially get a duplicate position
> number, which would destroy the application logic.
> 
> I'm much more of an application person than a DB person, so to solve
> this my immediate thinking is to simply do it all at the application
> layer and use a thread lock on the table updating function.  All table
> access is routed through the application (which uses scoped sessions)
> so this would certainly "work", but I'm certain that those more
> familiar with sqla and the DB layer would likely cringe at that
> approach.
> 
> I really want to learn more about dealing with concurrency at the
> database level through sqlalchemy.  What is a good way to deal with
> this in sqla?
> 
> One possibility seems to be to get it all done in one atomic commit.
> But how can I use the ORM (together with the expression language?) to
> increment by 1 all positions where the position value is > than the
> position for a given object name?  For a single value, the docs show
> me that can get the proper SQL emitted for a record with something
> like:
> 
> obj.position = obj.__table__.c.position + 1
> 
> where obj is one of my mapped objects.  But how do I do it for
> multiple rows and not just the one obj?  Also - is going through
> __table__ the most direct way for achieving this?
> 
> After the application layer blocking, the next most brutal thing that
> comes to mind is locking the entire table during the operation to
> avoid problems.  I'm not clear this will work since it seems a read
> lock is needed, and I don't yet know if you can actually do that.  It
> may be irrelevant, since I certainly do not see any way of doing table
> locks through sqla at the moment. If locking were an option, it seems
> it would require direct (potentialy DB specifc) SQL statement
> execution through session.execute() or conn.execute().  Is that
> correct?
> 
> Something that I've seen referenced a few times for dealing with
> concurrency in sqlalchemy is to use "with_lockmode()" on a query to
> lock the relevant rows.  However - I'm wary of this for a few
> reasons:  First is that I don't think it would work in my example case
> because read blocking is needed (I think).  Second is that I can't
> actually figure out what with_lockmode actually does.  Digging in the
> code I see the lock options are "read", "update", "update_nowait".
> Short of finding this in query.py, where are the docs for these
> options??  The third bit of wariness is uncertain support for
> different dialects.  Digging through my 'SQL in a Nutshell' I see
> varying DB support for "SELECT FOR UPDATE" syntax (which is what
> with_lockmode seems to employ?) across various RDMS so I'd rather stay
> away from it if possible.  What is the best way to figure out the
> database dialect support for each of these options, short of trying
> with all potential database types?
> 
> Although I'm certainly interested in specific ideas on dealing with
> something similar to the table update described, any links or tips
> anyone has for dealing with concurrency in SQLAlchemy would be
> appreciated.  This includes any expansions anyone has on how the UOW
> pattern and ORM transactions alone help with some forms of
> concurrency, as I know they do to some extent.
> 
> Thanks,
> Russ
> 
> ## sample code with concurrency issue...
> 
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy import Table, Column, Integer, String, MetaData
> from sqlalchemy import create_engine
> from pprint import pprint
> engine = create_engine('sqlite:///:memory:', echo = False)
> metadata = MetaData()
> 
> from sqlalchemy.orm import sessionmaker
> Session = sessionmaker(bind = engine)
> 
> Base = declarative_base()
> class OrderedObject(Base):
>    __tablename__ = 'objects'
> 
>    id = Column(Integer, primary_key = True)
>    name = Column(String)
>    position = Column(Integer)
> 
>    def __init__(self, name, name_to_bump = ""):
>        self._ormSess = Session() #does embedding this scare anyone?
>        self.name = name
>        if name_to_bump:
>            self._InsertBefore(name_to_bump)
>        else:
>            #Count below may be inaccurate with concurrent users!
>            newPos = self._ormSess.query(OrderedObject).count() + 1
>            self.position = newPos
>        self._ormSess.add(self)
>        self._ormSess.commit()
> 
>    def _InsertBefore(self, name_to_bump):
>        #This (potentially) modifies all rows in the table.  Need to
> watch out
>        #for concurrency issues...
>        self._ormSess = Session()
>        allObjects = self._ormSess.query(OrderedObject).all()
>        pos = 0
>        for obj in allObjects:
>            pos += 1
>            if obj.name == name_to_bump:
>                self.position = pos
>                pos += 1
>            obj.position = pos
> 
>    def __repr__(self):
>        return "<Object('%s' at %d)>" % (self.name, self.position)
> 
> Base.metadata.create_all(engine)
> 
> sess = Session()
> initialObjects = [OrderedObject("Object_%s" % chr(65 + i)) for i in
> range(5)]
> print "Initial Objects..."
> print "------------------"
> pprint(sess.query(OrderedObject).order_by(OrderedObject.position).all())
> 
> OrderedObject("BEFORE_B", "Object_B")
> print "\nBumped Objects..."
> print "------------------"
> pprint(sess.query(OrderedObject).order_by(OrderedObject.position).all())
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To post to this group, send email to sqlalch...@googlegroups.com.
> To unsubscribe from this group, send email to 
> sqlalchemy+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalch...@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Practical ways for dealing with concurrency problems?

Reply via email to