If you haven't already, you should read http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#when-do-i-construct-a-session-when-do-i-commit-it-and-when-do-i-close-it
On Mon, Sep 2, 2013 at 2:45 PM, herzaso <herz...@gmail.com> wrote: > Damn, I hoped for an easier solution ... :) > > I will give it a try with a session.close to see if it helps (although I > think I had complaints from users running the same API several times - and > each time, at the end of my REST APIs I run session.close) > > > On Monday, September 2, 2013 3:58:02 PM UTC+3, Simon King wrote: >> >> I'm no expert on isolation levels - I was just trying to help you >> understand what the problem was :-) >> >> Fixing it really depends on how your application is supposed to work >> in the face of concurrent requests. For this specific part of the >> application, you probably want to be able to see the Foo object that >> was created by the other transaction. Reducing the transaction >> isolation is probably the easiest way to do that, but might have >> knock-on effects in your application, so you ought to think carefully >> before doing it. >> >> The alternative is to discard the existing session state when you get >> into this situation (via session.close) and start a new transaction. >> However, it wouldn't be appropriate to do this inside your "get" >> method - "session lifecycle" operations like this really belong at an >> outer scope, so making a change like this may require a certain amount >> of restructuring. >> >> Basically, dealing with concurrent operations is hard, and SQLAlchemy >> isn't going to magically make it any easier I'm afraid. >> >> Simon >> >> On Mon, Sep 2, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote: >> > I'm sorry, it was a misunderstanding on my part regarding the >> > transactions. >> > So what are you saying? that I should replace the transaction isolation >> > level? >> > >> > >> > On Monday, September 2, 2013 3:29:25 PM UTC+3, Simon King wrote: >> >> >> >> What exactly do you mean by not using transactions? The Session always >> >> works within a transaction: >> >> >> >> >> >> >> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#managing-transactions >> >> >> >> I assume you are also using InnoDB tables. >> >> >> >> On Mon, Sep 2, 2013 at 1:19 PM, herzaso <her...@gmail.com> wrote: >> >> > I do have it set as REPEATABLE READ. >> >> > However, I don't use transactions in sqlalchemy >> >> > >> >> > >> >> > On Monday, September 2, 2013 3:08:58 PM UTC+3, Simon King wrote: >> >> >> >> >> >> Do you know what transaction isolation level you are running at? The >> >> >> default apparently is "REPEATABLE READ": >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read >> >> >> >> >> >> The important sentence in that link is: >> >> >> >> >> >> All consistent reads within the same transaction read the snapshot >> >> >> established by the first read >> >> >> >> >> >> When you query the database for the first time, to see if the entity >> >> >> already exists, you are setting that initial snapshot. If you run >> >> >> the >> >> >> same query again (such as in your exception handler), you will get >> >> >> the >> >> >> same results, whether or not another connection has inserted a >> >> >> matching row in the meantime. >> >> >> >> >> >> Simon >> >> >> >> >> >> On Mon, Sep 2, 2013 at 12:54 PM, herzaso <her...@gmail.com> wrote: >> >> >> > I'm not sure what to make of the results: >> >> >> > On the first connection, I ran BEGIN and INSERT and both were >> >> >> > successful, >> >> >> > but when I tried the INSERT statement on the second connection, I >> >> >> > got >> >> >> > "ERROR >> >> >> > 1205 (HY000): Lock wait timeout exceeded; try restarting >> >> >> > transaction". >> >> >> > Running the same query on the first connection produced the >> >> >> > required >> >> >> > result >> >> >> > which is "ERROR 1062 (23000): Duplicate entry" >> >> >> > After the ROLLBACK on the first connection, the INSERT statement >> >> >> > worked >> >> >> > well >> >> >> > on the second connection >> >> >> > >> >> >> > Regarding your second remark, the answer is yes, the error was due >> >> >> > to >> >> >> > the >> >> >> > unique constraint on those columns >> >> >> > >> >> >> > BTW: I'm working on MySQL >> >> >> > >> >> >> > On Monday, September 2, 2013 1:31:12 PM UTC+3, Simon King wrote: >> >> >> >> >> >> >> >> I don't really know the answer, but I'd be interested in the >> >> >> >> results >> >> >> >> of this experiment: >> >> >> >> >> >> >> >> Forget about SQLAlchemy for the moment, and start 2 plain SQL >> >> >> >> connections to your database. In the first, type something like >> >> >> >> the >> >> >> >> following: >> >> >> >> >> >> >> >> BEGIN; >> >> >> >> INSERT foo(bar, baz, qux) VALUES(1, 1, 1); >> >> >> >> >> >> >> >> Now in the second connection do the same. I assume it'll fail >> >> >> >> because >> >> >> >> of the duplicate values. >> >> >> >> >> >> >> >> Now in the first connection issue a "ROLLBACK". You should now be >> >> >> >> in >> >> >> >> a >> >> >> >> state where no matching row exists in the database, even though >> >> >> >> you >> >> >> >> received an error about constraint violations. >> >> >> >> >> >> >> >> The results you see may be different, depending on your >> >> >> >> transaction >> >> >> >> isolation level. (It may be that you don't get the constraint >> >> >> >> violation at all until you try to commit the second connection). >> >> >> >> >> >> >> >> Another thing you could look at: are you sure that the error you >> >> >> >> are >> >> >> >> getting is due to the unique constraint on bar/baz/qux, and not >> >> >> >> some >> >> >> >> other constraint in the database? >> >> >> >> >> >> >> >> Simon >> >> >> >> >> >> >> >> On Mon, Sep 2, 2013 at 8:45 AM, herzaso <her...@gmail.com> wrote: >> >> >> >> > I'm afraid it didn't solve my problem. >> >> >> >> > >> >> >> >> > Here is my updated method: >> >> >> >> > @classmethod >> >> >> >> > def get(cls, bar=None, baz=None, qux=None, **kwargs): >> >> >> >> > query = session.query(cls).\ >> >> >> >> > filter(cls.bar == bar).\ >> >> >> >> > filter(cls.baz == baz).\ >> >> >> >> > filter(cls.qux == qux) >> >> >> >> > >> >> >> >> > item = query.first() >> >> >> >> > updated = False >> >> >> >> > >> >> >> >> > if not item: >> >> >> >> > try: >> >> >> >> > with session.begin_nested(): # run inside a >> >> >> >> > SAVEPOINT >> >> >> >> > updated = True >> >> >> >> > item = cls(bar=bar, baz=baz, qux=qux, >> >> >> >> > **kwargs) >> >> >> >> > session.add(item) >> >> >> >> > session.flush() >> >> >> >> > except sa.exc.IntegrityError: >> >> >> >> > item = query.first() >> >> >> >> > if not item: >> >> >> >> > raise Exception("invalidIntegrityError") >> >> >> >> > except: >> >> >> >> > raise >> >> >> >> > >> >> >> >> > if not updated: >> >> >> >> > for k, v in kwargs.iteritems(): >> >> >> >> > if getattr(item, k) != v: >> >> >> >> > setattr(item, k, v) >> >> >> >> > >> >> >> >> > return item >> >> >> >> > >> >> >> >> > With this code, i'm getting invalidIntegrityError. How is it >> >> >> >> > possible? >> >> >> >> > (it's also worth pointing out that this solution requires SA >> >> >> >> > 0.8.2 >> >> >> >> > (otherwise, there is a problem with session.begin_nested) >> >> >> >> > >> >> >> >> > >> >> >> >> > On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer >> >> >> >> > wrote: >> >> >> >> >> >> >> >> >> >> I'm not a fan of catching integrity errors, i prefer to try to >> >> >> >> >> make >> >> >> >> >> sure >> >> >> >> >> they aren't going to happen, or if they are, they aren't a >> >> >> >> >> normal >> >> >> >> >> occurrence >> >> >> >> >> and the system is such that the particular operation can just >> >> >> >> >> fail >> >> >> >> >> (of >> >> >> >> >> course it depends on what it is). A problem with catching >> >> >> >> >> the >> >> >> >> >> integrity >> >> >> >> >> error due to concurrent, conflicting operations is that >> >> >> >> >> depending >> >> >> >> >> on >> >> >> >> >> backend >> >> >> >> >> and isolation level, you can't be totally sure when the error >> >> >> >> >> is >> >> >> >> >> going >> >> >> >> >> to >> >> >> >> >> get raised (e.g. serializable isolation vs. non). Also on a >> >> >> >> >> backend >> >> >> >> >> like >> >> >> >> >> Postgresql, the database can't recover the transaction after >> >> >> >> >> an >> >> >> >> >> integrity >> >> >> >> >> error unless you used a savepoint. >> >> >> >> >> >> >> >> >> >> But here you're doing the "concurrent transactions need row >> >> >> >> >> identity >> >> >> >> >> X", >> >> >> >> >> so maybe it is appropriate here. Here is a rough idea of a >> >> >> >> >> transactional >> >> >> >> >> pattern for that, noting this isn't tested: >> >> >> >> >> >> >> >> >> >> try: >> >> >> >> >> my_object = Session.query(MyClass).filter(....).one() >> >> >> >> >> except NoResultFound: >> >> >> >> >> try: >> >> >> >> >> with Session.begin_nested(): # run inside a >> >> >> >> >> SAVEPOINT >> >> >> >> >> my_object = MyClass(...) >> >> >> >> >> Session.add(my_object) >> >> >> >> >> Session.flush() >> >> >> >> >> except IntegrityError: >> >> >> >> >> my_object = Session.query(MyClass).filter(....).one() >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> >> >> >> >> >> wrote: >> >> >> >> >> >> >> >> >> >> Suppose we are looking at a race condition, do you also think >> >> >> >> >> this >> >> >> >> >> should >> >> >> >> >> be handled by catching the IntegrityError? >> >> >> >> >> If so, what should I do? only flush and do the operation >> >> >> >> >> again? >> >> >> >> >> >> >> >> >> >> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer >> >> >> >> >> wrote: >> >> >> >> >>> >> >> >> >> >>> the word "occasional" is very meaningful. It usually >> >> >> >> >>> suggests >> >> >> >> >>> race >> >> >> >> >>> conditions. Then with the word "tornado", the baysean >> >> >> >> >>> filters >> >> >> >> >>> are >> >> >> >> >>> strongly leaning towards "race condition" at that point :). >> >> >> >> >>> >> >> >> >> >>> if an error is occurring only under volume then you have to >> >> >> >> >>> revisit >> >> >> >> >>> where >> >> >> >> >>> race conditions can occur. >> >> >> >> >>> >> >> >> >> >>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> >> >> >> >> >>> wrote: >> >> >> >> >>> >> >> >> >> >>> I'm running a Tornado server without redundancy (only one >> >> >> >> >>> process, >> >> >> >> >>> requests can arrive at the same time but will be handled one >> >> >> >> >>> at >> >> >> >> >>> a >> >> >> >> >>> time) >> >> >> >> >>> I do agree that for large volumes, catching the >> >> >> >> >>> IntegrityError >> >> >> >> >>> would >> >> >> >> >>> be >> >> >> >> >>> better, but currently I am handling a single request at a >> >> >> >> >>> time >> >> >> >> >>> and >> >> >> >> >>> I >> >> >> >> >>> want to >> >> >> >> >>> fix this problem before I move on ... >> >> >> >> >>> >> >> >> >> >>> >> >> >> >> >>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King >> >> >> >> >>> wrote: >> >> >> >> >>>> >> >> >> >> >>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> >> >> >> >> >>>> wrote: >> >> >> >> >>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King >> >> >> >> >>>> > wrote: >> >> >> >> >>>> >> >> >> >> >> >>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso >> >> >> >> >>>> >> <her...@gmail.com> >> >> >> >> >>>> >> wrote: >> >> >> >> >>>> >> > I have a model with an ID column set as the primary >> >> >> >> >>>> >> > key, >> >> >> >> >>>> >> > though >> >> >> >> >>>> >> > i'd >> >> >> >> >>>> >> > like >> >> >> >> >>>> >> > to >> >> >> >> >>>> >> > be able to identify records by 3 other columns. >> >> >> >> >>>> >> > For this situation, I've added a classmethod that will >> >> >> >> >>>> >> > fetch >> >> >> >> >>>> >> > the >> >> >> >> >>>> >> > record >> >> >> >> >>>> >> > if >> >> >> >> >>>> >> > found or a new record if not. >> >> >> >> >>>> >> > The problem i'm having is that every once in a while, I >> >> >> >> >>>> >> > get >> >> >> >> >>>> >> > IntegrityError >> >> >> >> >>>> >> > trying to flush a change >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > class Foo(Base): >> >> >> >> >>>> >> > __table_args__ = (sa.UniqueConstraint('bar', 'baz', >> >> >> >> >>>> >> > 'qux'),) >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > id = sa.Column(Identifier, sa.Sequence('%s_id_seq' >> >> >> >> >>>> >> > % >> >> >> >> >>>> >> > __tablename__), >> >> >> >> >>>> >> > nullable=False, primary_key=True) >> >> >> >> >>>> >> > bar = sa.Column(sa.BigInteger) >> >> >> >> >>>> >> > baz = sa.Column(sa.BigInteger) >> >> >> >> >>>> >> > qux = sa.Column(sa.BigInteger) >> >> >> >> >>>> >> > a1 = sa.Column(sa.BigInteger) >> >> >> >> >>>> >> > a2 = sa.Column(sa.BigInteger) >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > @classmethod >> >> >> >> >>>> >> > def get(cls, bar=None, baz=None, qux=None, >> >> >> >> >>>> >> > **kwargs): >> >> >> >> >>>> >> > item = session.query(cls).\ >> >> >> >> >>>> >> > filter(cls.bar== bar).\ >> >> >> >> >>>> >> > filter(cls.baz == baz).\ >> >> >> >> >>>> >> > filter(cls.qux == qux).\ >> >> >> >> >>>> >> > first() >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > if item: >> >> >> >> >>>> >> > for k, v in kwargs.iteritems(): >> >> >> >> >>>> >> > if getattr(item, k) != v: >> >> >> >> >>>> >> > setattr(item, k, v) >> >> >> >> >>>> >> > else: >> >> >> >> >>>> >> > item = cls(bar=bar, baz=baz, qux=qux, >> >> >> >> >>>> >> > **kwargs) >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > return item >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > This is the code I use to add/update records: >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > foo = Foo.get(**item) >> >> >> >> >>>> >> > session.merge(foo) >> >> >> >> >>>> >> > >> >> >> >> >>>> >> > I'm struggling with this problem for some time now, and >> >> >> >> >>>> >> > would >> >> >> >> >>>> >> > appreciate >> >> >> >> >>>> >> > any >> >> >> >> >>>> >> > help ... >> >> >> >> >>>> >> > >> >> >> >> >>>> >> >> >> >> >> >>>> >> I'm not sure of the exact problem, but there are a couple >> >> >> >> >>>> >> of >> >> >> >> >>>> >> things >> >> >> >> >>>> >> that you could investigate. >> >> >> >> >>>> >> >> >> >> >> >>>> >> Firstly, session.merge returns a copy of the object, >> >> >> >> >>>> >> rather >> >> >> >> >>>> >> than >> >> >> >> >>>> >> adding the object that you supplied into the session. See >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging >> >> >> >> >>>> >> for >> >> >> >> >>>> >> details. >> >> >> >> >>>> >> >> >> >> >> >>>> >> Secondly, your "get" method sometimes returns objects >> >> >> >> >>>> >> that >> >> >> >> >>>> >> are >> >> >> >> >>>> >> already >> >> >> >> >>>> >> part of the session (if they were in the database), and >> >> >> >> >>>> >> sometimes >> >> >> >> >>>> >> objects that are not in the session. It would probably be >> >> >> >> >>>> >> more >> >> >> >> >>>> >> consistent to always return objects that are part of the >> >> >> >> >>>> >> session, >> >> >> >> >>>> >> by >> >> >> >> >>>> >> putting "session.add(item)" in your "else" clause. This >> >> >> >> >>>> >> would >> >> >> >> >>>> >> get >> >> >> >> >>>> >> rid >> >> >> >> >>>> >> of the need for session.merge(). (If you want to be able >> >> >> >> >>>> >> to >> >> >> >> >>>> >> use >> >> >> >> >>>> >> the >> >> >> >> >>>> >> "get" with non-global sessions, pass the session as a >> >> >> >> >>>> >> parameter.) >> >> >> >> >>>> >> >> >> >> >> >>>> >> Finally, if your session isn't auto-flushing, it would be >> >> >> >> >>>> >> possible >> >> >> >> >>>> >> for >> >> >> >> >>>> >> you to call "get" twice with the same parameters and get >> >> >> >> >>>> >> 2 >> >> >> >> >>>> >> different >> >> >> >> >>>> >> objects back. >> >> >> >> >>>> >> >> >> >> >> >>>> >> You may want to look at the UniqueObject recipe in the >> >> >> >> >>>> >> wiki: >> >> >> >> >>>> >> >> >> >> >> >>>> >> >> >> >> >> >>>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject >> >> >> >> >>>> >> >> >> >> >> >>>> > Hi Simon, >> >> >> >> >>>> > Thanks for the fast reply. >> >> >> >> >>>> > >> >> >> >> >>>> > I tried adding session.add(item) and session.flush() in >> >> >> >> >>>> > the >> >> >> >> >>>> > else >> >> >> >> >>>> > clause in >> >> >> >> >>>> > the past but that didn't solve my problem. >> >> >> >> >>>> > I didn't however remove the merge, do you think that might >> >> >> >> >>>> > be >> >> >> >> >>>> > the >> >> >> >> >>>> > problem? >> >> >> >> >>>> > >> >> >> >> >>>> > Regarding the flush, this code is part of an API server >> >> >> >> >>>> > where >> >> >> >> >>>> > a >> >> >> >> >>>> > scoped_session is committed after each change. I haven't >> >> >> >> >>>> > changed >> >> >> >> >>>> > the >> >> >> >> >>>> > autoflush parameter, and as I understand the default value >> >> >> >> >>>> > is >> >> >> >> >>>> > True >> >> >> >> >>>> > making a >> >> >> >> >>>> > flush before each commit or query. >> >> >> >> >>>> > >> >> >> >> >>>> > As for the UniqueObject recipe, thanks! Amazing that I >> >> >> >> >>>> > never >> >> >> >> >>>> > found >> >> >> >> >>>> > it >> >> >> >> >>>> > searching for a cure. As I see it basically does the same >> >> >> >> >>>> > ... >> >> >> >> >>>> > >> >> >> >> >>>> > I never managed to reproduce this bug on my development >> >> >> >> >>>> > environment. >> >> >> >> >>>> > It only >> >> >> >> >>>> > happens in my production environment. >> >> >> >> >>>> > Do you suppose adding a session.add and removing the merge >> >> >> >> >>>> > will >> >> >> >> >>>> > solve >> >> >> >> >>>> > this >> >> >> >> >>>> > issue? >> >> >> >> >>>> > >> >> >> >> >>>> > Thanks, >> >> >> >> >>>> > Ofir >> >> >> >> >>>> >> >> >> >> >>>> It's difficult to say without knowing more about your >> >> >> >> >>>> system. >> >> >> >> >>>> For >> >> >> >> >>>> example, does your production system get multiple concurrent >> >> >> >> >>>> API >> >> >> >> >>>> requests, or are they serialised? If 2 requests can come in >> >> >> >> >>>> at >> >> >> >> >>>> approximately the same time and are handled by 2 different >> >> >> >> >>>> threads >> >> >> >> >>>> (or >> >> >> >> >>>> processes), then it is easy to imagine that the first >> >> >> >> >>>> handler >> >> >> >> >>>> will >> >> >> >> >>>> check the database, find that an entry doesn't exist, and >> >> >> >> >>>> create >> >> >> >> >>>> it. >> >> >> >> >>>> But before it flushes the change to the database (or even >> >> >> >> >>>> after >> >> >> >> >>>> it >> >> >> >> >>>> flushes, but before it commits, depending on your >> >> >> >> >>>> transaction >> >> >> >> >>>> isolation), the second handler will check for the same >> >> >> >> >>>> object, >> >> >> >> >>>> find >> >> >> >> >>>> it >> >> >> >> >>>> missing, and so create it. >> >> >> >> >>>> >> >> >> >> >>>> To track down problems like this, you could ensure that your >> >> >> >> >>>> development environment has the same thread/process >> >> >> >> >>>> behaviour >> >> >> >> >>>> as >> >> >> >> >>>> the >> >> >> >> >>>> production environment, then try submitting multiple >> >> >> >> >>>> concurrent >> >> >> >> >>>> requests to it. If you add "time.sleep" statements somewhere >> >> >> >> >>>> between >> >> >> >> >>>> the creation of the object and the commit of the transaction >> >> >> >> >>>> you >> >> >> >> >>>> will >> >> >> >> >>>> probably find it easier to trigger. >> >> >> >> >>>> >> >> >> >> >>>> To actually fix the problem, you could choose to only handle >> >> >> >> >>>> a >> >> >> >> >>>> single >> >> >> >> >>>> request at a time (fine if you don't expect a high volume of >> >> >> >> >>>> requests). If that's not acceptable, you could catch the >> >> >> >> >>>> IntegrityError and then re-process the request. >> >> >> >> >>>> >> >> >> >> >>>> Hope that helps, >> >> >> >> >>>> >> >> >> >> >>>> Simon >> >> >> >> >>> -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at http://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/groups/opt_out.