Re: [sqlalchemy] Occasional IntegrityError when identifying model not by its ID

herzaso Mon, 02 Sep 2013 05:40:43 -0700

I'm sorry, it was a misunderstanding on my part regarding the transactions.
So what are you saying? that I should replace the transaction isolation 
level?


On Monday, September 2, 2013 3:29:25 PM UTC+3, Simon King wrote:
>
> What exactly do you mean by not using transactions? The Session always 
> works within a transaction: 
>
>   
> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#managing-transactions 
>
> I assume you are also using InnoDB tables. 
>
> On Mon, Sep 2, 2013 at 1:19 PM, herzaso <her...@gmail.com <javascript:>> 
> wrote: 
> > I do have it set as REPEATABLE READ. 
> > However, I don't use transactions in sqlalchemy 
> > 
> > 
> > On Monday, September 2, 2013 3:08:58 PM UTC+3, Simon King wrote: 
> >> 
> >> Do you know what transaction isolation level you are running at? The 
> >> default apparently is "REPEATABLE READ": 
> >> 
> >> 
> >> 
> http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
>  
> >> 
> >> The important sentence in that link is: 
> >> 
> >>   All consistent reads within the same transaction read the snapshot 
> >> established by the first read 
> >> 
> >> When you query the database for the first time, to see if the entity 
> >> already exists, you are setting that initial snapshot. If you run the 
> >> same query again (such as in your exception handler), you will get the 
> >> same results, whether or not another connection has inserted a 
> >> matching row in the meantime. 
> >> 
> >> Simon 
> >> 
> >> On Mon, Sep 2, 2013 at 12:54 PM, herzaso <her...@gmail.com> wrote: 
> >> > I'm not sure what to make of the results: 
> >> > On the first connection, I ran BEGIN and INSERT and both were 
> >> > successful, 
> >> > but when I tried the INSERT statement on the second connection, I got 
> >> > "ERROR 
> >> > 1205 (HY000): Lock wait timeout exceeded; try restarting 
> transaction". 
> >> > Running the same query on the first connection produced the required 
> >> > result 
> >> > which is "ERROR 1062 (23000): Duplicate entry" 
> >> > After the ROLLBACK on the first connection, the INSERT statement 
> worked 
> >> > well 
> >> > on the second connection 
> >> > 
> >> > Regarding your second remark, the answer is yes, the error was due to 
> >> > the 
> >> > unique constraint on those columns 
> >> > 
> >> > BTW: I'm working on MySQL 
> >> > 
> >> > On Monday, September 2, 2013 1:31:12 PM UTC+3, Simon King wrote: 
> >> >> 
> >> >> I don't really know the answer, but I'd be interested in the results 
> >> >> of this experiment: 
> >> >> 
> >> >> Forget about SQLAlchemy for the moment, and start 2 plain SQL 
> >> >> connections to your database. In the first, type something like the 
> >> >> following: 
> >> >> 
> >> >> BEGIN; 
> >> >> INSERT foo(bar, baz, qux) VALUES(1, 1, 1); 
> >> >> 
> >> >> Now in the second connection do the same. I assume it'll fail 
> because 
> >> >> of the duplicate values. 
> >> >> 
> >> >> Now in the first connection issue a "ROLLBACK". You should now be in 
> a 
> >> >> state where no matching row exists in the database, even though you 
> >> >> received an error about constraint violations. 
> >> >> 
> >> >> The results you see may be different, depending on your transaction 
> >> >> isolation level. (It may be that you don't get the constraint 
> >> >> violation at all until you try to commit the second connection). 
> >> >> 
> >> >> Another thing you could look at: are you sure that the error you are 
> >> >> getting is due to the unique constraint on bar/baz/qux, and not some 
> >> >> other constraint in the database? 
> >> >> 
> >> >> Simon 
> >> >> 
> >> >> On Mon, Sep 2, 2013 at 8:45 AM, herzaso <her...@gmail.com> wrote: 
> >> >> > I'm afraid it didn't solve my problem. 
> >> >> > 
> >> >> > Here is my updated method: 
> >> >> >     @classmethod 
> >> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
> >> >> >         query = session.query(cls).\ 
> >> >> >             filter(cls.bar == bar).\ 
> >> >> >             filter(cls.baz == baz).\ 
> >> >> >             filter(cls.qux == qux) 
> >> >> > 
> >> >> >         item = query.first() 
> >> >> >         updated = False 
> >> >> > 
> >> >> >         if not item: 
> >> >> >             try: 
> >> >> >                 with session.begin_nested():   # run inside a 
> >> >> > SAVEPOINT 
> >> >> >                     updated = True 
> >> >> >                     item = cls(bar=bar, baz=baz, qux=qux, 
> **kwargs) 
> >> >> >                     session.add(item) 
> >> >> >                     session.flush() 
> >> >> >             except sa.exc.IntegrityError: 
> >> >> >                 item = query.first() 
> >> >> >                 if not item: 
> >> >> >                     raise Exception("invalidIntegrityError") 
> >> >> >             except: 
> >> >> >                 raise 
> >> >> > 
> >> >> >         if not updated: 
> >> >> >             for k, v in kwargs.iteritems(): 
> >> >> >                 if getattr(item, k) != v: 
> >> >> >                     setattr(item, k, v) 
> >> >> > 
> >> >> >         return item 
> >> >> > 
> >> >> > With this code, i'm getting invalidIntegrityError. How is it 
> >> >> > possible? 
> >> >> > (it's also worth pointing out that this solution requires SA 0.8.2 
> >> >> > (otherwise, there is a problem with session.begin_nested) 
> >> >> > 
> >> >> > 
> >> >> > On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer wrote: 
> >> >> >> 
> >> >> >> I'm not a fan of catching integrity errors, i prefer to try to 
> make 
> >> >> >> sure 
> >> >> >> they aren't going to happen, or if they are, they aren't a normal 
> >> >> >> occurrence 
> >> >> >> and the system is such that the particular operation can just 
> fail 
> >> >> >> (of 
> >> >> >> course it depends on what it is).     A problem with catching the 
> >> >> >> integrity 
> >> >> >> error due to concurrent, conflicting operations is that depending 
> on 
> >> >> >> backend 
> >> >> >> and isolation level, you can't be totally sure when the error is 
> >> >> >> going 
> >> >> >> to 
> >> >> >> get raised (e.g. serializable isolation vs. non).  Also on a 
> backend 
> >> >> >> like 
> >> >> >> Postgresql, the database can't recover the transaction after an 
> >> >> >> integrity 
> >> >> >> error unless you used a savepoint. 
> >> >> >> 
> >> >> >> But here you're doing the "concurrent transactions need row 
> identity 
> >> >> >> X", 
> >> >> >> so maybe it is appropriate here.  Here is a rough idea of a 
> >> >> >> transactional 
> >> >> >> pattern for that, noting this isn't tested: 
> >> >> >> 
> >> >> >> try: 
> >> >> >>     my_object = Session.query(MyClass).filter(....).one() 
> >> >> >> except NoResultFound: 
> >> >> >>     try: 
> >> >> >>         with Session.begin_nested():   # run inside a SAVEPOINT 
> >> >> >>             my_object = MyClass(...) 
> >> >> >>             Session.add(my_object) 
> >> >> >>             Session.flush() 
> >> >> >>     except IntegrityError: 
> >> >> >>         my_object = Session.query(MyClass).filter(....).one() 
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> 
> >> >> >> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> wrote: 
> >> >> >> 
> >> >> >> Suppose we are looking at a race condition, do you also think 
> this 
> >> >> >> should 
> >> >> >> be handled by catching the IntegrityError? 
> >> >> >> If so, what should I do? only flush and do the operation again? 
> >> >> >> 
> >> >> >> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer 
> wrote: 
> >> >> >>> 
> >> >> >>> the word "occasional" is very meaningful.  It usually suggests 
> race 
> >> >> >>> conditions.    Then with the word "tornado", the baysean filters 
> >> >> >>> are 
> >> >> >>> strongly leaning towards "race condition" at that point :). 
> >> >> >>> 
> >> >> >>> if an error is occurring only under volume then you have to 
> revisit 
> >> >> >>> where 
> >> >> >>> race conditions can occur. 
> >> >> >>> 
> >> >> >>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote: 
> >> >> >>> 
> >> >> >>> I'm running a Tornado server without redundancy (only one 
> process, 
> >> >> >>> requests can arrive at the same time but will be handled one at 
> a 
> >> >> >>> time) 
> >> >> >>> I do agree that for large volumes, catching the IntegrityError 
> >> >> >>> would 
> >> >> >>> be 
> >> >> >>> better, but currently I am handling a single request at a time 
> and 
> >> >> >>> I 
> >> >> >>> want to 
> >> >> >>> fix this problem before I move on ... 
> >> >> >>> 
> >> >> >>> 
> >> >> >>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote: 
> >> >> >>>> 
> >> >> >>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> 
> wrote: 
> >> >> >>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King 
> wrote: 
> >> >> >>>> >> 
> >> >> >>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> 
> >> >> >>>> >> wrote: 
> >> >> >>>> >> > I have a model with an ID column set as the primary key, 
> >> >> >>>> >> > though 
> >> >> >>>> >> > i'd 
> >> >> >>>> >> > like 
> >> >> >>>> >> > to 
> >> >> >>>> >> > be able to identify records by 3 other columns. 
> >> >> >>>> >> > For this situation, I've added a classmethod that will 
> fetch 
> >> >> >>>> >> > the 
> >> >> >>>> >> > record 
> >> >> >>>> >> > if 
> >> >> >>>> >> > found or a new record if not. 
> >> >> >>>> >> > The problem i'm having is that every once in a while, I 
> get 
> >> >> >>>> >> > IntegrityError 
> >> >> >>>> >> > trying to flush a change 
> >> >> >>>> >> > 
> >> >> >>>> >> > class Foo(Base): 
> >> >> >>>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 
> >> >> >>>> >> > 'qux'),) 
> >> >> >>>> >> > 
> >> >> >>>> >> >     id = sa.Column(Identifier, sa.Sequence('%s_id_seq' % 
> >> >> >>>> >> > __tablename__), 
> >> >> >>>> >> > nullable=False, primary_key=True) 
> >> >> >>>> >> >     bar = sa.Column(sa.BigInteger) 
> >> >> >>>> >> >     baz = sa.Column(sa.BigInteger) 
> >> >> >>>> >> >     qux = sa.Column(sa.BigInteger) 
> >> >> >>>> >> >     a1 = sa.Column(sa.BigInteger) 
> >> >> >>>> >> >     a2 = sa.Column(sa.BigInteger) 
> >> >> >>>> >> > 
> >> >> >>>> >> >     @classmethod 
> >> >> >>>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
> >> >> >>>> >> >         item = session.query(cls).\ 
> >> >> >>>> >> >             filter(cls.bar== bar).\ 
> >> >> >>>> >> >             filter(cls.baz == baz).\ 
> >> >> >>>> >> >             filter(cls.qux == qux).\ 
> >> >> >>>> >> >             first() 
> >> >> >>>> >> > 
> >> >> >>>> >> >         if item: 
> >> >> >>>> >> >             for k, v in kwargs.iteritems(): 
> >> >> >>>> >> >                 if getattr(item, k) != v: 
> >> >> >>>> >> >                     setattr(item, k, v) 
> >> >> >>>> >> >         else: 
> >> >> >>>> >> >             item = cls(bar=bar, baz=baz, qux=qux, 
> **kwargs) 
> >> >> >>>> >> > 
> >> >> >>>> >> >         return item 
> >> >> >>>> >> > 
> >> >> >>>> >> > This is the code I use to add/update records: 
> >> >> >>>> >> > 
> >> >> >>>> >> > foo = Foo.get(**item) 
> >> >> >>>> >> > session.merge(foo) 
> >> >> >>>> >> > 
> >> >> >>>> >> > I'm struggling with this problem for some time now, and 
> would 
> >> >> >>>> >> > appreciate 
> >> >> >>>> >> > any 
> >> >> >>>> >> > help ... 
> >> >> >>>> >> > 
> >> >> >>>> >> 
> >> >> >>>> >> I'm not sure of the exact problem, but there are a couple of 
> >> >> >>>> >> things 
> >> >> >>>> >> that you could investigate. 
> >> >> >>>> >> 
> >> >> >>>> >> Firstly, session.merge returns a copy of the object, rather 
> >> >> >>>> >> than 
> >> >> >>>> >> adding the object that you supplied into the session. See 
> >> >> >>>> >> 
> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging 
> >> >> >>>> >> for 
> >> >> >>>> >> details. 
> >> >> >>>> >> 
> >> >> >>>> >> Secondly, your "get" method sometimes returns objects that 
> are 
> >> >> >>>> >> already 
> >> >> >>>> >> part of the session (if they were in the database), and 
> >> >> >>>> >> sometimes 
> >> >> >>>> >> objects that are not in the session. It would probably be 
> more 
> >> >> >>>> >> consistent to always return objects that are part of the 
> >> >> >>>> >> session, 
> >> >> >>>> >> by 
> >> >> >>>> >> putting "session.add(item)" in your "else" clause. This 
> would 
> >> >> >>>> >> get 
> >> >> >>>> >> rid 
> >> >> >>>> >> of the need for session.merge(). (If you want to be able to 
> use 
> >> >> >>>> >> the 
> >> >> >>>> >> "get" with non-global sessions, pass the session as a 
> >> >> >>>> >> parameter.) 
> >> >> >>>> >> 
> >> >> >>>> >> Finally, if your session isn't auto-flushing, it would be 
> >> >> >>>> >> possible 
> >> >> >>>> >> for 
> >> >> >>>> >> you to call "get" twice with the same parameters and get 2 
> >> >> >>>> >> different 
> >> >> >>>> >> objects back. 
> >> >> >>>> >> 
> >> >> >>>> >> You may want to look at the UniqueObject recipe in the wiki: 
> >> >> >>>> >> 
> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject 
> >> >> >>>> >> 
> >> >> >>>> > Hi Simon, 
> >> >> >>>> > Thanks for the fast reply. 
> >> >> >>>> > 
> >> >> >>>> > I tried adding session.add(item) and session.flush() in the 
> else 
> >> >> >>>> > clause in 
> >> >> >>>> > the past but that didn't solve my problem. 
> >> >> >>>> > I didn't however remove the merge, do you think that might be 
> >> >> >>>> > the 
> >> >> >>>> > problem? 
> >> >> >>>> > 
> >> >> >>>> > Regarding the flush, this code is part of an API server where 
> a 
> >> >> >>>> > scoped_session is committed after each change. I haven't 
> changed 
> >> >> >>>> > the 
> >> >> >>>> > autoflush parameter, and as I understand the default value is 
> >> >> >>>> > True 
> >> >> >>>> > making a 
> >> >> >>>> > flush before each commit or query. 
> >> >> >>>> > 
> >> >> >>>> > As for the UniqueObject recipe, thanks! Amazing that I never 
> >> >> >>>> > found 
> >> >> >>>> > it 
> >> >> >>>> > searching for a cure. As I see it basically does the same ... 
> >> >> >>>> > 
> >> >> >>>> > I never managed to reproduce this bug on my development 
> >> >> >>>> > environment. 
> >> >> >>>> > It only 
> >> >> >>>> > happens in my production environment. 
> >> >> >>>> > Do you suppose adding a session.add and removing the merge 
> will 
> >> >> >>>> > solve 
> >> >> >>>> > this 
> >> >> >>>> > issue? 
> >> >> >>>> > 
> >> >> >>>> > Thanks, 
> >> >> >>>> > Ofir 
> >> >> >>>> 
> >> >> >>>> It's difficult to say without knowing more about your system. 
> For 
> >> >> >>>> example, does your production system get multiple concurrent 
> API 
> >> >> >>>> requests, or are they serialised? If 2 requests can come in at 
> >> >> >>>> approximately the same time and are handled by 2 different 
> threads 
> >> >> >>>> (or 
> >> >> >>>> processes), then it is easy to imagine that the first handler 
> will 
> >> >> >>>> check the database, find that an entry doesn't exist, and 
> create 
> >> >> >>>> it. 
> >> >> >>>> But before it flushes the change to the database (or even after 
> it 
> >> >> >>>> flushes, but before it commits, depending on your transaction 
> >> >> >>>> isolation), the second handler will check for the same object, 
> >> >> >>>> find 
> >> >> >>>> it 
> >> >> >>>> missing, and so create it. 
> >> >> >>>> 
> >> >> >>>> To track down problems like this, you could ensure that your 
> >> >> >>>> development environment has the same thread/process behaviour 
> as 
> >> >> >>>> the 
> >> >> >>>> production environment, then try submitting multiple concurrent 
> >> >> >>>> requests to it. If you add "time.sleep" statements somewhere 
> >> >> >>>> between 
> >> >> >>>> the creation of the object and the commit of the transaction 
> you 
> >> >> >>>> will 
> >> >> >>>> probably find it easier to trigger. 
> >> >> >>>> 
> >> >> >>>> To actually fix the problem, you could choose to only handle a 
> >> >> >>>> single 
> >> >> >>>> request at a time (fine if you don't expect a high volume of 
> >> >> >>>> requests). If that's not acceptable, you could catch the 
> >> >> >>>> IntegrityError and then re-process the request. 
> >> >> >>>> 
> >> >> >>>> Hope that helps, 
> >> >> >>>> 
> >> >> >>>> Simon 
> >> >> >>> 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to sqlalchemy+...@googlegroups.com <javascript:>. 
> > To post to this group, send email to 
> > sqlal...@googlegroups.com<javascript:>. 
>
> > Visit this group at http://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/groups/opt_out. 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.

Re: [sqlalchemy] Occasional IntegrityError when identifying model not by its ID

Reply via email to