I'm not a fan of catching integrity errors; I prefer to try to make sure they 
aren't going to happen, or, if they can, that they aren't a normal occurrence and 
the system is such that the particular operation can just fail (of course it 
depends on what it is).  A problem with catching an IntegrityError caused by 
concurrent, conflicting operations is that, depending on the backend and 
isolation level, you can't be totally sure when the error is going to get raised 
(e.g. serializable isolation vs. non-serializable).  Also, on a backend like 
PostgreSQL, the database can't recover the transaction after an integrity error 
unless you used a SAVEPOINT.

But here you're in the "concurrent transactions both need row identity X" 
situation, so maybe catching the error is appropriate.  Here is a rough idea of a 
transactional pattern for that, noting this isn't tested:

from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm.exc import NoResultFound

try:
    my_object = Session.query(MyClass).filter(....).one()
except NoResultFound:
    try:
        with Session.begin_nested():   # run the INSERT inside a SAVEPOINT
            my_object = MyClass(...)
            Session.add(my_object)
            Session.flush()
    except IntegrityError:
        # a concurrent transaction inserted the row first; the SAVEPOINT was
        # rolled back, so the enclosing transaction is still usable to re-query
        my_object = Session.query(MyClass).filter(....).one()
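
Folding that into the Foo.get() from your original post might look roughly like 
this (again untested, with the session passed in as a parameter rather than 
relying on a global; filter_by is just a shorter way to spell the three filter() 
calls):

from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm.exc import NoResultFound

class Foo(Base):
    # ... columns and UniqueConstraint('bar', 'baz', 'qux') as in the original post ...

    @classmethod
    def get(cls, session, bar=None, baz=None, qux=None, **kwargs):
        try:
            item = session.query(cls).filter_by(bar=bar, baz=baz, qux=qux).one()
        except NoResultFound:
            try:
                with session.begin_nested():   # INSERT runs inside a SAVEPOINT
                    item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
                    session.add(item)
                    session.flush()
                return item
            except IntegrityError:
                # a concurrent transaction inserted the same (bar, baz, qux)
                # first; the SAVEPOINT was rolled back, so load that row instead
                item = session.query(cls).filter_by(bar=bar, baz=baz, qux=qux).one()

        # existing row: apply whatever updates were passed in
        for k, v in kwargs.items():
            if getattr(item, k) != v:
                setattr(item, k, v)
        return item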





On Aug 27, 2013, at 11:13 AM, herzaso <herz...@gmail.com> wrote:

> Suppose we are looking at a race condition; do you also think this should be 
> handled by catching the IntegrityError?
> If so, what should I do? Just flush and do the operation again?
> 
> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote:
> the word "occasional" is very meaningful.  It usually suggests race 
> conditions.    Then with the word "tornado", the baysean filters are strongly 
> leaning towards "race condition" at that point :).
> 
> if an error is occurring only under volume, then you have to revisit where 
> race conditions can occur.
> 
> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote:
> 
>> I'm running a Tornado server without redundancy (only one process; requests 
>> can arrive at the same time but will be handled one at a time).
>> I do agree that for large volumes, catching the IntegrityError would be 
>> better, but currently I am handling a single request at a time and I want to 
>> fix this problem before I move on...
>> 
>> 
>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote:
>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote: 
>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote: 
>> >> 
>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote: 
>> >> > I have a model with an ID column set as the primary key, though I'd like 
>> >> > to be able to identify records by 3 other columns. 
>> >> > For this situation, I've added a classmethod that will fetch the record 
>> >> > if found, or a new record if not. 
>> >> > The problem I'm having is that every once in a while, I get an 
>> >> > IntegrityError trying to flush a change: 
>> >> > 
>> >> > class Foo(Base): 
>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),) 
>> >> > 
>> >> >     id = sa.Column(Identifier, sa.Sequence('%s_id_seq' % __tablename__), 
>> >> >                    nullable=False, primary_key=True) 
>> >> >     bar = sa.Column(sa.BigInteger) 
>> >> >     baz = sa.Column(sa.BigInteger) 
>> >> >     qux = sa.Column(sa.BigInteger) 
>> >> >     a1 = sa.Column(sa.BigInteger) 
>> >> >     a2 = sa.Column(sa.BigInteger) 
>> >> > 
>> >> >     @classmethod 
>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
>> >> >         item = session.query(cls).\ 
>> >> >             filter(cls.bar == bar).\ 
>> >> >             filter(cls.baz == baz).\ 
>> >> >             filter(cls.qux == qux).\ 
>> >> >             first() 
>> >> > 
>> >> >         if item: 
>> >> >             for k, v in kwargs.iteritems(): 
>> >> >                 if getattr(item, k) != v: 
>> >> >                     setattr(item, k, v) 
>> >> >         else: 
>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs) 
>> >> > 
>> >> >         return item 
>> >> > 
>> >> > This is the code I use to add/update records: 
>> >> > 
>> >> > foo = Foo.get(**item) 
>> >> > session.merge(foo) 
>> >> > 
>> >> > I've been struggling with this problem for some time now and would 
>> >> > appreciate any help... 
>> >> > 
>> >> 
>> >> I'm not sure of the exact problem, but there are a couple of things 
>> >> that you could investigate. 
>> >> 
>> >> Firstly, session.merge returns a copy of the object, rather than 
>> >> adding the object that you supplied into the session. See 
>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for 
>> >> details. 
>> >> 
>> >> Secondly, your "get" method sometimes returns objects that are already 
>> >> part of the session (if they were in the database), and sometimes 
>> >> objects that are not in the session. It would probably be more 
>> >> consistent to always return objects that are part of the session, by 
>> >> putting "session.add(item)" in your "else" clause. This would get rid 
>> >> of the need for session.merge(). (If you want to be able to use the 
>> >> "get" with non-global sessions, pass the session as a parameter.) 
>> >> 
>> >> Finally, if your session isn't auto-flushing, it would be possible for 
>> >> you to call "get" twice with the same parameters and get 2 different 
>> >> objects back. 
>> >> 
>> >> You may want to look at the UniqueObject recipe in the wiki: 
>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject 
>> >> 
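
As a rough, untested sketch of what's being described here, with the session 
passed in as a parameter and the newly created object added to it so that 
merge() isn't needed:

class Foo(Base):
    # ... columns as in the original post ...

    @classmethod
    def get(cls, session, bar=None, baz=None, qux=None, **kwargs):
        item = session.query(cls).filter_by(bar=bar, baz=baz, qux=qux).first()
        if item:
            # already attached to the session; just update it in place
            for k, v in kwargs.items():
                if getattr(item, k) != v:
                    setattr(item, k, v)
        else:
            # not found: create it and add it to the same session
            item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
            session.add(item)
        return item

# caller side, with no merge():
#     foo = Foo.get(session, **item)
#     session.commit()
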
>> > Hi Simon, 
>> > Thanks for the fast reply. 
>> > 
>> > I tried adding session.add(item) and session.flush() in the else clause in 
>> > the past but that didn't solve my problem. 
>> > I didn't however remove the merge, do you think that might be the problem? 
>> > 
>> > Regarding the flush, this code is part of an API server where a 
>> > scoped_session is committed after each change. I haven't changed the 
>> > autoflush parameter, and as I understand it the default value is True, 
>> > making a flush happen before each commit or query. 
>> > 
>> > As for the UniqueObject recipe, thanks! Amazing that I never found it while 
>> > searching for a cure. As I see it, it basically does the same thing... 
>> > 
>> > I never managed to reproduce this bug in my development environment; it 
>> > only happens in production. 
>> > Do you suppose adding a session.add and removing the merge will solve this 
>> > issue? 
>> > 
>> > Thanks, 
>> > Ofir 
>> 
>> It's difficult to say without knowing more about your system. For 
>> example, does your production system get multiple concurrent API 
>> requests, or are they serialised? If 2 requests can come in at 
>> approximately the same time and are handled by 2 different threads (or 
>> processes), then it is easy to imagine that the first handler will 
>> check the database, find that an entry doesn't exist, and create it. 
>> But before it flushes the change to the database (or even after it 
>> flushes, but before it commits, depending on your transaction 
>> isolation), the second handler will check for the same object, find it 
>> missing, and so create it. 
>> 
>> To track down problems like this, you could ensure that your 
>> development environment has the same thread/process behaviour as the 
>> production environment, then try submitting multiple concurrent 
>> requests to it. If you add "time.sleep" statements somewhere between 
>> the creation of the object and the commit of the transaction you will 
>> probably find it easier to trigger. 
>> 
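
Roughly, such a test could look like the following sketch (untested; 
handle_request and the 5-second delay are purely illustrative, the rest is the 
add/update code from the original post):

import time

def handle_request(**item):
    # widen the window between the existence check and the commit so that two
    # overlapping requests are likely to collide
    foo = Foo.get(**item)     # may build a new, not-yet-flushed Foo
    time.sleep(5)             # artificial delay; makes the race easy to trigger
    session.merge(foo)
    session.commit()          # the slower of two overlapping requests hits the
                              # UniqueConstraint here and raises IntegrityError
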
>> To actually fix the problem, you could choose to only handle a single 
>> request at a time (fine if you don't expect a high volume of 
>> requests). If that's not acceptable, you could catch the 
>> IntegrityError and then re-process the request. 
>> 
>> Hope that helps, 
>> 
>> Simon 
>> 