When you have concurrent transactions going on, the database should be
responsible for dealing with whatever locking is required.  The optimistic
approach I have outlined doesn't require any locking, but you could also do
this using an explicit table lock; the syntax and availability of explicit
table locks are backend-specific.
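
As a rough, untested sketch of the table-lock variant, assuming PostgreSQL
(whose LOCK TABLE statement supports the mode below; other backends differ),
and using MyClass / my_table as placeholder names:

from sqlalchemy import text
from sqlalchemy.orm.exc import NoResultFound

# lock the table behind MyClass for the rest of the transaction;
# concurrent writers block until this transaction commits or rolls back
Session.execute(text("LOCK TABLE my_table IN SHARE ROW EXCLUSIVE MODE"))
try:
    my_object = Session.query(MyClass).filter(....).one()
except NoResultFound:
    my_object = MyClass(...)
    Session.add(my_object)
Session.commit()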




On Aug 27, 2013, at 12:09 PM, herzaso <herz...@gmail.com> wrote:

> Thanks Michael,
> I will try that and let you know if it solves my issue.
> BTW: Is there a locking mechanism for such conditions?
> 
> Thanks,
> Ofir
> 
> 
> On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer wrote:
> I'm not a fan of catching integrity errors; I prefer to make sure they
> aren't going to happen, or, if they can, that they aren't a normal
> occurrence and the system is such that the particular operation can just
> fail (of course it depends on what it is).   A problem with catching the
> IntegrityError due to concurrent, conflicting operations is that, depending
> on backend and isolation level, you can't be totally sure when the error is
> going to be raised (e.g. serializable isolation vs. not).  Also, on a
> backend like PostgreSQL, the database can't recover the transaction after
> an integrity error unless you used a savepoint.
> 
> But here you're in the "concurrent transactions need row identity X"
> situation, so maybe it is appropriate.  Here is a rough idea of a
> transactional pattern for that, noting it isn't tested:
> 
> from sqlalchemy.exc import IntegrityError
> from sqlalchemy.orm.exc import NoResultFound
> 
> try:
>     my_object = Session.query(MyClass).filter(....).one()
> except NoResultFound:
>     try:
>         with Session.begin_nested():   # run inside a SAVEPOINT
>             my_object = MyClass(...)
>             Session.add(my_object)
>             Session.flush()
>     except IntegrityError:
>         # a concurrent transaction won the race; only the SAVEPOINT was
>         # rolled back, so the enclosing transaction can still load the row
>         my_object = Session.query(MyClass).filter(....).one()
> 
> 
> 
> 
> 
> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> wrote:
> 
>> Suppose we are looking at a race condition - do you also think this should
>> be handled by catching the IntegrityError?
>> If so, what should I do? Just flush and do the operation again?
>> 
>> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote:
>> The word "occasional" is very meaningful - it usually suggests race
>> conditions.  Then with the word "Tornado", the Bayesian filters are
>> strongly leaning towards "race condition" at that point :).
>> 
>> If an error occurs only under volume, then you have to revisit where
>> race conditions can occur.
>> 
>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote:
>> 
>>> I'm running a Tornado server without redundancy (only one process;
>>> requests can arrive at the same time but will be handled one at a time).
>>> I do agree that for large volumes, catching the IntegrityError would be
>>> better, but currently I am handling a single request at a time and want
>>> to fix this problem before I move on ...
>>> 
>>> 
>>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote:
>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote: 
>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote: 
>>> >> 
>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote: 
>>> >> > I have a model with an ID column set as the primary key, though I'd
>>> >> > like to be able to identify records by 3 other columns.
>>> >> > For this situation, I've added a classmethod that will fetch the
>>> >> > record if found, or create a new one if not.
>>> >> > The problem I'm having is that every once in a while, I get an
>>> >> > IntegrityError trying to flush a change:
>>> >> > 
>>> >> > class Foo(Base):
>>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),)
>>> >> > 
>>> >> >     id = sa.Column(Identifier,
>>> >> >                    sa.Sequence('%s_id_seq' % __tablename__),
>>> >> >                    nullable=False, primary_key=True)
>>> >> >     bar = sa.Column(sa.BigInteger)
>>> >> >     baz = sa.Column(sa.BigInteger)
>>> >> >     qux = sa.Column(sa.BigInteger)
>>> >> >     a1 = sa.Column(sa.BigInteger)
>>> >> >     a2 = sa.Column(sa.BigInteger)
>>> >> > 
>>> >> >     @classmethod
>>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs):
>>> >> >         item = session.query(cls).\
>>> >> >             filter(cls.bar == bar).\
>>> >> >             filter(cls.baz == baz).\
>>> >> >             filter(cls.qux == qux).\
>>> >> >             first()
>>> >> > 
>>> >> >         if item:
>>> >> >             for k, v in kwargs.iteritems():
>>> >> >                 if getattr(item, k) != v:
>>> >> >                     setattr(item, k, v)
>>> >> >         else:
>>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
>>> >> > 
>>> >> >         return item
>>> >> > 
>>> >> > This is the code I use to add/update records: 
>>> >> > 
>>> >> > foo = Foo.get(**item) 
>>> >> > session.merge(foo) 
>>> >> > 
>>> >> > I've been struggling with this problem for some time now and would
>>> >> > appreciate any help ...
>>> >> > 
>>> >> 
>>> >> I'm not sure of the exact problem, but there are a couple of things 
>>> >> that you could investigate. 
>>> >> 
>>> >> Firstly, session.merge returns a copy of the object, rather than 
>>> >> adding the object that you supplied into the session. See 
>>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for 
>>> >> details. 
>>> >> 
>>> >> Secondly, your "get" method sometimes returns objects that are already 
>>> >> part of the session (if they were in the database), and sometimes 
>>> >> objects that are not in the session. It would probably be more 
>>> >> consistent to always return objects that are part of the session, by 
>>> >> putting "session.add(item)" in your "else" clause. This would get rid 
>>> >> of the need for session.merge(). (If you want to be able to use the 
>>> >> "get" with non-global sessions, pass the session as a parameter.) 
>>> >> 
>>> >> Finally, if your session isn't auto-flushing, it would be possible for 
>>> >> you to call "get" twice with the same parameters and get 2 different 
>>> >> objects back. 
>>> >> 
>>> >> You may want to look at the UniqueObject recipe in the wiki: 
>>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject 
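>>> >> 
>>> >> As an untested sketch of those suggestions combined - the session is
>>> >> passed in explicitly and the new object is added to it, so every caller
>>> >> gets back a session-attached instance and session.merge() is no longer
>>> >> needed:
>>> >> 
>>> >>     @classmethod
>>> >>     def get(cls, session, bar=None, baz=None, qux=None, **kwargs):
>>> >>         item = session.query(cls).\
>>> >>             filter(cls.bar == bar).\
>>> >>             filter(cls.baz == baz).\
>>> >>             filter(cls.qux == qux).\
>>> >>             first()
>>> >> 
>>> >>         if item:
>>> >>             # already persistent in this session; update in place
>>> >>             for k, v in kwargs.iteritems():
>>> >>                 if getattr(item, k) != v:
>>> >>                     setattr(item, k, v)
>>> >>         else:
>>> >>             item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
>>> >>             # attach the new object to the session right away
>>> >>             session.add(item)
>>> >> 
>>> >>         return item
>>> >> 
>>> >> The calling code then becomes just foo = Foo.get(session, **item),
>>> >> with no merge step.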
>>> >> 
>>> > Hi Simon,
>>> > Thanks for the fast reply.
>>> > 
>>> > I tried adding session.add(item) and session.flush() in the else clause
>>> > in the past, but that didn't solve my problem.
>>> > I didn't, however, remove the merge - do you think that might be the
>>> > problem?
>>> > 
>>> > Regarding the flush: this code is part of an API server where a
>>> > scoped_session is committed after each change. I haven't changed the
>>> > autoflush parameter, and as I understand it, the default value is True,
>>> > causing a flush before each commit or query.
>>> > 
>>> > As for the UniqueObject recipe, thanks! Amazing that I never found it
>>> > while searching for a cure. As I see it, it basically does the same
>>> > thing ...
>>> > 
>>> > I never managed to reproduce this bug in my development environment;
>>> > it only happens in production.
>>> > Do you suppose adding a session.add and removing the merge will solve
>>> > this issue?
>>> > 
>>> > Thanks,
>>> > Ofir
>>> 
>>> It's difficult to say without knowing more about your system. For 
>>> example, does your production system get multiple concurrent API 
>>> requests, or are they serialised? If 2 requests can come in at 
>>> approximately the same time and are handled by 2 different threads (or 
>>> processes), then it is easy to imagine that the first handler will 
>>> check the database, find that an entry doesn't exist, and create it. 
>>> But before it flushes the change to the database (or even after it 
>>> flushes, but before it commits, depending on your transaction 
>>> isolation), the second handler will check for the same object, find it 
>>> missing, and so create it. 
>>> 
>>> To track down problems like this, you could ensure that your 
>>> development environment has the same thread/process behaviour as the 
>>> production environment, then try submitting multiple concurrent 
>>> requests to it. If you add "time.sleep" statements somewhere between 
>>> the creation of the object and the commit of the transaction you will 
>>> probably find it easier to trigger. 
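>>> 
>>> For example (an untested sketch; handle_request is a hypothetical
>>> stand-in for your real handler, wrapping the original get/merge pattern):
>>> 
>>> import time
>>> 
>>> def handle_request(session, **item):
>>>     foo = Foo.get(**item)
>>>     session.merge(foo)
>>>     # widen the window between the existence check and the commit;
>>>     # a second concurrent request will also find the row missing,
>>>     # create its own copy, and trigger the IntegrityError on commit
>>>     time.sleep(5)
>>>     session.commit()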
>>> 
>>> To actually fix the problem, you could choose to only handle a single 
>>> request at a time (fine if you don't expect a high volume of 
>>> requests). If that's not acceptable, you could catch the 
>>> IntegrityError and then re-process the request. 
>>> 
>>> Hope that helps, 
>>> 
>>> Simon 
>>> 