Thanks Michael,
I will try that and let you know if it solves my issue.
BTW: Is there a lock mechanism for such conditions?

Thanks,
Ofir
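On the lock question: most backends do offer row-level pessimistic locking via SELECT ... FOR UPDATE, which SQLAlchemy exposes as Query.with_for_update() (in the 0.8 series the spelling was Query.with_lockmode('update')). A minimal sketch with illustrative names, targeting a modern SQLAlchemy; SQLite ignores FOR UPDATE, so this runs anywhere, while on PostgreSQL or MySQL a second transaction selecting the same row would block until the first commits:

```python
# Sketch of pessimistic row locking with SELECT ... FOR UPDATE.
# The "Counter" model is an illustrative name; targets SQLAlchemy 1.4+.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Counter(Base):
    __tablename__ = "counter"
    id = sa.Column(sa.Integer, primary_key=True)
    value = sa.Column(sa.Integer, nullable=False)

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()
session.add(Counter(id=1, value=0))
session.commit()

# Lock the row for the rest of the transaction, then update it; any
# concurrent FOR UPDATE reader of this row waits until we commit.
row = session.query(Counter).filter_by(id=1).with_for_update().one()
row.value += 1
session.commit()
```

Note that FOR UPDATE only locks rows that already exist, so it cannot by itself stop two transactions from both deciding a missing row needs inserting; the unique constraint (plus the savepoint pattern Michael describes below) is still what catches that case.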


On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer wrote:
>
> I'm not a fan of catching integrity errors; I prefer to try to make sure 
> they aren't going to happen, or, if they are, that they aren't a normal 
> occurrence and the system is such that the particular operation can just 
> fail (of course it depends on what it is).  A problem with catching the 
> integrity error due to concurrent, conflicting operations is that, depending 
> on the backend and isolation level, you can't be totally sure when the error 
> is going to get raised (e.g. serializable isolation vs. non).  Also, on a 
> backend like PostgreSQL, the database can't recover the transaction after 
> an integrity error unless you used a savepoint.
>
> But here you're doing the "concurrent transactions need row identity X" 
> case, so maybe it is appropriate.  Here is a rough idea of a transactional 
> pattern for that, noting this isn't tested:
>
> from sqlalchemy.exc import IntegrityError
> from sqlalchemy.orm.exc import NoResultFound
>
> try:
>     my_object = Session.query(MyClass).filter(....).one()
> except NoResultFound:
>     try:
>         with Session.begin_nested():   # run inside a SAVEPOINT
>             my_object = MyClass(...)
>             Session.add(my_object)
>             Session.flush()
>     except IntegrityError:
>         my_object = Session.query(MyClass).filter(....).one()
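Fleshed out into a self-contained, runnable sketch (the Widget model and get_or_create helper are illustrative names, and this targets the modern SQLAlchemy API, where begin_nested() is usable as a context manager as shown):

```python
# Self-contained sketch of the SAVEPOINT get-or-create pattern; the
# Widget model and get_or_create helper are illustrative names.
import sqlalchemy as sa
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import NoResultFound

Base = declarative_base()

class Widget(Base):
    __tablename__ = "widget"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, unique=True, nullable=False)

def get_or_create(session, name):
    """Return the Widget with this name, creating it if necessary.

    The INSERT runs inside a SAVEPOINT, so if a concurrent writer wins
    the race, only the savepoint is rolled back (not the enclosing
    transaction) and we re-query for the row the other writer created.
    """
    try:
        return session.query(Widget).filter_by(name=name).one()
    except NoResultFound:
        try:
            with session.begin_nested():  # SAVEPOINT; flushes on exit
                obj = Widget(name=name)
                session.add(obj)
            return obj
        except IntegrityError:
            return session.query(Widget).filter_by(name=name).one()

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

a = get_or_create(session, "spam")
b = get_or_create(session, "spam")
assert a is b  # second call finds the row made by the first
```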
>
>
>
>
>
> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> wrote:
>
> Suppose we are looking at a race condition; do you also think this should 
> be handled by catching the IntegrityError? 
> If so, what should I do? Only flush and do the operation again?
>
> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote:
>>
>> the word "occasional" is very meaningful.  It usually suggests race 
>> conditions.  Then with the word "tornado", the Bayesian filters are 
>> strongly leaning towards "race condition" at that point :).
>>
>> If an error occurs only under volume, then you have to revisit where 
>> race conditions can occur.
>>
>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote:
>>
>> I'm running a Tornado server without redundancy (only one process; 
>> requests can arrive at the same time but will be handled one at a time).
>> I do agree that for large volumes, catching the IntegrityError would be 
>> better, but currently I am handling a single request at a time and I want 
>> to fix this problem before I move on ...
>>
>>
>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote:
>>>
>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote: 
>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote: 
>>> >> 
>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote: 
>>> >> > I have a model with an ID column set as the primary key, though I'd 
>>> >> > like to be able to identify records by 3 other columns. 
>>> >> > For this situation, I've added a classmethod that will fetch the 
>>> >> > record if found, or a new record if not. 
>>> >> > The problem I'm having is that every once in a while, I get an 
>>> >> > IntegrityError trying to flush a change: 
>>> >> > 
>>> >> > class Foo(Base): 
>>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),) 
>>> >> > 
>>> >> >     id = sa.Column(Identifier, sa.Sequence('%s_id_seq' % __tablename__), 
>>> >> >                    nullable=False, primary_key=True) 
>>> >> >     bar = sa.Column(sa.BigInteger) 
>>> >> >     baz = sa.Column(sa.BigInteger) 
>>> >> >     qux = sa.Column(sa.BigInteger) 
>>> >> >     a1 = sa.Column(sa.BigInteger) 
>>> >> >     a2 = sa.Column(sa.BigInteger) 
>>> >> > 
>>> >> >     @classmethod 
>>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
>>> >> >         item = session.query(cls).\ 
>>> >> >             filter(cls.bar == bar).\ 
>>> >> >             filter(cls.baz == baz).\ 
>>> >> >             filter(cls.qux == qux).\ 
>>> >> >             first() 
>>> >> > 
>>> >> >         if item: 
>>> >> >             for k, v in kwargs.iteritems(): 
>>> >> >                 if getattr(item, k) != v: 
>>> >> >                     setattr(item, k, v) 
>>> >> >         else: 
>>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs) 
>>> >> > 
>>> >> >         return item 
>>> >> > 
>>> >> > This is the code I use to add/update records: 
>>> >> > 
>>> >> > foo = Foo.get(**item) 
>>> >> > session.merge(foo) 
>>> >> > 
>>> >> > I've been struggling with this problem for some time now, and would 
>>> >> > appreciate any help ... 
>>> >> > 
>>> >> 
>>> >> I'm not sure of the exact problem, but there are a couple of things 
>>> >> that you could investigate. 
>>> >> 
>>> >> Firstly, session.merge returns a copy of the object, rather than 
>>> >> adding the object that you supplied into the session. See 
>>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for 
>>> >> details. 
>>> >> 
>>> >> Secondly, your "get" method sometimes returns objects that are already 
>>> >> part of the session (if they were in the database), and sometimes 
>>> >> objects that are not in the session. It would probably be more 
>>> >> consistent to always return objects that are part of the session, by 
>>> >> putting "session.add(item)" in your "else" clause. This would get rid 
>>> >> of the need for session.merge(). (If you want to be able to use the 
>>> >> "get" with non-global sessions, pass the session as a parameter.) 
>>> >> 
>>> >> Finally, if your session isn't auto-flushing, it would be possible for 
>>> >> you to call "get" twice with the same parameters and get 2 different 
>>> >> objects back. 
>>> >> 
>>> >> You may want to look at the UniqueObject recipe in the wiki: 
>>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject 
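The heart of that recipe, reduced to a rough sketch (the Tag model and unique_tag helper are illustrative names): cache constructed instances on the session itself, so that asking twice for the same key returns one object even before anything is flushed:

```python
# Rough sketch of the unique-object idea: a per-session cache keyed by
# the natural key, so repeated lookups yield one instance per session.
# The Tag model and unique_tag helper are illustrative names.
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Tag(Base):
    __tablename__ = "tag"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, unique=True, nullable=False)

def unique_tag(session, name):
    cache = session.info.setdefault("_tag_cache", {})  # per-session dict
    if name in cache:
        return cache[name]
    obj = session.query(Tag).filter_by(name=name).first()
    if obj is None:
        obj = Tag(name=name)   # not in DB: construct and add exactly once
        session.add(obj)
    cache[name] = obj
    return obj

engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

t1 = unique_tag(session, "red")
t2 = unique_tag(session, "red")
assert t1 is t2  # same instance, even though nothing was flushed yet
```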
>>> >> 
>>> > Hi Simon, 
>>> > Thanks for the fast reply. 
>>> > 
>>> > I tried adding session.add(item) and session.flush() in the else 
>>> > clause in the past, but that didn't solve my problem. 
>>> > I didn't, however, remove the merge; do you think that might be the 
>>> > problem? 
>>> > 
>>> > Regarding the flush, this code is part of an API server where a 
>>> > scoped_session is committed after each change. I haven't changed the 
>>> > autoflush parameter, and as I understand it, the default value is True, 
>>> > making a flush before each commit or query. 
>>> > 
>>> > As for the UniqueObject recipe, thanks! Amazing that I never found it 
>>> > while searching for a cure. As I see it, it basically does the same thing ... 
>>> > 
>>> > I never managed to reproduce this bug in my development environment; it 
>>> > only happens in my production environment. 
>>> > Do you suppose adding a session.add and removing the merge will solve 
>>> > this issue? 
>>> > 
>>> > Thanks, 
>>> > Ofir 
>>>
>>> It's difficult to say without knowing more about your system. For 
>>> example, does your production system get multiple concurrent API 
>>> requests, or are they serialised? If 2 requests can come in at 
>>> approximately the same time and are handled by 2 different threads (or 
>>> processes), then it is easy to imagine that the first handler will 
>>> check the database, find that an entry doesn't exist, and create it. 
>>> But before it flushes the change to the database (or even after it 
>>> flushes, but before it commits, depending on your transaction 
>>> isolation), the second handler will check for the same object, find it 
>>> missing, and so create it. 
>>>
>>> To track down problems like this, you could ensure that your 
>>> development environment has the same thread/process behaviour as the 
>>> production environment, then try submitting multiple concurrent 
>>> requests to it. If you add "time.sleep" statements somewhere between 
>>> the creation of the object and the commit of the transaction you will 
>>> probably find it easier to trigger. 
>>>
>>> To actually fix the problem, you could choose to only handle a single 
>>> request at a time (fine if you don't expect a high volume of 
>>> requests). If that's not acceptable, you could catch the 
>>> IntegrityError and then re-process the request. 
>>>
>>> Hope that helps, 
>>>
>>> Simon 
>>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to sqlalchemy+...@googlegroups.com.
>> To post to this group, send email to sqlal...@googlegroups.com.
>> Visit this group at http://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
