Suppose we are looking at a race condition. Do you also think this should 
be handled by catching the IntegrityError?
If so, what exactly should I do? Just flush and do the operation again?
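For concreteness, here is the kind of handling I have in mind: a minimal sketch that catches the IntegrityError around an explicit flush and falls back to a SELECT. `Foo` and `get_or_create` are illustrative stand-ins for the real model and classmethod (the `Identifier` type and sequence are left out), and the session is passed in explicitly:

```python
import sqlalchemy as sa
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),)

    id = sa.Column(sa.Integer, primary_key=True)
    bar = sa.Column(sa.BigInteger)
    baz = sa.Column(sa.BigInteger)
    qux = sa.Column(sa.BigInteger)
    a1 = sa.Column(sa.BigInteger)

def get_or_create(session, bar, baz, qux, **kwargs):
    item = (session.query(Foo)
            .filter_by(bar=bar, baz=baz, qux=qux)
            .first())
    if item is not None:
        for k, v in kwargs.items():
            setattr(item, k, v)
        return item
    item = Foo(bar=bar, baz=baz, qux=qux, **kwargs)
    session.add(item)
    try:
        # flush here so a duplicate INSERT fails immediately, while we can
        # still recover, rather than at some later autoflush or commit
        session.flush()
    except IntegrityError:
        # another request won the race: discard our INSERT and load theirs
        session.rollback()
        item = (session.query(Foo)
                .filter_by(bar=bar, baz=baz, qux=qux)
                .one())
        for k, v in kwargs.items():
            setattr(item, k, v)
    return item
```

One caveat with this sketch: `session.rollback()` discards the whole transaction, including any other pending changes, so catching the IntegrityError at the request level and re-processing the entire request may be the safer option.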

On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote:
>
> the word "occasional" is very meaningful.  It usually suggests race 
> conditions.    Then with the word "tornado", the Bayesian filters are 
> strongly leaning towards "race condition" at that point :).
>
> if an error is occurring only under volume then you have to revisit where 
> race conditions can occur.
>
> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> 
> wrote:
>
> I'm running a Tornado server without redundancy (only one process; 
> requests can arrive at the same time but will be handled one at a time). 
> I do agree that for large volumes, catching the IntegrityError would be 
> better, but currently I am handling a single request at a time and I want 
> to fix this problem before I move on ...
>
>
> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote:
>>
>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote: 
>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote: 
>> >> 
>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote: 
>> >> > I have a model with an ID column set as the primary key, though I'd 
>> >> > like to be able to identify records by 3 other columns. 
>> >> > For this situation, I've added a classmethod that will fetch the 
>> >> > record if found, or a new record if not. 
>> >> > The problem I'm having is that every once in a while, I get an 
>> >> > IntegrityError trying to flush a change: 
>> >> > class Foo(Base): 
>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),) 
>> >> > 
>> >> >     id = sa.Column(Identifier, sa.Sequence('%s_id_seq' % __tablename__), 
>> >> >                    nullable=False, primary_key=True) 
>> >> >     bar = sa.Column(sa.BigInteger) 
>> >> >     baz = sa.Column(sa.BigInteger) 
>> >> >     qux = sa.Column(sa.BigInteger) 
>> >> >     a1 = sa.Column(sa.BigInteger) 
>> >> >     a2 = sa.Column(sa.BigInteger) 
>> >> > 
>> >> >     @classmethod 
>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs): 
>> >> >         item = session.query(cls).\ 
>> >> >             filter(cls.bar == bar).\ 
>> >> >             filter(cls.baz == baz).\ 
>> >> >             filter(cls.qux == qux).\ 
>> >> >             first() 
>> >> > 
>> >> >         if item: 
>> >> >             for k, v in kwargs.iteritems(): 
>> >> >                 if getattr(item, k) != v: 
>> >> >                     setattr(item, k, v) 
>> >> >         else: 
>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs) 
>> >> > 
>> >> >         return item 
>> >> > 
>> >> > This is the code I use to add/update records: 
>> >> > 
>> >> > foo = Foo.get(**item) 
>> >> > session.merge(foo) 
>> >> > 
>> >> > I've been struggling with this problem for some time now, and would 
>> >> > appreciate any help ... 
>> >> > 
>> >> 
>> >> I'm not sure of the exact problem, but there are a couple of things 
>> >> that you could investigate. 
>> >> 
>> >> Firstly, session.merge returns a copy of the object, rather than 
>> >> adding the object that you supplied into the session. See 
>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for 
>> >> details. 
>> >> 
>> >> Secondly, your "get" method sometimes returns objects that are already 
>> >> part of the session (if they were in the database), and sometimes 
>> >> objects that are not in the session. It would probably be more 
>> >> consistent to always return objects that are part of the session, by 
>> >> putting "session.add(item)" in your "else" clause. This would get rid 
>> >> of the need for session.merge(). (If you want to be able to use the 
>> >> "get" with non-global sessions, pass the session as a parameter.) 
>> >> 
>> >> Finally, if your session isn't auto-flushing, it would be possible for 
>> >> you to call "get" twice with the same parameters and get 2 different 
>> >> objects back. 
>> >> 
>> >> You may want to look at the UniqueObject recipe in the wiki: 
>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject 
>> >> 
>> > Hi Simon, 
>> > Thanks for the fast reply. 
>> > 
>> > I tried adding session.add(item) and session.flush() in the else clause 
>> > in the past, but that didn't solve my problem. 
>> > I didn't remove the merge, however; do you think that might be the 
>> > problem? 
>> > 
>> > Regarding the flush, this code is part of an API server where a 
>> > scoped_session is committed after each change. I haven't changed the 
>> > autoflush parameter, and as I understand it the default value is True, 
>> > causing a flush before each commit or query. 
>> > 
>> > As for the UniqueObject recipe, thanks! Amazing that I never found it 
>> > while searching for a cure. As I see it, it basically does the same thing ... 
>> > 
>> > I never managed to reproduce this bug in my development environment; it 
>> > only happens in my production environment. 
>> > Do you suppose adding session.add and removing the merge will solve this 
>> > issue? 
>> > 
>> > Thanks, 
>> > Ofir 
>>
>> It's difficult to say without knowing more about your system. For 
>> example, does your production system get multiple concurrent API 
>> requests, or are they serialised? If 2 requests can come in at 
>> approximately the same time and are handled by 2 different threads (or 
>> processes), then it is easy to imagine that the first handler will 
>> check the database, find that an entry doesn't exist, and create it. 
>> But before it flushes the change to the database (or even after it 
>> flushes, but before it commits, depending on your transaction 
>> isolation), the second handler will check for the same object, find it 
>> missing, and so create it. 
>>
>> To track down problems like this, you could ensure that your 
>> development environment has the same thread/process behaviour as the 
>> production environment, then try submitting multiple concurrent 
>> requests to it. If you add "time.sleep" statements somewhere between 
>> the creation of the object and the commit of the transaction you will 
>> probably find it easier to trigger. 
>>
>> To actually fix the problem, you could choose to only handle a single 
>> request at a time (fine if you don't expect a high volume of 
>> requests). If that's not acceptable, you could catch the 
>> IntegrityError and then re-process the request. 
>>
>> Hope that helps, 
>>
>> Simon 
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/groups/opt_out.
