Re: [sqlalchemy] Occasional IntegrityError when identifying model not by its ID

Simon King Mon, 02 Sep 2013 05:30:45 -0700

What exactly do you mean by not using transactions? The Session always
works within a transaction:


  http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#managing-transactions

I assume you are also using InnoDB tables.

On Mon, Sep 2, 2013 at 1:19 PM, herzaso <herz...@gmail.com> wrote:
> I do have it set as REPEATABLE READ.
> However, I don't use transactions in SQLAlchemy.
>
>
> On Monday, September 2, 2013 3:08:58 PM UTC+3, Simon King wrote:
>>
>> Do you know what transaction isolation level you are running at? The
>> default apparently is "REPEATABLE READ":
>>
>>
>> http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
>>
>> The important sentence in that link is:
>>
>>   All consistent reads within the same transaction read the snapshot
>> established by the first read
>>
>> When you query the database for the first time, to see if the entity
>> already exists, you are setting that initial snapshot. If you run the
>> same query again (such as in your exception handler), you will get the
>> same results, whether or not another connection has inserted a
>> matching row in the meantime.
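The snapshot effect can be reproduced without MySQL. The sketch below uses Python's built-in `sqlite3` module with SQLite in WAL mode as an assumed stand-in for InnoDB under REPEATABLE READ. In WAL mode, a read transaction also keeps the snapshot taken at its first read. The `foo` table follows the experiment later in this thread, and `snapshot_demo` and `count` are hypothetical helper names:

```python
import os
import sqlite3
import tempfile

QUERY = "SELECT COUNT(*) FROM foo WHERE bar = 1 AND baz = 1 AND qux = 1"


def count(conn):
    # fetchall() steps the statement to completion before we move on
    return conn.execute(QUERY).fetchall()[0][0]


def snapshot_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
        # so the BEGIN/COMMIT below are exactly what SQLite sees
        a = sqlite3.connect(path, isolation_level=None)
        a.execute("PRAGMA journal_mode=WAL").fetchall()  # readers keep a snapshot
        a.execute("CREATE TABLE foo (bar INTEGER, baz INTEGER, qux INTEGER, "
                  "UNIQUE (bar, baz, qux))")
        b = sqlite3.connect(path, isolation_level=None)

        a.execute("BEGIN")
        before = count(a)   # first read: the snapshot is taken here
        b.execute("INSERT INTO foo (bar, baz, qux) VALUES (1, 1, 1)")  # autocommitted
        inside = count(a)   # same transaction: still reads the old snapshot
        a.execute("COMMIT")
        after = count(a)    # new transaction: sees the committed row

        a.close()
        b.close()
        return before, inside, after


print(snapshot_demo())  # (0, 0, 1)
```

The middle value corresponds to the exception handler's re-query. The row exists and is committed, but a query inside the same transaction still cannot see it.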
>>
>> Simon
>>
>> On Mon, Sep 2, 2013 at 12:54 PM, herzaso <her...@gmail.com> wrote:
>> > I'm not sure what to make of the results:
>> > On the first connection, I ran BEGIN and INSERT and both were successful,
>> > but when I tried the INSERT statement on the second connection, I got
>> > "ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction".
>> > Running the same query again on the first connection produced the expected
>> > result, which is "ERROR 1062 (23000): Duplicate entry".
>> > After the ROLLBACK on the first connection, the INSERT statement worked
>> > on the second connection.
>> >
>> > Regarding your second remark: yes, the error was due to the unique
>> > constraint on those columns.
>> >
>> > BTW, I'm working on MySQL.
>> >
>> > On Monday, September 2, 2013 1:31:12 PM UTC+3, Simon King wrote:
>> >>
>> >> I don't really know the answer, but I'd be interested in the results
>> >> of this experiment:
>> >>
>> >> Forget about SQLAlchemy for the moment, and start 2 plain SQL
>> >> connections to your database. In the first, type something like the
>> >> following:
>> >>
>> >> BEGIN;
>> >> INSERT INTO foo (bar, baz, qux) VALUES (1, 1, 1);
>> >>
>> >> Now in the second connection do the same. I assume it'll fail because
>> >> of the duplicate values.
>> >>
>> >> Now in the first connection issue a "ROLLBACK". You should now be in a
>> >> state where no matching row exists in the database, even though you
>> >> received an error about constraint violations.
>> >>
>> >> The results you see may be different, depending on your transaction
>> >> isolation level. (It may be that you don't get the constraint
>> >> violation at all until you try to commit the second connection).
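The same two-connection experiment can be replayed from Python with the built-in `sqlite3` module. This sketch rests on one stated assumption: SQLite locks the whole database for writing rather than individual index records. With `timeout=0`, the second connection therefore fails at once with "database is locked", where MySQL waits and then reports a lock wait timeout. Apart from that, the sequence of outcomes should match what this thread reports for MySQL. `lock_experiment` is a hypothetical name:

```python
import os
import sqlite3
import tempfile


def lock_experiment():
    """Replay BEGIN / INSERT / ROLLBACK on two SQLite connections."""
    outcome = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # timeout=0: fail immediately instead of waiting for the write lock
        a = sqlite3.connect(path, timeout=0, isolation_level=None)
        a.execute("CREATE TABLE foo (bar INTEGER, baz INTEGER, qux INTEGER, "
                  "UNIQUE (bar, baz, qux))")
        b = sqlite3.connect(path, timeout=0, isolation_level=None)
        insert = "INSERT INTO foo (bar, baz, qux) VALUES (1, 1, 1)"

        a.execute("BEGIN")
        a.execute(insert)          # first connection: succeeds, holds the write lock
        try:
            b.execute(insert)      # second connection: blocked by that lock
        except sqlite3.OperationalError as exc:
            outcome.append(("b", str(exc)))
        try:
            a.execute(insert)      # first connection again: duplicate row
        except sqlite3.IntegrityError:
            outcome.append(("a", "duplicate"))
        a.execute("ROLLBACK")      # the first row never becomes visible
        b.execute(insert)          # now the second connection succeeds
        rows = b.execute("SELECT COUNT(*) FROM foo").fetchall()[0][0]
        outcome.append(("rows", rows))

        a.close()
        b.close()
    return outcome


print(lock_experiment())
```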
>> >>
>> >> Another thing you could look at: are you sure that the error you are
>> >> getting is due to the unique constraint on bar/baz/qux, and not some
>> >> other constraint in the database?
>> >>
>> >> Simon
>> >>
>> >> On Mon, Sep 2, 2013 at 8:45 AM, herzaso <her...@gmail.com> wrote:
>> >> > I'm afraid it didn't solve my problem.
>> >> >
>> >> > Here is my updated method:
>> >> >     @classmethod
>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs):
>> >> >         query = session.query(cls).\
>> >> >             filter(cls.bar == bar).\
>> >> >             filter(cls.baz == baz).\
>> >> >             filter(cls.qux == qux)
>> >> >
>> >> >         item = query.first()
>> >> >         updated = False
>> >> >
>> >> >         if not item:
>> >> >             try:
>> >> >                 with session.begin_nested():   # run inside a SAVEPOINT
>> >> >                     item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
>> >> >                     session.add(item)
>> >> >                     session.flush()
>> >> >                 updated = True   # only once the INSERT has succeeded
>> >> >             except sa.exc.IntegrityError:
>> >> >                 item = query.first()
>> >> >                 if not item:
>> >> >                     raise Exception("invalidIntegrityError")
>> >> >
>> >> >         if not updated:
>> >> >             for k, v in kwargs.iteritems():
>> >> >                 if getattr(item, k) != v:
>> >> >                     setattr(item, k, v)
>> >> >
>> >> >         return item
>> >> >
>> >> > With this code, I'm getting invalidIntegrityError. How is that possible?
>> >> > (It's also worth pointing out that this solution requires SA 0.8.2;
>> >> > otherwise, there is a problem with session.begin_nested.)
>> >> >
>> >> >
>> >> > On Tuesday, August 27, 2013 6:40:03 PM UTC+3, Michael Bayer wrote:
>> >> >>
>> >> >> I'm not a fan of catching integrity errors; I prefer to make sure they
>> >> >> aren't going to happen, or, if they are, that they aren't a normal
>> >> >> occurrence and the system is such that the particular operation can just
>> >> >> fail (of course, it depends on what it is). A problem with catching the
>> >> >> integrity error caused by concurrent, conflicting operations is that,
>> >> >> depending on the backend and isolation level, you can't be totally sure
>> >> >> when the error is going to be raised (e.g. serializable isolation vs.
>> >> >> not). Also, on a backend like PostgreSQL, the database can't recover the
>> >> >> transaction after an integrity error unless you used a savepoint.
>> >> >>
>> >> >> But here you're doing the "concurrent transactions need row identity X"
>> >> >> case, so maybe it is appropriate. Here is a rough idea of a
>> >> >> transactional pattern for that, noting this isn't tested:
>> >> >>
>> >> >> from sqlalchemy.exc import IntegrityError
>> >> >> from sqlalchemy.orm.exc import NoResultFound
>> >> >>
>> >> >> try:
>> >> >>     my_object = Session.query(MyClass).filter(....).one()
>> >> >> except NoResultFound:
>> >> >>     try:
>> >> >>         with Session.begin_nested():   # run inside a SAVEPOINT
>> >> >>             my_object = MyClass(...)
>> >> >>             Session.add(my_object)
>> >> >>             Session.flush()
>> >> >>     except IntegrityError:
>> >> >>         my_object = Session.query(MyClass).filter(....).one()
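For comparison, here is a sketch of the same "insert inside a savepoint, re-select on conflict" pattern without the ORM. It uses Python's built-in `sqlite3` module. The `foo` table and its `id` column are assumptions modelled on the `Foo` class in this thread, and `find`, `create_in_savepoint` and `get_or_create` are hypothetical names. Rolling back only to the savepoint keeps earlier work in the outer transaction (the `9, 9, 9` row) intact:

```python
import sqlite3


def find(conn, bar, baz, qux):
    rows = conn.execute(
        "SELECT id FROM foo WHERE bar = ? AND baz = ? AND qux = ?", (bar, baz, qux)
    ).fetchall()
    return rows[0][0] if rows else None


def create_in_savepoint(conn, bar, baz, qux):
    """INSERT inside a SAVEPOINT; on a unique-constraint clash, undo only the
    savepoint and return the id of the row that is already there."""
    conn.execute("SAVEPOINT get_or_create")
    try:
        cur = conn.execute(
            "INSERT INTO foo (bar, baz, qux) VALUES (?, ?, ?)", (bar, baz, qux)
        )
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO SAVEPOINT get_or_create")
        conn.execute("RELEASE SAVEPOINT get_or_create")
        return find(conn, bar, baz, qux)
    conn.execute("RELEASE SAVEPOINT get_or_create")
    return cur.lastrowid


def get_or_create(conn, bar, baz, qux):
    existing = find(conn, bar, baz, qux)
    if existing is not None:
        return existing
    return create_in_savepoint(conn, bar, baz, qux)


# isolation_level=None: no implicit BEGIN, so the SQL above controls transactions
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, bar INTEGER, baz INTEGER, "
    "qux INTEGER, UNIQUE (bar, baz, qux))"
)
conn.execute("BEGIN")
other = get_or_create(conn, 9, 9, 9)        # earlier work in the same transaction
first = get_or_create(conn, 1, 1, 1)
clash = create_in_savepoint(conn, 1, 1, 1)  # forces the IntegrityError branch
conn.execute("COMMIT")
print(first == clash, find(conn, 9, 9, 9) == other)  # True True
```

With a single connection, the re-select always finds the row. On MySQL under REPEATABLE READ, it can still come back empty, as discussed at the top of this thread.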
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Aug 27, 2013, at 11:13 AM, herzaso <her...@gmail.com> wrote:
>> >> >>
>> >> >> Supposing we are looking at a race condition, do you also think this
>> >> >> should be handled by catching the IntegrityError?
>> >> >> If so, what should I do? Only flush and do the operation again?
>> >> >>
>> >> >> On Tuesday, August 27, 2013 5:42:23 PM UTC+3, Michael Bayer wrote:
>> >> >>>
>> >> >>> The word "occasional" is very meaningful. It usually suggests race
>> >> >>> conditions. Then with the word "Tornado", the Bayesian filters are
>> >> >>> strongly leaning towards "race condition" at that point :).
>> >> >>>
>> >> >>> If an error is occurring only under volume, then you have to revisit
>> >> >>> where race conditions can occur.
>> >> >>>
>> >> >>> On Aug 27, 2013, at 10:32 AM, herzaso <her...@gmail.com> wrote:
>> >> >>>
>> >> >>> I'm running a Tornado server without redundancy (only one process;
>> >> >>> requests can arrive at the same time but will be handled one at a time).
>> >> >>> I do agree that for large volumes, catching the IntegrityError would be
>> >> >>> better, but currently I am handling a single request at a time and I
>> >> >>> want to fix this problem before I move on ...
>> >> >>>
>> >> >>>
>> >> >>> On Tuesday, August 27, 2013 5:24:07 PM UTC+3, Simon King wrote:
>> >> >>>>
>> >> >>>> On Tue, Aug 27, 2013 at 2:31 PM, herzaso <her...@gmail.com> wrote:
>> >> >>>> > On Tuesday, August 27, 2013 3:55:50 PM UTC+3, Simon King wrote:
>> >> >>>> >>
>> >> >>>> >> On Tue, Aug 27, 2013 at 1:40 PM, herzaso <her...@gmail.com> wrote:
>> >> >>>> >> > I have a model with an ID column set as the primary key, though
>> >> >>>> >> > I'd like to be able to identify records by 3 other columns.
>> >> >>>> >> > For this situation, I've added a classmethod that will fetch the
>> >> >>>> >> > record if found, or return a new record if not.
>> >> >>>> >> > The problem I'm having is that every once in a while, I get an
>> >> >>>> >> > IntegrityError trying to flush a change.
>> >> >>>> >> >
>> >> >>>> >> > class Foo(Base):
>> >> >>>> >> >     __table_args__ = (sa.UniqueConstraint('bar', 'baz', 'qux'),)
>> >> >>>> >> >
>> >> >>>> >> >     id = sa.Column(Identifier, sa.Sequence('%s_id_seq' % __tablename__),
>> >> >>>> >> >                    nullable=False, primary_key=True)
>> >> >>>> >> >     bar = sa.Column(sa.BigInteger)
>> >> >>>> >> >     baz = sa.Column(sa.BigInteger)
>> >> >>>> >> >     qux = sa.Column(sa.BigInteger)
>> >> >>>> >> >     a1 = sa.Column(sa.BigInteger)
>> >> >>>> >> >     a2 = sa.Column(sa.BigInteger)
>> >> >>>> >> >
>> >> >>>> >> >     @classmethod
>> >> >>>> >> >     def get(cls, bar=None, baz=None, qux=None, **kwargs):
>> >> >>>> >> >         item = session.query(cls).\
>> >> >>>> >> >             filter(cls.bar == bar).\
>> >> >>>> >> >             filter(cls.baz == baz).\
>> >> >>>> >> >             filter(cls.qux == qux).\
>> >> >>>> >> >             first()
>> >> >>>> >> >
>> >> >>>> >> >         if item:
>> >> >>>> >> >             for k, v in kwargs.iteritems():
>> >> >>>> >> >                 if getattr(item, k) != v:
>> >> >>>> >> >                     setattr(item, k, v)
>> >> >>>> >> >         else:
>> >> >>>> >> >             item = cls(bar=bar, baz=baz, qux=qux, **kwargs)
>> >> >>>> >> >
>> >> >>>> >> >         return item
>> >> >>>> >> >
>> >> >>>> >> > This is the code I use to add/update records:
>> >> >>>> >> >
>> >> >>>> >> > foo = Foo.get(**item)
>> >> >>>> >> > session.merge(foo)
>> >> >>>> >> >
>> >> >>>> >> > I've been struggling with this problem for some time now, and would
>> >> >>>> >> > appreciate any help ...
>> >> >>>> >> >
>> >> >>>> >>
>> >> >>>> >> I'm not sure of the exact problem, but there are a couple of things
>> >> >>>> >> that you could investigate.
>> >> >>>> >>
>> >> >>>> >> Firstly, session.merge returns a copy of the object, rather than
>> >> >>>> >> adding the object that you supplied into the session. See
>> >> >>>> >> http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#merging for
>> >> >>>> >> details.
>> >> >>>> >>
>> >> >>>> >> Secondly, your "get" method sometimes returns objects that are
>> >> >>>> >> already part of the session (if they were in the database), and
>> >> >>>> >> sometimes objects that are not in the session. It would probably be
>> >> >>>> >> more consistent to always return objects that are part of the
>> >> >>>> >> session, by putting "session.add(item)" in your "else" clause. This
>> >> >>>> >> would get rid of the need for session.merge(). (If you want to be
>> >> >>>> >> able to use "get" with non-global sessions, pass the session as a
>> >> >>>> >> parameter.)
>> >> >>>> >>
>> >> >>>> >> Finally, if your session isn't auto-flushing, it would be possible for
>> >> >>>> >> you to call "get" twice with the same parameters and get 2 different
>> >> >>>> >> objects back.
>> >> >>>> >>
>> >> >>>> >> You may want to look at the UniqueObject recipe in the wiki:
>> >> >>>> >> http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject
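One core idea behind that recipe is to check a per-session cache, keyed on the unique columns, before querying. That way two calls with the same arguments return the same object even before anything has been flushed. The following is a plain-Python sketch of that idea, not the recipe itself. `FakeSession`, `naive_get` and `unique_get` are hypothetical stand-ins, with a dict playing the role of the database and autoflush assumed to be off:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity, like ORM instances
class Foo:
    bar: int
    baz: int
    qux: int


@dataclass
class FakeSession:
    """Hypothetical stand-in for a Session with autoflush off:
    `rows` are flushed rows, `pending` are objects added but not yet flushed."""
    rows: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)
    unique_cache: dict = field(default_factory=dict)

    def add(self, obj):
        self.pending.append(obj)

    def query(self, key):
        return self.rows.get(key)  # pending objects are not visible to queries


def naive_get(session, bar, baz, qux):
    """Query, else construct and add (the shape suggested for Foo.get)."""
    item = session.query((bar, baz, qux))
    if item is None:
        item = Foo(bar, baz, qux)
        session.add(item)
    return item


def unique_get(session, bar, baz, qux):
    """Consult a per-session cache keyed on the unique columns first."""
    key = (bar, baz, qux)
    if key not in session.unique_cache:
        session.unique_cache[key] = naive_get(session, bar, baz, qux)
    return session.unique_cache[key]


s = FakeSession()
print(naive_get(s, 1, 1, 1) is naive_get(s, 1, 1, 1))    # False: two pending objects, one key
s = FakeSession()
print(unique_get(s, 1, 1, 1) is unique_get(s, 1, 1, 1))  # True
```

The cache only protects a single session. It does nothing about a second request or process inserting the same key concurrently.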
>> >> >>>> >>
>> >> >>>> > Hi Simon,
>> >> >>>> > Thanks for the fast reply.
>> >> >>>> >
>> >> >>>> > I tried adding session.add(item) and session.flush() in the else
>> >> >>>> > clause in the past, but that didn't solve my problem.
>> >> >>>> > I didn't, however, remove the merge. Do you think that might be the
>> >> >>>> > problem?
>> >> >>>> >
>> >> >>>> > Regarding the flush, this code is part of an API server where a
>> >> >>>> > scoped_session is committed after each change. I haven't changed the
>> >> >>>> > autoflush parameter, and as I understand it, the default value is
>> >> >>>> > True, which causes a flush before each commit or query.
>> >> >>>> >
>> >> >>>> > As for the UniqueObject recipe, thanks! Amazing that I never found it
>> >> >>>> > while searching for a cure. As I see it, it basically does the same ...
>> >> >>>> >
>> >> >>>> > I never managed to reproduce this bug in my development environment.
>> >> >>>> > It only happens in my production environment.
>> >> >>>> > Do you suppose adding a session.add and removing the merge will solve
>> >> >>>> > this issue?
>> >> >>>> >
>> >> >>>> > Thanks,
>> >> >>>> > Ofir
>> >> >>>>
>> >> >>>> It's difficult to say without knowing more about your system. For
>> >> >>>> example, does your production system get multiple concurrent API
>> >> >>>> requests, or are they serialised? If 2 requests can come in at
>> >> >>>> approximately the same time and are handled by 2 different threads
>> >> >>>> (or processes), then it is easy to imagine that the first handler will
>> >> >>>> check the database, find that an entry doesn't exist, and create it.
>> >> >>>> But before it flushes the change to the database (or even after it
>> >> >>>> flushes, but before it commits, depending on your transaction
>> >> >>>> isolation), the second handler will check for the same object, find
>> >> >>>> it missing, and so create it.
>> >> >>>>
>> >> >>>> To track down problems like this, you could ensure that your
>> >> >>>> development environment has the same thread/process behaviour as the
>> >> >>>> production environment, then try submitting multiple concurrent
>> >> >>>> requests to it. If you add "time.sleep" statements somewhere between
>> >> >>>> the creation of the object and the commit of the transaction, you will
>> >> >>>> probably find it easier to trigger.
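Whether this can happen inside a single Tornado process depends on the handlers. If a handler ever yields to the event loop between the check and the commit (for example, a coroutine waiting on I/O), another request can run in that gap. Below is a minimal `asyncio` sketch of that interleaving. A Python set stands in for the unique constraint, and `asyncio.sleep(0)` plays the role of the suggested `time.sleep`. `DuplicateKey`, `get_or_create` and `two_concurrent_requests` are hypothetical names:

```python
import asyncio


class DuplicateKey(Exception):
    """Stands in for the IntegrityError raised by the unique constraint."""


async def get_or_create(table, key):
    if key in table:          # 1. check: is the row there?
        return "found"
    await asyncio.sleep(0)    # 2. yield, e.g. while waiting on the database
    if key in table:          # the constraint is enforced at insert time
        raise DuplicateKey(key)
    table.add(key)            # 3. act: insert the row
    return "created"


async def two_concurrent_requests():
    table = set()
    return await asyncio.gather(
        get_or_create(table, (1, 1, 1)),
        get_or_create(table, (1, 1, 1)),
        return_exceptions=True,  # collect the failure instead of raising it
    )


results = asyncio.run(two_concurrent_requests())
print(results)  # ['created', DuplicateKey((1, 1, 1))]
```

If the two calls are awaited one after the other instead, the second returns `"found"`. The failure only appears when both checks run before either insert.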
>> >> >>>>
>> >> >>>> To actually fix the problem, you could choose to only handle a single
>> >> >>>> request at a time (fine if you don't expect a high volume of
>> >> >>>> requests). If that's not acceptable, you could catch the
>> >> >>>> IntegrityError and then re-process the request.
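In an `asyncio`-style server, the first option (one request at a time) can be sketched with a lock around the check-and-insert. This assumes a single event-loop process. A lock like this serializes requests only within that one process and does nothing for a second process or server, which is where catching the IntegrityError comes in. `get_or_create` and `two_concurrent_requests` are hypothetical names, and the set again stands in for the table:

```python
import asyncio


async def get_or_create(table, key, lock):
    async with lock:              # one request at a time runs check-then-insert
        if key in table:
            return "found"
        await asyncio.sleep(0)    # other tasks may run, but must wait for the lock
        table.add(key)
        return "created"


async def two_concurrent_requests():
    table, lock = set(), asyncio.Lock()
    return await asyncio.gather(
        get_or_create(table, (1, 1, 1), lock),
        get_or_create(table, (1, 1, 1), lock),
    )


print(asyncio.run(two_concurrent_requests()))  # ['created', 'found']
```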
>> >> >>>>
>> >> >>>> Hope that helps,
>> >> >>>>
>> >> >>>> Simon
>> >> >>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.
