gizli wrote:
>
> Hi guys,
>
> I have a couple of questions about a recent concurrency test I did on
> SA. Someone in our team ran into a very intermittent issue where a
> certain operation he tried doing failed by ConcurrentUpdateError. I
> wrote a simple thread that can do query/update/insert/commit/rollback
> operations on demand and created two of those to create specific race
> conditions manually.. I found out some interesting stuff:
>
> If I use sqlite, two threads cannot operate on the same ORM object.
> E.g.:
>
> t1.update(obj)
> t2.update(obj)
> t1.commit()
> t2.commit()
>
> t2.update(obj) would fail with an error saying "Database is locked".

sqlite has a coarse grained locking scheme.   You can obtain better
control over it using the tips at
http://www.sqlalchemy.org/trac/wiki/DatabaseNotes#SQLite

>
> If I use mysql to do the same thing it goes through with no problems.

mysql has row locking (if youre using InnoDB tables) so contention occurs 
at a much finer grained level.

> The only time I could reproduce the concurrent update error is like
> this:
>
> t1.update(obj)
> t2.delete(obj)
> t2.commit()
> t1.commit()
>
> The last line will throw the error saying num updated rows is 0 where
> it was expecting 1. My first question is: Can this concurrent update
> error arise in any other situation?

SQLA's checks for this condition only occur with regards to rows which are
expected to exist that no longer exist.  The practice is a very minimal
form of what is known as "optimistic locking", which you can read about
here:  http://en.wikipedia.org/wiki/Optimistic_concurrency_control

SQLA offers a more robust form of "optimistic locking" if you employ the
usage of a version ID column (not too different from the timestamp
approach described on that wikipedia page).


> I also tried the following:
>
> t1.delete(obj)
> t2.delete(obj)
> t2.commit()
> t1.commit()
>
> This also fails with the same error saying the num deleted rows is 0
> when it was expecting 1. My second question is: Should t1's
> transaction go through? I would think so.

why is that ?   what happens if you go to your command line and try to
"rm" a file that doesn't exist ?  should "rm" silently fail ?

> I understand why an update
> to a deleted row fails the entire transaction but I do not understand
> the motivation for failing an entire transaction because a row I
> wanted to delete is already deleted.

because your program is assuming that a certain row exists which does not.
 operating on stale data is never a good idea - any number of changes may
have occurred between the time you have loaded the row and the time you
have deleted the row.

> This behavior is, as far as I can see, embedded in SA logic. So no
> matter which database I use, I will see this right?

you won't when using DBAPIs that do not offer a reliable count of rows
updated or deleted.  If you'd like to force the entire thing off, do this:

engine = create_engine(<url>)
engine.dialect.supports_sane_rowcount = False
engine.dialect.supports_sane_multi_rowcount = False

> The final question is a design question. In a typical web application
> that can have 100's of simultaneous users submitting a handful of
> tasks, how do we avoid this kind of concurrency issues?
> Each request
> would be handled with a separate thread meaning if any of these
> threads try modifying the same db object, we have a potential problem.
> Do we need exclusive locks for this kind of situation? Does SA allow
> us to do that?

most "social media" types of web applications involve each user issuing
UPDATE/DELETE statements to distinct groups of rows which are "owned" by
each user.  I.e. you're only updating your own comments, your own
messages, your own address books, etc.   So an optimistic system works
very well for that.

If OTOH your system is more like a CMS or document management system where
many users may be affecting the same row, you have to put more thought
into it.

Trac uses a locking scheme kind of like the one in the wikipedia page,
using timestamps that indicate "user X has a 'lock' on row A".  For
example if you try to update a ticket in Trac, and someone else has
changed the ticket since you last loaded it, a timestamp check fails and
you get a message stating that "the ticket has been changed since you last
loaded it".

You can also use a "pessimistic" system of locking rows.  The "standard"
pessimistic system may involve rows are literally locked at the DB level
using "SELECT..FOR UPDATE", as well as high levels of transaction
isolation (in Postgresql this is SERIAL isolation...I'm not as familiar
with how MySQL's works).  However,  you still have to accommodate
contention situations explicitly - user 1 displays record A, user 2
displays record A, user 1 issues a POST to delete it, user 2 issues a POST
to delete it as well, user 2's request goes to re-load the row to check
for deletion and its no longer there, so the app would have to "smooth
over" that situation (saying "already deleted" for example).

Another approach which is the most heavyhanded of all uses explicit "lock"
controls on the application, so that any user has to "lock" a document
before modifying, and everyone can see on their screens "this document
locked by user X".   A "lock" record is set within the database marking
the row as under the management of a specific user (again there's often a
timeout so that it expires).  This is the kind of thing you see if you
work in a law firm or newsroom similar with big document management
systems.

there's a lot of possibilities and you have to tailor one that is specific
to the kind of data and userbase your app deals with.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to