Hi all,

I just ran into something odd while doing concurrency testing on my
program. Before writing a simplified test case and really figuring out
whether it's a bug in SQLAlchemy (I am using 0.5.5), I wanted to
describe the scenario here. Basically I was getting the infamous
"Set changed size during iteration" error during commit. Multiple
threads are not sharing the same session (which was the cause in some
of the posts I have read online). Here's what I am doing:

There is a task dispatcher that queries the database, gets a list of
tasks to be done, and dispatches them to separate threads. Each thread
creates its own session using the session_maker and updates the status
of its task. There are three kinds of objects in question: Task,
TaskSchedule and TaskResult. There is a one-to-many relation between
Task and TaskSchedule, and another one-to-many relation between Task
and TaskResult. Pseudocode:

def task_dispatcher():
    for task_schedule in sess.query(TaskSchedule).filter(
            TaskSchedule.next_execution_time <= current_time):
        do_task(task_schedule.task)   # uses the Task relation
        task_schedule.next_execution_time = some_value
        sess.commit()                 # commit inside the loop

(in a new thread)
def do_task(task):
    sess = get_session()              # each thread gets its own session
    sess.add(TaskResult(task.id, task_started))
    sess.commit()
    task.perform()
    sess.merge(TaskResult(task.id, task_finished))
    sess.commit()
    sess.close()
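
A quick note on the session setup: it is roughly along these lines
(db_url is a placeholder here, not my actual connection string):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(db_url)        # one shared engine for all threads
Session = sessionmaker(bind=engine)   # plain defaults, nothing fancy

def get_session():
    # every call returns a brand-new Session, so each thread gets its own
    return Session()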

Basically, I get the mysterious error on the commit in task_dispatcher
intermittently (roughly 10 times out of 1000 executions). This brings
me to my first question: is it bad practice to commit a session from
inside a loop over sess.query() results? I wanted each task's
execution-time update to be independent of the other tasks. If this is
bad practice, should I use sub-transactions for this purpose?
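
To be concrete about what I mean by sub-transactions, I am thinking of
something along these lines: a SAVEPOINT per task via begin_nested()
and one real commit at the end (sketch only, reusing the names from the
pseudocode above, and assuming the backend supports SAVEPOINTs):

for task_schedule in sess.query(TaskSchedule).filter(
        TaskSchedule.next_execution_time <= current_time):
    sess.begin_nested()                  # SAVEPOINT
    try:
        do_task(task_schedule.task)
        task_schedule.next_execution_time = some_value
        sess.commit()                    # releases just this SAVEPOINT
    except Exception:
        sess.rollback()                  # rolls back to the SAVEPOINT
sess.commit()                            # the real COMMIT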

After realizing this possible pitfall, I changed the code to commit
only after the loop has finished. This led to another interesting
result: none of my TaskResult objects got committed. Instead I saw a
lot of ROLLBACKs in the log with no errors associated with them.
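
Concretely, the only change was moving the commit out of the loop:

def task_dispatcher():
    for task_schedule in sess.query(TaskSchedule).filter(
            TaskSchedule.next_execution_time <= current_time):
        do_task(task_schedule.task)   # uses the Task relation
        task_schedule.next_execution_time = some_value
    sess.commit()                     # moved out of the loop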

I tried yet another variation, and this is the one that puzzled me the
most: eager loading the task relation on TaskSchedule. It worked like a
charm:

def task_dispatcher():
    for task_schedule in sess.query(TaskSchedule).options(
            eagerload('task')).filter(
            TaskSchedule.next_execution_time <= current_time):
        do_task(task_schedule.task)   # uses the Task relation
        task_schedule.next_execution_time = some_value
    sess.commit()                     # commit after the loop, as before

This time there were no errors, and the TaskResults appeared in the
database with each commit in the do_task function.

Does anyone have an insight into this?

Thanks,