Hi All.

I have a select query that uses "subqueryload" and looks like this:

        completed_imports = self.ra_import_file.visible() \
            .filter(ImportFile.lock_date == None) \
            .filter(ImportFile.process_date != None) \
            .order_by(ImportFile.process_date.desc()) \
            .options(subqueryload('user')) \
            .all()

However, intermittently I am finding results from the query where
process_date is None (NULL) in production, which seems weird, since I
am using  .filter(ImportFile.process_date != None).

This table is quite busy, and there can be quite a few rows that turn
into "complete imports" by changing their process_date from NULL to an
actual date.

Because I am using subqueryload(), two SQL statements are issued:

First query:

  SELECT import_file.create_date ...
  FROM import_file
  WHERE import_file.import_file_id IN (SELECT
import_file.import_file_id AS import_file_import_file_id
  FROM import_file
  WHERE import_file.lock_date IS NULL AND import_file.process_date IS
NOT NULL ORDER BY import_file.process_date DESC

Second query:

  SELECT "user".password AS user_password... ,
anon_1.import_file_user_id AS anon_1_import_file_user_id
  FROM (SELECT import_file.user_id AS import_file_user_id
  FROM import_file
  WHERE import_file.import_file_id IN (SELECT
import_file.import_file_id AS import_file_import_file_id
  FROM import_file
  WHERE import_file.lock_date IS NULL AND import_file.process_date IS
NOT NULL) AS anon_1 JOIN "user"
  ON anon_1.import_file_user_id = "user".user_id ORDER BY
anon_1.import_file_user_id

My [unconfirmed] theory why I occasionally get rows returned that have
NULL process_dates is that race conditions like this happen when under
high load:

1) Row 'foo' in the file_import table has process_date = NULL
2) First SQL query runs (SELECT import_file.create_date ...) and the
'foo' row is not returned in the result set because it does not match
the "AND import_file.process_date IS NOT NULL" where clause.
3) External process completes import and sets process_date =
<current_date> for row 'foo'
4) Second SQL query runs (SELECT "user".password AS user_password...)
and the 'foo' row is now in the result set because it matches the "AND
import_file.process_date IS NOT NULL" where clause.
5) "subqueryload" joins both SQL queries together in results in python
(sorry, I got a bit lost trying to trace this in sqlalchemy code),
resulting in row 'foo' being in the results, but missing values such
as 'process_date', which are only queried in the first SQL query.

Am I completely wrong? Does subqueryload() only return results if each
row was returned in both SQL queries?

Thanks.

- Paul



-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to