The application I'm working on operates over extremely large datasets, so 
I'm using the query windowing from here 
(https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery) 
to break it into manageable chunks. The query window is usually around 10k 
rows, after which it updates/deletes some rows and continues on. Simple 
breakdown is like this:

query = session.query(Item).filter(...several filters)
total_items = query.count() # used for logging
for row in windowed_query(query, Item.id, 10000):
    count += 1

    # process, determine whether to keep (and update) or delete (put in a 
list for batch-deletion)
    # one such example is:
    if row.group_name != regex.group_name:
        continue

    if count >= 10000:
        save(items) # items to be kept, issues updates
        deleted = 
db.query(Item).filter(Item.id.in_(dead_items)).delete(synchronize_session='fetch')
        session.commit()
        count = 0

This works fine until it's gone through a save/delete cycle. Once it's 
saved, it goes back to access the windowed query again and pull the next 
10k rows. This works until the following line:

if row.group_name != regex.group_name:

At which point sqla will emit a SELECT for the item of that specific ID, 
presumably because the group_name wasn't available and it had to fetch it. 
This only occurs after the commit - so I assume that committing the session 
is breaking the query. Hence, for the next 10k rows, it emits 10k queries 
(one per row).

Because the script is potentially processing so many rows, I don't want to 
let the dead_items list grow to be massive, so the deletes need to occur 
fairly regularly throughout the process.

Any idea what's causing this / how to fix it? Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to