Hi all,
I have to read thousands of rows from a table and compile some data, but
under certain conditions also update those rows, all with the same value.
The ratio of reads to writes varies as widely as possible: sometimes no
rows need updating, sometimes a few, and sometimes every row that is read
has to be updated. That last scenario is the one that concerns me.
for row in query.yield_per(100):
    # Do something with data
    if some_condition:
        row.some_column = 123
        session.flush()
I am thinking about just adding the row's ID to a list:
list_of_ids = []
for row in query.yield_per(100):
    # Do something with data
    if some_condition:
        list_of_ids.append(row.primary_key)
and near the end of the transaction do:
session.query(Model).filter(Model.primary_key.in_(list_of_ids)).update(
    {"some_column": 123}, synchronize_session=False)
Yes, I'm aware of the increased memory needed to store the ID list on the
application side, and no, I don't need to lock the rows for update; a
single atomic UPDATE at the end is sufficient for my case.
I think a single UPDATE query will be faster, though real-world benchmarks
will have to confirm that; I haven't tested yet. I need the lowest
transaction processing time on the application side for the entire call,
even if it takes more memory and more database iron.
What I'm concerned about is whether there are any limits on, or
significant overhead with, "large" .in_() lists.
The backend is PostgreSQL via psycopg2.
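If huge .in_() lists do turn out to be a problem, my fallback plan would be
to chunk the ID list into several UPDATEs, roughly like this (untested
sketch; the chunk size of 1000 is an arbitrary guess):

CHUNK = 1000  # arbitrary batch size, just a guess

for i in range(0, len(list_of_ids), CHUNK):
    chunk = list_of_ids[i:i + CHUNK]
    session.query(Model).filter(
        Model.primary_key.in_(chunk)
    ).update({"some_column": 123}, synchronize_session=False)

And since the backend is PostgreSQL, I suppose I could also drop to raw SQL
with WHERE primary_key = ANY(%s), since psycopg2 adapts a Python list to a
PostgreSQL array.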
Thanks
--
.oO V Oo.