Hi all,
I have to read thousands of rows from a table and compile some data, but
under certain conditions also update those rows, all with the same value.
The ratio of reads to writes varies as widely as possible: sometimes no
rows need updating, sometimes a few, and sometimes every row that is read
has to be updated. That last scenario is the one that concerns me.
for row in query.yield_per(100):
    # Do something with data
    if some_condition:
        row.some_column = 123
        session.flush()
I am thinking about just adding the row's ID to a list:
list_of_ids = []
for row in query.yield_per(100):
    # Do something with data
    if some_condition:
        list_of_ids.append(row.primary_key)
and near the end of the transaction do:
session.query(Model).filter(Model.primary_key.in_(list_of_ids)).update(
    {"some_column": 123}, synchronize_session=False)
Yes, I'm aware of the increased memory needed to store the ID list on the
application side, and no, I don't need to lock the rows for update; a
single atomic UPDATE at the end is sufficient for my case.
I think a single UPDATE query will be faster, though real-world benchmarks
will have to confirm that; I haven't tested yet. I need the lowest
transaction processing time on the application side for the entire call,
even if it takes more memory and more database iron.
What I'm concerned about is whether there are any limits on, or
significant overhead with, "large" .in_() lists.
The backend is PostgreSQL via psycopg2.
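If huge .in_() lists do turn out to be a problem, my fallback plan would be
to chunk the ID list into several UPDATEs, roughly like this (untested
sketch; the chunk size of 1000 is an arbitrary guess):

CHUNK = 1000  # arbitrary batch size, just a guess

for i in range(0, len(list_of_ids), CHUNK):
    chunk = list_of_ids[i:i + CHUNK]
    session.query(Model).filter(
        Model.primary_key.in_(chunk)
    ).update({"some_column": 123}, synchronize_session=False)

And since the backend is PostgreSQL, I suppose I could also drop to raw SQL
with WHERE primary_key = ANY(%s), since psycopg2 adapts a Python list to a
PostgreSQL array.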
Thanks
--
.oO V Oo.