On Tue, Feb 21, 2012 at 3:24 PM, Manav Goel <manav.goe...@gmail.com> wrote:
> This depends on the execution plan of the query and is really more of a
> PostgreSQL question. Google "postgresql IN performance" and you will get
> a good idea of it.
> By the look of your code, the second option would obviously be faster, as
> it hits the database once, whereas the first one flushes after every change.
>
> Regards,
>
> On Feb 21, 6:07 pm, "Vlad K." <v...@haronmedia.com> wrote:
>> Hi all,
>>
>> I have to read thousands of rows from a table and compile some data, but
>> under certain conditions update those rows, all with the same value. The
>> ratio of reads to writes here is as wide as possible: sometimes no rows,
>> sometimes a few, and sometimes every row that is read has to be updated.
>> That last scenario is what concerns me.
>>
>> for row in query.yield_per(100):
>>      # Do something with data
>>
>>      if some_condition:
>>          row.some_column = 123
>>          session.flush()  # write each change out as it is made
>>
>> I am thinking about just adding the row's ID to a list:
>>
>> list_of_ids = []
>> for row in query.yield_per(100):
>>      # Do something with data
>>
>>      if some_condition:
>>          list_of_ids.append(row.primary_key)
>>
>> and near the end of the transaction do:
>>
>> session.query(Model).filter(Model.primary_key.in_(list_of_ids)) \
>>     .update({"some_column": 123}, synchronize_session=False)
>>
>> Yes, I'm aware of the increased memory requirements to store the ID list
>> on the application side, and no, I don't need to lock the rows for
>> update; the logic of an atomic update at the end is sufficient for my case.
>>
>> But I think, and real-use benchmarks will probably show (I haven't
>> tested yet), that a single update query will work faster. I need the
>> lowest transaction processing time on the application side for the
>> entire call, even if it takes more memory and more database iron.
>>
>> What I'm concerned with here is whether there are any limits or
>> significant overheads with "large" .in_() lists.
>>
>> The backend is PostgreSQL via psycopg2.
>>
>> Thanks
>>
>> --
>>
>> .oO V Oo.

A *long* time ago (SQLAlchemy 0.3), I had some performance problems
with large "IN" clauses, and the reason turned out to be SQLAlchemy
taking a lot of time to build long lists of bindparam objects. I have
no idea whether this is still the case these days. The best thing you
can do is just try it.
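
If long bind lists do turn out to hurt, one thing worth benchmarking is
chunking the ID list so that no single statement carries thousands of
bind parameters. A rough, untested sketch (the update_in_chunks helper
and the chunk size of 500 are made up for illustration; Model,
primary_key and some_column are the names from your example):

    def update_in_chunks(session, ids, value, chunk_size=500):
        # One UPDATE ... WHERE pk IN (...) per chunk keeps the number
        # of bindparam objects in any single statement bounded.
        for start in range(0, len(ids), chunk_size):
            chunk = ids[start:start + chunk_size]
            session.query(Model) \
                .filter(Model.primary_key.in_(chunk)) \
                .update({"some_column": value}, synchronize_session=False)

All the chunks run in the same session and transaction, so the update
is still atomic from the application's point of view.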

Simon
