

On 02/23/2012 02:50 PM, Michael Bayer wrote:
> I'd note that those subqueryload()s render the effectiveness of yield_per()
> to be almost nil.

I know. I've replicated the real use case from the application, which has yet to see proper optimization, including better query planning and the removal of unnecessary joins and relationships. I'd rather investigate implementing prepared statements and basically use lazy="select" instead of subqueries.
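For context, by lazy="select" I just mean the default per-relationship lazy load, so the related rows only come in via a separate SELECT when they're actually touched. A rough, untested sketch with made-up ModelA/ModelB names, not the real mapping:

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class ModelB(Base):
    __tablename__ = "model_b"
    id = Column(Integer, primary_key=True)
    id_a = Column(Integer, ForeignKey("model_a.id"))

class ModelA(Base):
    __tablename__ = "model_a"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer)
    # lazy="select" (the default) emits one separate SELECT per parent
    # the first time .bs is accessed, instead of a subqueryload()
    bs = relationship(ModelB, lazy="select")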


> The next thing I'd look at is that second query for ModelC. You can also load
> those upfront so that you don't need to do a query each time:
>
> modelcs = dict(
>     sess.query(ModelC.id_a, ModelC)
>         .join(ModelC.a)
>         .filter(ModelA.parent_id == <same integer you're using against ModelA>)
> )
>
> then you have a dictionary of "id_a -> ModelC" as you loop through your ModelA
> records. All the ModelC's are already in the session, so you don't need to use
> merge(), which is also not a super quick operation. If an "id_a" is not in the
> dictionary then you know to create a new ModelC and use Session.add() to put it
> in. If you're limiting the ModelA rows using a LIMIT or window recipe like I
> mentioned earlier, you'd apply that same criteria to the loading of the "modelcs".
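Just to make sure I follow, the consuming loop would then look roughly like this (an untested sketch; modela_query stands in for the limited/windowed ModelA query):

for a in modela_query:
    c = modelcs.get(a.id)
    if c is None:
        # no pending transaction log yet -> create one; no merge() needed,
        # since the existing ModelC's are already in the session
        c = ModelC(id_a=a.id)
        sess.add(c)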

> Later on you're doing something with query(ModelC).first() in a loop, which is
> also something I hope isn't in the real application - looking there it seems
> like you'd only need to say query(ModelC).delete().

Might not be visible from this test script, but the scenario is this: ModelA represents certain data that has to be exported to external services (XMLRPC, REST and similar). In an ideal situation I would just select all ModelA rows that have to be exported (by comparing the timestamp of last modification against the timestamp of the process run), but I can't do that, because if such a transaction fails it has to remain "remembered" for the next batch run. So I use the ModelC table, which logs these pending transactions.

So the first phase selects rows from ModelA that are up for export and creates transaction logs in ModelC.

The second phase then loads and exports rows from ModelC one by one (joined with ModelA and everything else required for the export). However, if a single transaction fails, the entire script exits and continues when called the next time. I have to do this for other reasons (preventing overload on a possibly downed external service, etc.), so I can't "skip" that row and fetch the next one.

It may happen, and does regularly, that on subsequent runs of the process there are no new ModelA rows to load, but there are ModelC rows that failed on the last call (since they're still in the table, it means they were not processed).

And that's the logic behind query(ModelC).first(), the processing, and then the delete(). Also note that each row has to be an individual transaction (load, send to the external service, remove from the ModelC table), which means I can't rely on session/identity "caching" by pre-loading data instead of using joins and subqueries.
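Roughly, the two phases look like this (a heavily simplified sketch; modified_at, last_run and the export call are stand-ins, not the real names):

# phase 1: queue a transaction log row for every ModelA changed since the last run
for a in sess.query(ModelA).filter(ModelA.modified_at > last_run):
    sess.add(ModelC(id_a=a.id))
sess.commit()

# phase 2: export pending rows one by one, each in its own transaction
while True:
    c = sess.query(ModelC).first()
    if c is None:
        break
    export_to_external_service(c.a)  # XMLRPC/REST call, may raise
    sess.delete(c)
    sess.commit()  # the log row goes away only after a successful export
# if the export raises, the script exits and the remaining ModelC rows
# stay queued for the next run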


> Watching the SQL emitted with echo=True, and in some cases also seeing how
> large the results coming in are using echo='debug', is something I strongly
> recommend when first profiling an application.

Yes, I use logging and see all the SQL emitted.
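Concretely it's just the standard echo / logging setup, along these lines (the DSN is a placeholder):

import logging
from sqlalchemy import create_engine

engine = create_engine("postgresql://...", echo=True)       # log every SQL statement
# engine = create_engine("postgresql://...", echo="debug")  # additionally log result rows

# or the equivalent through the logging module:
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)  # DEBUG to include rows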



Thanks for your input, I appreciate all the help and advice I can get. Still a ton of stuff to learn about SQLA.



V
