

On 02/23/2012 02:50 PM, Michael Bayer wrote:
> I'd note that those subqueryload()s render the effectiveness of yield_per()
> to be almost nil.

I know. I've replicated the real use case from the application, which has yet to see proper optimization, including better query planning and the removal of unnecessary joins and relationships. I'd rather investigate implementing prepared statements and basically use lazy="select" instead of subqueries.
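For context, by lazy="select" I just mean the default per-relationship lazy load, so the related rows only come in via a separate SELECT when they're actually touched. A rough, untested sketch with made-up ModelA/ModelB names, not the real mapping:

from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class ModelB(Base):
    __tablename__ = "model_b"
    id = Column(Integer, primary_key=True)
    id_a = Column(Integer, ForeignKey("model_a.id"))

class ModelA(Base):
    __tablename__ = "model_a"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer)
    # lazy="select" (the default) emits one separate SELECT per parent
    # the first time .bs is accessed, instead of a subqueryload()
    bs = relationship(ModelB, lazy="select")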


> The next thing I'd look at is that second query for ModelC. You can also load
> those upfront so that you don't need to do a query each time:
>
> modelcs = dict(
>     sess.query(ModelC.id_a, ModelC)
>         .join(ModelC.a)
>         .filter(ModelA.parent_id == <same integer you're using against ModelA>)
> )
>
> then you have a dictionary of "id_a -> ModelC" as you loop through your ModelA
> records. All the ModelC's are already in the session, so you don't need to use
> merge(), which is also not a super quick operation. If an "id_a" is not in the
> dictionary then you know to create a new ModelC and use Session.add() to put it
> in. If you're limiting the ModelA rows using a LIMIT or window recipe like I
> mentioned earlier, you'd apply that same criteria to the loading of the "modelcs".
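Just to make sure I follow, the consuming loop would then look roughly like this (an untested sketch; modela_query stands in for the limited/windowed ModelA query):

for a in modela_query:
    c = modelcs.get(a.id)
    if c is None:
        # no pending transaction log yet -> create one; no merge() needed,
        # since the existing ModelC's are already in the session
        c = ModelC(id_a=a.id)
        sess.add(c)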

> Later on you're doing something with query(ModelC).first() in a loop, which is
> also something I hope isn't in the real application - looking there it seems
> like you'd only need to say query(ModelC).delete().

Might not be visible from this test script, but the scenario is this: ModelA represents certain data that has to be exported to external services (XMLRPC, REST and similar). In an ideal situation I would just select all ModelA rows that have to be exported (by comparing the timestamp of last modification against the timestamp of the process run), but I can't do that, because if such a transaction fails it has to remain "remembered" for the next batch run. So I use the ModelC table, which logs these pending transactions.

So the first phase selects rows from ModelA that are up for export and creates transaction logs in ModelC.

The second phase then loads and exports rows from ModelC one by one (joined with ModelA and everything else required for the export). However, if a single transaction fails, the entire script exits and continues when called the next time. I have to do this for other reasons (preventing overload on a possibly downed external service, etc.), so I can't "skip" that row and fetch the next one.

It may happen, and does regularly, that on subsequent runs of the process there are no new ModelA rows to load, but there are ModelC rows that failed on the last call (since they're still in the table, it means they were not processed).

And that's the logic behind query(ModelC).first(), the processing, and then the delete(). Also note that each row has to be an individual transaction (load, send to the external service, remove from the ModelC table), which means I can't rely on session/identity "caching" by pre-loading data instead of using joins and subqueries.
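Roughly, the two phases look like this (a heavily simplified sketch; modified_at, last_run and the export call are stand-ins, not the real names):

# phase 1: queue a transaction log row for every ModelA changed since the last run
for a in sess.query(ModelA).filter(ModelA.modified_at > last_run):
    sess.add(ModelC(id_a=a.id))
sess.commit()

# phase 2: export pending rows one by one, each in its own transaction
while True:
    c = sess.query(ModelC).first()
    if c is None:
        break
    export_to_external_service(c.a)  # XMLRPC/REST call, may raise
    sess.delete(c)
    sess.commit()  # the log row goes away only after a successful export
# if the export raises, the script exits and the remaining ModelC rows
# stay queued for the next run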


> Watching the SQL emitted with echo=True, and in some cases also seeing how
> large the results coming in are using echo='debug', is something I strongly
> recommend when first profiling an application.

Yes, I use logging and see all the SQL emitted.
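Concretely it's just the standard echo / logging setup, along these lines (the DSN is a placeholder):

import logging
from sqlalchemy import create_engine

engine = create_engine("postgresql://...", echo=True)       # log every SQL statement
# engine = create_engine("postgresql://...", echo="debug")  # additionally log result rows

# or the equivalent through the logging module:
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)  # DEBUG to include rows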



Thanks for your input, I appreciate all the help and advice I can get. Still a ton of stuff to learn about SQLA.



V
