.oO V Oo.
On 02/23/2012 02:50 PM, Michael Bayer wrote:
I'd note that those subqueryloads() render the effectiveness of yield_per() to
be almost nil.
I know. I've replicated the real use case from the application, which has
yet to see proper optimization, including better query planning and the
removal of unnecessary joins and relationships. I'd rather investigate
implementing prepared statements and basically do lazy="select"
instead of subqueries.
The next thing I'd look at is that second query for ModelC. You can also load
those upfront so that you don't need to do a query each time:
    modelcs = dict(
        sess.query(ModelC.id_a, ModelC)
            .join(ModelC.a)
            .filter(ModelA.parent_id == <same integer you're using against ModelA>)
    )
then you have a dictionary of "id_a->ModelC" as you loop through your ModelA records. All the
ModelC's are already in the session, so you don't need to use merge(), which is also not a super quick
operation. If an "id_a" is not in the dictionary then you know to create a new ModelC and use
Session.add() to put it in. If you're limiting the ModelA rows using a LIMIT or window recipe like I mentioned
earlier, you'd apply that same criteria to the loading of the "modelcs".
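A runnable sketch of the pattern described above, using hypothetical minimal versions of ModelA and ModelC (the column names, the `parent_id` criterion, and the in-memory SQLite setup are assumptions for illustration): one upfront query builds the "id_a->ModelC" dictionary, and the loop over ModelA only creates and add()s the missing rows.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class ModelA(Base):
    __tablename__ = "model_a"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer)

class ModelC(Base):
    __tablename__ = "model_c"
    id = Column(Integer, primary_key=True)
    id_a = Column(Integer, ForeignKey("model_a.id"))
    a = relationship(ModelA)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
sess = Session(engine)

# sample data: two ModelA rows, only one of which has a ModelC already
sess.add_all([ModelA(id=1, parent_id=7), ModelA(id=2, parent_id=7)])
sess.add(ModelC(id=10, id_a=1))
sess.commit()

parent_id = 7  # stands in for the "<same integer>" criterion above

# one query; the (id_a, ModelC) tuples feed straight into dict()
modelcs = dict(
    sess.query(ModelC.id_a, ModelC)
    .join(ModelC.a)
    .filter(ModelA.parent_id == parent_id)
)

for a in sess.query(ModelA).filter(ModelA.parent_id == parent_id):
    c = modelcs.get(a.id)
    if c is None:
        # no pending transaction log for this ModelA yet -> create one;
        # no merge() needed, the loaded ModelC's are already in the session
        c = ModelC(id_a=a.id)
        sess.add(c)
sess.commit()
```

If the ModelA rows are limited with a LIMIT or window recipe, the same criteria would be applied to the `modelcs` query so the dictionary stays in step with the batch.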
Later on you're doing something with query(ModelC).first() in a loop, which is
also something I hope isn't in the real application - looking there, it seems
like you'd only need to say query(ModelC).delete().
It might not be visible from this test script, but the scenario is this.
ModelA represents certain data that has to be exported to external
services (XMLRPC, REST and similar). In an ideal situation I'd just select
all ModelA rows that have to be exported (by comparing the timestamp of
last modification against the timestamp of the process run), but I can't
do that, because if such a transaction fails, it has to remain "remembered"
for the next batch run. So I use the ModelC table, which logs these
pending transactions.
So the first phase selects rows from ModelA that are up for export and
creates transaction logs in ModelC.
The second phase then loads and exports rows from ModelC one by one
(joined with ModelA and everything else required for the export).
However, if a single transaction fails, the entire script exits and
continues when called next time. I have to do it this way for other
reasons (preventing overload of a possibly downed external service,
etc.), so I can't "skip" that row and fetch the next one.
It may happen, and regularly does, that on subsequent runs of the
process there are no new ModelA rows to load, but there are ModelC rows
left over from the last call (since they're still in the table, it means
they were not processed).
And that's the logic behind query(ModelC).first(), processing, and then
delete(). Also note that each row has to be an individual transaction
(load, send to external service, remove from ModelC table), which means
I can't rely on session/identity "caching" by pre-loading data instead
of using joins and subqueries.
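To make the second phase concrete, here is a minimal runnable sketch of the workflow I described; the models, the `export()` stand-in, and the SQLite setup are simplified placeholders, not the real application code.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class ModelA(Base):
    __tablename__ = "model_a"
    id = Column(Integer, primary_key=True)

class ModelC(Base):
    __tablename__ = "model_c"
    id = Column(Integer, primary_key=True)
    id_a = Column(Integer, ForeignKey("model_a.id"))
    a = relationship(ModelA)

def export(row):
    """Stand-in for the real XMLRPC/REST call; raises on failure."""

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
sess = Session(engine)
sess.add(ModelA(id=1))
sess.add(ModelC(id=10, id_a=1))  # a pending transaction log
sess.commit()

while True:
    pending = sess.query(ModelC).first()  # next unprocessed log entry
    if pending is None:
        break
    try:
        export(pending.a)                 # send the joined ModelA data out
    except Exception:
        sess.rollback()                   # row stays queued for the next run
        break
    sess.delete(pending)
    sess.commit()                         # one transaction per row
```

Each iteration is its own load/send/delete/commit cycle, so an aborted run leaves the unprocessed ModelC rows in place for the next invocation.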
Watching the SQL emitted with echo=True, and in some cases also seeing how
large the results coming in are using echo='debug', is something I strongly
recommend when first profiling an application.
Yes, I use logging and see all the SQL emitted.
Thanks for your input, I appreciate all the help and advice I can get.
Still a ton of stuff to learn about SQLA.
V
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.