[sqlalchemy] Using the same joinedload object in multiple options causes performance issue with baked queries

Tom Flannaghan Wed, 06 Jun 2018 03:54:23 -0700

Hi all,

We have just upgraded to sqlalchemy 1.2.7 (from 1.1.14), and had a 
performance issue with a query that uses a lot of joinedloads that was 
caused by the automatic baking of all relationship queries that was 
introduced in 1.2.


Say we have a set of tables with relationships Book.pages, Page.font and 
Font.layout. We have a query of this form:

pages = joinedload(Book.pages)
option1 = pages.joinedload(Page.font)
option2 = pages.joinedload(Page.layout)

query = session().query(Book).options(option1, option2)

The important point here is that the pages object defined on line 1 is 
reused in both option1 and option2. Now suppose we fetch another 
relationship that wasn't joined-loaded on the returned instances. This will 
case another query as it is not loaded already, and this query will be 
baked due to the change in 1.2 to bake all relationship loads.

We found that the construction of the cache key for baking this query 
becomes very slow as the number of options of this form increases, and is 
in fact quadratic in the number of such options (we have ~25 such options 
in our problematic query). This is due to each option containing all of the 
joinedloads inside its _to_bind attribute, and 
_UnboundLoad._generate_cache_key has to process everything in the _to_bind 
list. E.g. in this example:

print([[str(i) for i in load.path] for load in option1._to_bind])
print([[str(i) for i in load.path] for load in option2._to_bind])

[['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']]

[['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']]

Therefore, when generating the key for each option we are processing the 
joinedloads from all of the options, leading to the quadratic performance 
degradation.

We fixed it by avoiding reusing the joinedload for Book.pages by doing this:

option1 = joinedload(Book.pages).joinedload(Page.font)
option2 = joinedload(Book.pages).joinedload(Page.layout)

The resulting query is unchanged, but the cache key function is now just 
linear in the number of joinedloads as each option has only its 
relationships in its _to_bind attribute. In our case, this completely 
solved the performance issue.

I'm not sure whether this behaviour is a bug or whether joinedloads aren't 
intended to be reused. If the latter, it would be great if they raised a 
warning if reused like this (and a mention of this issue in the docs).

Thanks,
Tom

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

[sqlalchemy] Using the same joinedload object in multiple options causes performance issue with baked queries

Reply via email to