On Wed, Jun 6, 2018 at 6:53 AM, Tom Flannaghan <tomflannag...@gmail.com> wrote: > Hi all, > > We have just upgraded to sqlalchemy 1.2.7 (from 1.1.14), and had a > performance issue with a query that uses a lot of joinedloads that was > caused by the automatic baking of all relationship queries that was > introduced in 1.2. > > Say we have a set of tables with relationships Book.pages, Page.font and > Font.layout. We have a query of this form: > > pages = joinedload(Book.pages) > option1 = pages.joinedload(Page.font) > option2 = pages.joinedload(Page.layout) > > query = session().query(Book).options(option1, option2) > > The important point here is that the pages object defined on line 1 is > reused in both option1 and option2. Now suppose we fetch another > relationship that wasn't joined-loaded on the returned instances. This will > case another query as it is not loaded already, and this query will be baked > due to the change in 1.2 to bake all relationship loads. > > We found that the construction of the cache key for baking this query > becomes very slow as the number of options of this form increases, and is in > fact quadratic in the number of such options (we have ~25 such options in > our problematic query). This is due to each option containing all of the > joinedloads inside its _to_bind attribute, and > _UnboundLoad._generate_cache_key has to process everything in the _to_bind > list. E.g. in this example: > > print([[str(i) for i in load.path] for load in option1._to_bind]) > print([[str(i) for i in load.path] for load in option2._to_bind]) > > [['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']] > > [['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']] > > > Therefore, when generating the key for each option we are processing the > joinedloads from all of the options, leading to the quadratic performance > degradation. > > We fixed it by avoiding reusing the joinedload for Book.pages by doing this: > > option1 = joinedload(Book.pages).joinedload(Page.font) > option2 = joinedload(Book.pages).joinedload(Page.layout) > > The resulting query is unchanged, but the cache key function is now just > linear in the number of joinedloads as each option has only its > relationships in its _to_bind attribute. In our case, this completely solved > the performance issue. > > I'm not sure whether this behaviour is a bug or whether joinedloads aren't > intended to be reused. If the latter, it would be great if they raised a > warning if reused like this (and a mention of this issue in the docs).
I'd classify this as a bug which should be fixed. I would need to find time to dig into the structure of the options objects but it seems as though if the options themselves copied out to a new option, or the cache key could be smarter about what it's doing. I don't understand exactly what you've observed yet but it sounds like in the re-use case it is spending a lot of time with redundant data that isn't needed. this is up at https://bitbucket.org/zzzeek/sqlalchemy/issues/4270/cache-key-calc-for-loader-options-w-reuse > > Thanks, > Tom > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.