Re: [sqlalchemy] Relationship operator vs join and filter

Michael Bayer Mon, 21 Mar 2011 15:42:53 -0700

On Mar 21, 2011, at 7:21 AM, Wouter Overmeire wrote:

> New to sql and sqlalchemy.
> Using an in memory sqlite database.
> 
> Trying to do some queries based on the examples in the sqlalchemy
> documentation.
> 
> I have classes: Tag, Article, Author, Award
> article <--> tag: many to many
> article <--> author: many to many (so one article can have multiple
> authors)
> award <--> author: many to many
> 
> Find all articles written by 'author1'
> articles =
> session.query(Article).filter(Article.authors.any(Author.name ==
> 'author1')).all()
> This works fine, until i start increasing the number of articles and
> authors.
> For 50 authors with each 1000 articles and each article has 10 tags
> (out of 100 tags in total), the above query takes forever, i never let
> it complete and interrupt it after 15 minutes. The above query is just
> as in the documentation (http://www.sqlalchemy.org/docs/orm/
> tutorial.html?highlight=blogpost#building-a-many-to-many-
> relationship), is it suppose to work on larger tables? Very likely i`m
> doing something wrong.


Not really, the EXISTS format is known to be generally slower than a straight 
INNER JOIN.   The operator is convenient, especially when doing a NOT EXISTS, 
but if looking for positive inclusion of sets the more verbose join() approach 
is typically more efficient.

> 
> If i do:
> articles =
> session.query(Article).join(Article.authors).filter(Author.name ==
> 'author1').all()  this is done in a fraction of a second.
> 
> Now trying to find articles written by 'author1' and having tag
> 'tag1'.
> qArticleByTag =
> session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1')
> qArticleByAuthor =
> session.query(Article).join(Article.authors).filter(Author.name ==
> 'author1')
> articles = qArticleByTag.intersect(qArticleByAuthor).all()
> 
> This works fine, but somehow i have the feeling there are more
> efficient ways.

the SQL set operators like UNION, INTERSECT, etc. are not the most efficient 
and are cumbersome to use.      Using two joins() at once would be more 
straightforward:

session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1')

three sets of rows are intersected above, returning only those rows which are 
correlated and meet all criterion.

> Since i think the above query goes twice through all
> the articles.

with INTERSECT, probably, though its possible some query optimizers could 
figure out a more efficient pattern here (I'm not a deep expert in query 
optimizers to know for sure, however).  Sticking with chains of joins is the 
way to go in any case.


-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Relationship operator vs join and filter

Reply via email to