[sqlalchemy] Re: Relationship operator vs join and filter

Wouter Overmeire Tue, 22 Mar 2011 00:37:41 -0700


On Mar 21, 11:42 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Mar 21, 2011, at 7:21 AM, Wouter Overmeire wrote:
>
>
>
>
>
>
>
>
>
> > New to sql and sqlalchemy.
> > Using an in memory sqlite database.
>
> > Trying to do some queries based on the examples in the sqlalchemy
> > documentation.
>
> > I have classes: Tag, Article, Author, Award
> > article <--> tag: many to many
> > article <--> author: many to many (so one article can have multiple
> > authors)
> > award <--> author: many to many
>
> > Find all articles written by 'author1'
> > articles =
> > session.query(Article).filter(Article.authors.any(Author.name ==
> > 'author1')).all()
> > This works fine, until i start increasing the number of articles and
> > authors.
> > For 50 authors with each 1000 articles and each article has 10 tags
> > (out of 100 tags in total), the above query takes forever, i never let
> > it complete and interrupt it after 15 minutes. The above query is just
> > as in the documentation (http://www.sqlalchemy.org/docs/orm/
> > tutorial.html?highlight=blogpost#building-a-many-to-many-
> > relationship), is it suppose to work on larger tables? Very likely i`m
> > doing something wrong.
>
> Not really, the EXISTS format is known to be generally slower than a straight 
> INNER JOIN.   The operator is convenient, especially when doing a NOT EXISTS, 
> but if looking for positive inclusion of sets the more verbose join() 
> approach is typically more efficient.
>
>
>
>
>
>
>
>
>
>
>
> > If i do:
> > articles =
> > session.query(Article).join(Article.authors).filter(Author.name ==
> > 'author1').all()  this is done in a fraction of a second.
>
> > Now trying to find articles written by 'author1' and having tag
> > 'tag1'.
> > qArticleByTag =
> > session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1')
> > qArticleByAuthor =
> > session.query(Article).join(Article.authors).filter(Author.name ==
> > 'author1')
> > articles = qArticleByTag.intersect(qArticleByAuthor).all()
>
> > This works fine, but somehow i have the feeling there are more
> > efficient ways.
>
> the SQL set operators like UNION, INTERSECT, etc. are not the most efficient 
> and are cumbersome to use.      Using two joins() at once would be more 
> straightforward:
>
> session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1')
>
> three sets of rows are intersected above, returning only those rows which are 
> correlated and meet all criterion.
>
> > Since i think the above query goes twice through all
> > the articles.
>
> with INTERSECT, probably, though its possible some query optimizers could 
> figure out a more efficient pattern here (I'm not a deep expert in query 
> optimizers to know for sure, however).  Sticking with chains of joins is the 
> way to go in any case.



Before using the intersect i tried also the query you suggest but it
takes very long to complete, compared to the intersect.
articles =
session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1').all()
takes 75seconds.

qArticleByTag =
session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1')
qArticleByAuthor =
session.query(Article).join(Article.authors).filter(Author.name ==
'author1')
articles = qArticleByTag.intersect(qArticleByAuthor).all() takes 0.8
seconds

That`s almost two decades of difference in speed for the same result!
This i don`t understand, i would expect the chain of joins to be
faster, since the join with Article.authors is on a smaller table
compared to qArticleByAuthor and no intersect is needed.
Could it be the chain of join is making a monster table first joining
all tags and then joining all authors and from this table select the
ones with tag='tag1' and author='author1'. Instead of doing the join
of Article.authors on the filtered articles table (by tag)
>>> print 
>>> session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1')
SELECT "tArticles".id AS "tArticles_id", "tArticles".txt AS
"tArticles_txt"
FROM "tArticles" JOIN "tArticleTags" AS "tArticleTags_1" ON
"tArticles".id = "tArticleTags_1"."articleId" JOIN "tTags" ON
"tTags".id = "tArticleTags_1"."tagId" JOIN "tAuthorArticles" AS
"tAuthorArticles_1" ON "tArticles".id =
"tAuthorArticles_1"."articleId" JOIN "tAuthors" ON "tAuthors".id =
"tAuthorArticles_1"."authorId"
WHERE "tTags".tag = ? AND "tAuthors".name = ?





-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

[sqlalchemy] Re: Relationship operator vs join and filter

Reply via email to