On Mar 21, 11:42 pm, Michael Bayer <mike...@zzzcomputing.com> wrote: > On Mar 21, 2011, at 7:21 AM, Wouter Overmeire wrote: > > > > > > > > > > > New to sql and sqlalchemy. > > Using an in memory sqlite database. > > > Trying to do some queries based on the examples in the sqlalchemy > > documentation. > > > I have classes: Tag, Article, Author, Award > > article <--> tag: many to many > > article <--> author: many to many (so one article can have multiple > > authors) > > award <--> author: many to many > > > Find all articles written by 'author1' > > articles = > > session.query(Article).filter(Article.authors.any(Author.name == > > 'author1')).all() > > This works fine, until i start increasing the number of articles and > > authors. > > For 50 authors with each 1000 articles and each article has 10 tags > > (out of 100 tags in total), the above query takes forever, i never let > > it complete and interrupt it after 15 minutes. The above query is just > > as in the documentation (http://www.sqlalchemy.org/docs/orm/ > > tutorial.html?highlight=blogpost#building-a-many-to-many- > > relationship), is it suppose to work on larger tables? Very likely i`m > > doing something wrong. > > Not really, the EXISTS format is known to be generally slower than a straight > INNER JOIN. The operator is convenient, especially when doing a NOT EXISTS, > but if looking for positive inclusion of sets the more verbose join() > approach is typically more efficient. > > > > > > > > > > > > > If i do: > > articles = > > session.query(Article).join(Article.authors).filter(Author.name == > > 'author1').all() this is done in a fraction of a second. > > > Now trying to find articles written by 'author1' and having tag > > 'tag1'. > > qArticleByTag = > > session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1') > > qArticleByAuthor = > > session.query(Article).join(Article.authors).filter(Author.name == > > 'author1') > > articles = qArticleByTag.intersect(qArticleByAuthor).all() > > > This works fine, but somehow i have the feeling there are more > > efficient ways. > > the SQL set operators like UNION, INTERSECT, etc. are not the most efficient > and are cumbersome to use. Using two joins() at once would be more > straightforward: > > session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1') > > three sets of rows are intersected above, returning only those rows which are > correlated and meet all criterion. > > > Since i think the above query goes twice through all > > the articles. > > with INTERSECT, probably, though its possible some query optimizers could > figure out a more efficient pattern here (I'm not a deep expert in query > optimizers to know for sure, however). Sticking with chains of joins is the > way to go in any case.
Before using the intersect i tried also the query you suggest but it takes very long to complete, compared to the intersect. articles = session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1').all() takes 75seconds. qArticleByTag = session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1') qArticleByAuthor = session.query(Article).join(Article.authors).filter(Author.name == 'author1') articles = qArticleByTag.intersect(qArticleByAuthor).all() takes 0.8 seconds That`s almost two decades of difference in speed for the same result! This i don`t understand, i would expect the chain of joins to be faster, since the join with Article.authors is on a smaller table compared to qArticleByAuthor and no intersect is needed. Could it be the chain of join is making a monster table first joining all tags and then joining all authors and from this table select the ones with tag='tag1' and author='author1'. Instead of doing the join of Article.authors on the filtered articles table (by tag) >>> print >>> session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1') SELECT "tArticles".id AS "tArticles_id", "tArticles".txt AS "tArticles_txt" FROM "tArticles" JOIN "tArticleTags" AS "tArticleTags_1" ON "tArticles".id = "tArticleTags_1"."articleId" JOIN "tTags" ON "tTags".id = "tArticleTags_1"."tagId" JOIN "tAuthorArticles" AS "tAuthorArticles_1" ON "tArticles".id = "tAuthorArticles_1"."articleId" JOIN "tAuthors" ON "tAuthors".id = "tAuthorArticles_1"."authorId" WHERE "tTags".tag = ? AND "tAuthors".name = ? -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalchemy@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.