On Mar 22, 8:35 am, Wouter Overmeire <wouter.overme...@scarlet.be> wrote: > On Mar 21, 11:42 pm, Michael Bayer <mike...@zzzcomputing.com> wrote: > > > > > > > > > > > On Mar 21, 2011, at 7:21 AM, Wouter Overmeire wrote: > > > > New to sql and sqlalchemy. > > > Using an in memory sqlite database. > > > > Trying to do some queries based on the examples in the sqlalchemy > > > documentation. > > > > I have classes: Tag, Article, Author, Award > > > article <--> tag: many to many > > > article <--> author: many to many (so one article can have multiple > > > authors) > > > award <--> author: many to many > > > > Find all articles written by 'author1' > > > articles = > > > session.query(Article).filter(Article.authors.any(Author.name == > > > 'author1')).all() > > > This works fine, until i start increasing the number of articles and > > > authors. > > > For 50 authors with each 1000 articles and each article has 10 tags > > > (out of 100 tags in total), the above query takes forever, i never let > > > it complete and interrupt it after 15 minutes. The above query is just > > > as in the documentation (http://www.sqlalchemy.org/docs/orm/ > > > tutorial.html?highlight=blogpost#building-a-many-to-many- > > > relationship), is it suppose to work on larger tables? Very likely i`m > > > doing something wrong. > > > Not really, the EXISTS format is known to be generally slower than a > > straight INNER JOIN. The operator is convenient, especially when doing a > > NOT EXISTS, but if looking for positive inclusion of sets the more verbose > > join() approach is typically more efficient. > > > > If i do: > > > articles = > > > session.query(Article).join(Article.authors).filter(Author.name == > > > 'author1').all() this is done in a fraction of a second. > > > > Now trying to find articles written by 'author1' and having tag > > > 'tag1'. > > > qArticleByTag = > > > session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1') > > > qArticleByAuthor = > > > session.query(Article).join(Article.authors).filter(Author.name == > > > 'author1') > > > articles = qArticleByTag.intersect(qArticleByAuthor).all() > > > > This works fine, but somehow i have the feeling there are more > > > efficient ways. > > > the SQL set operators like UNION, INTERSECT, etc. are not the most > > efficient and are cumbersome to use. Using two joins() at once would > > be more straightforward: > > > session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1') > > > three sets of rows are intersected above, returning only those rows which > > are correlated and meet all criterion. > > > > Since i think the above query goes twice through all > > > the articles. > > > with INTERSECT, probably, though its possible some query optimizers could > > figure out a more efficient pattern here (I'm not a deep expert in query > > optimizers to know for sure, however). Sticking with chains of joins is > > the way to go in any case. > > Before using the intersect i tried also the query you suggest but it > takes very long to complete, compared to the intersect. > articles = > session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1').all() > takes 75seconds. > > qArticleByTag = > session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1') > qArticleByAuthor = > session.query(Article).join(Article.authors).filter(Author.name == > 'author1') > articles = qArticleByTag.intersect(qArticleByAuthor).all() takes 0.8 > seconds > > That`s almost two decades of difference in speed for the same result! > This i don`t understand, i would expect the chain of joins to be > faster, since the join with Article.authors is on a smaller table > compared to qArticleByAuthor and no intersect is needed. > Could it be the chain of join is making a monster table first joining > all tags and then joining all authors and from this table select the > ones with tag='tag1' and author='author1'. Instead of doing the join > of Article.authors on the filtered articles table (by tag)>>> print > session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1') > > SELECT "tArticles".id AS "tArticles_id", "tArticles".txt AS > "tArticles_txt" > FROM "tArticles" JOIN "tArticleTags" AS "tArticleTags_1" ON > "tArticles".id = "tArticleTags_1"."articleId" JOIN "tTags" ON > "tTags".id = "tArticleTags_1"."tagId" JOIN "tAuthorArticles" AS > "tAuthorArticles_1" ON "tArticles".id = > "tAuthorArticles_1"."articleId" JOIN "tAuthors" ON "tAuthors".id = > "tAuthorArticles_1"."authorId" > WHERE "tTags".tag = ? AND "tAuthors".name = ?
Anyway this is SQL related, and not SA. I switched from sqlite to mysql and rebuild the database. The query that took 75second using sqlite only takes 0.04 (and i type again 0.04) seconds using mysql as database. This was done on the same machine. (Thanks to sqlalchemy i could reuse my code :-) ) So bye bye sqlite and welcome mysql. -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalchemy@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.