On Mar 22, 8:35 am, Wouter Overmeire <wouter.overme...@scarlet.be>
wrote:
> On Mar 21, 11:42 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>
>
>
>
>
>
>
>
>
> > On Mar 21, 2011, at 7:21 AM, Wouter Overmeire wrote:
>
> > > New to sql and sqlalchemy.
> > > Using an in memory sqlite database.
>
> > > Trying to do some queries based on the examples in the sqlalchemy
> > > documentation.
>
> > > I have classes: Tag, Article, Author, Award
> > > article <--> tag: many to many
> > > article <--> author: many to many (so one article can have multiple
> > > authors)
> > > award <--> author: many to many
>
> > > Find all articles written by 'author1'
> > > articles =
> > > session.query(Article).filter(Article.authors.any(Author.name ==
> > > 'author1')).all()
> > > This works fine, until i start increasing the number of articles and
> > > authors.
> > > For 50 authors with each 1000 articles and each article has 10 tags
> > > (out of 100 tags in total), the above query takes forever, i never let
> > > it complete and interrupt it after 15 minutes. The above query is just
> > > as in the documentation (http://www.sqlalchemy.org/docs/orm/
> > > tutorial.html?highlight=blogpost#building-a-many-to-many-
> > > relationship), is it suppose to work on larger tables? Very likely i`m
> > > doing something wrong.
>
> > Not really, the EXISTS format is known to be generally slower than a 
> > straight INNER JOIN.   The operator is convenient, especially when doing a 
> > NOT EXISTS, but if looking for positive inclusion of sets the more verbose 
> > join() approach is typically more efficient.
>
> > > If i do:
> > > articles =
> > > session.query(Article).join(Article.authors).filter(Author.name ==
> > > 'author1').all()  this is done in a fraction of a second.
>
> > > Now trying to find articles written by 'author1' and having tag
> > > 'tag1'.
> > > qArticleByTag =
> > > session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1')
> > > qArticleByAuthor =
> > > session.query(Article).join(Article.authors).filter(Author.name ==
> > > 'author1')
> > > articles = qArticleByTag.intersect(qArticleByAuthor).all()
>
> > > This works fine, but somehow i have the feeling there are more
> > > efficient ways.
>
> > the SQL set operators like UNION, INTERSECT, etc. are not the most 
> > efficient and are cumbersome to use.      Using two joins() at once would 
> > be more straightforward:
>
> > session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1')
>
> > three sets of rows are intersected above, returning only those rows which 
> > are correlated and meet all criterion.
>
> > > Since i think the above query goes twice through all
> > > the articles.
>
> > with INTERSECT, probably, though its possible some query optimizers could 
> > figure out a more efficient pattern here (I'm not a deep expert in query 
> > optimizers to know for sure, however).  Sticking with chains of joins is 
> > the way to go in any case.
>
> Before using the intersect i tried also the query you suggest but it
> takes very long to complete, compared to the intersect.
> articles =
> session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1').all()
> takes 75seconds.
>
> qArticleByTag =
> session.query(Article).join(Article.tags).filter(Tag.tag == 'tag1')
> qArticleByAuthor =
> session.query(Article).join(Article.authors).filter(Author.name ==
> 'author1')
> articles = qArticleByTag.intersect(qArticleByAuthor).all() takes 0.8
> seconds
>
> That`s almost two decades of difference in speed for the same result!
> This i don`t understand, i would expect the chain of joins to be
> faster, since the join with Article.authors is on a smaller table
> compared to qArticleByAuthor and no intersect is needed.
> Could it be the chain of join is making a monster table first joining
> all tags and then joining all authors and from this table select the
> ones with tag='tag1' and author='author1'. Instead of doing the join
> of Article.authors on the filtered articles table (by tag)>>> print 
> session.query(Article).join(Article.tags).filter(Tag.tag=='tag1').join(Article.authors).filter(Author.name=='author1')
>
> SELECT "tArticles".id AS "tArticles_id", "tArticles".txt AS
> "tArticles_txt"
> FROM "tArticles" JOIN "tArticleTags" AS "tArticleTags_1" ON
> "tArticles".id = "tArticleTags_1"."articleId" JOIN "tTags" ON
> "tTags".id = "tArticleTags_1"."tagId" JOIN "tAuthorArticles" AS
> "tAuthorArticles_1" ON "tArticles".id =
> "tAuthorArticles_1"."articleId" JOIN "tAuthors" ON "tAuthors".id =
> "tAuthorArticles_1"."authorId"
> WHERE "tTags".tag = ? AND "tAuthors".name = ?

Anyway this is SQL related, and not SA.
I switched from sqlite to mysql and rebuild the database.
The query that took 75second using sqlite only takes 0.04 (and i type
again 0.04) seconds using mysql as database. This was done on the same
machine. (Thanks to sqlalchemy i could reuse my code :-) )
So bye bye sqlite and welcome mysql.

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Reply via email to