On Thursday 28 August 2008 18:00:08 Michael Bayer wrote:
> On Aug 28, 2008, at 10:54 AM, Cecil Westerhof wrote:
> >> On the Query side, the basic job is to formulate joins to the
> >> parent,
> >
> > Would it not be better to the child? Otherwise you need to
> > traverse all records, which would be inefficient -I think- when
> > for example only 1% of the records are in the group. Or am I
> > overlooking something?
>
> its all joins between X and Y so at that level its the same thing.
> The optimizations you'd be looking for here would involve
> additional tables that store information such as a full path for
> each node, or a materialized path view of some kind (or even,
> nested sets, something I should eventually create an ORM example
> for since it's tricky).
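to make the nested-sets idea concrete, here's a tiny plain-Python sketch (no ORM, all names made up by me): a depth-first walk assigns each node (left, right) bounds, and "all descendants of X" becomes a simple range test instead of a recursive walk:

```python
# minimal nested-sets sketch: number each node with (left, right) counters
# from a depth-first traversal; a node Y is a descendant of X exactly when
# X.left < Y.left and Y.right < X.right.
tree = {"root": ["a", "b"], "a": ["a1"], "a1": [], "b": []}

def number(tree, node="root", counter=None, bounds=None):
    # preorder tick gives `left`, postorder tick gives `right`
    if counter is None:
        counter, bounds = [0], {}
    counter[0] += 1
    left = counter[0]
    for child in tree[node]:
        number(tree, child, counter, bounds)
    counter[0] += 1
    bounds[node] = (left, counter[0])
    return bounds

bounds = number(tree)

def descendants(bounds, node):
    left, right = bounds[node]
    return sorted(n for n, (nl, nr) in bounds.items()
                  if left < nl and nr < right)

print(descendants(bounds, "root"))  # ['a', 'a1', 'b']
print(descendants(bounds, "a"))     # ['a1']
```

the same numbering can of course live in two integer columns on the tree table, which is what makes the "which Rs belong to a group" query a single range filter in SQL.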
cecil, think well about which point of view is more important in terms of speed vs. number of records vs. queries needed, and only then apply specific techniques to denormalize the tree. e.g. nested sets (where an item knows all its children recursively, like someclass.__subclasses__()) might be good for answering "which Rs belong to a group", while path-to-the-root (like someclass.mro()) might be better in other cases, e.g. which groups some R belongs to (these are just guesses, don't rely on them being 100% correct).

as Mike said, there are two sides of the equation: one is the object graph as such, the other is how you represent it in the db and the queries over that. the representation can be very twisted towards some sort of queries. one extreme may be to store/"cache" all inherited-result groups for each R, and refresh the "cache" at any change anywhere on the path-to-root ("cache" here is actually some records inside the db).

in my case i've decided that getting all objects + their associated values in one go/query and then doing the "inheritance" in python is better than anything else, like the above "cache" or walking the tree in python and making many small queries to get values, for two reasons: there are thousands of rows, and the order/priority of inheritance is changeable.

i have not decided myself how to derecurse the tree yet; i'm making an N-level generated query. the results so far: for a self-recursive query (A1-A2-A3), 4 levels are enough to kill postgres (2 for sqlite); for a non-self-recursive one (A1-B1-C3), 3, 4, 5, 6 levels make not much difference.
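for the path-to-the-root variant, here's a minimal sqlite sketch (stdlib only; the schema and column names are my invention, not anyone's actual code): each row stores a materialized path such as '/1/2/4/', so fetching a whole subtree is one prefix LIKE query instead of walking the tree:

```python
import sqlite3

# materialized-path sketch: the `path` column encodes the chain of ancestor
# ids, so "everything under group X" is a single LIKE prefix match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT, path TEXT)")
conn.executemany("INSERT INTO grp VALUES (?, ?, ?)", [
    (1, "root",  "/1/"),
    (2, "left",  "/1/2/"),
    (3, "right", "/1/3/"),
    (4, "leaf",  "/1/2/4/"),
])

def subtree_ids(conn, group_id):
    # read the parent's path, then prefix-match every row below it
    (path,) = conn.execute(
        "SELECT path FROM grp WHERE id = ?", (group_id,)).fetchone()
    cur = conn.execute(
        "SELECT id FROM grp WHERE path LIKE ? AND id != ?",
        (path + "%", group_id))
    return sorted(r[0] for r in cur)

print(subtree_ids(conn, 1))  # [2, 3, 4]
print(subtree_ids(conn, 2))  # [4]
```

the flip side is the one described above: moving a node means rewriting the paths of its whole subtree, which is why the "refresh the cache on any change along the path-to-root" cost has to be weighed against the read pattern.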
also, if you need a count over the above, query.count() may not be correct; use this instead (i don't know why, but count(1) is not correct over a filtered cartesian product):

    def exact_count(query):
        from sqlalchemy.sql import func
        m = getattr(query, 'mapper', None)   # 0.4
        if m is None:
            m = query._mapper_zero()         # 0.5
        # see Query.count()
        return query._col_aggregate(
            m.primary_key[0],
            lambda x: func.count(func.distinct(x)))

mike, why not have some query._count_expr() method that defaults to count(1), but where one can easily substitute something else, e.g. the above count(distinct(x.dbid))? right now i have to subclass Query and replace the whole .count() with the above, which is partially copy-paste from the original Query.count() (and probably uses too much internal stuff).

have fun
svil
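p.s. a quick plain-sqlite illustration (no SQLAlchemy, tables made up for the example) of why count(1) overcounts on a filtered join while count(distinct pk) does not: the join repeats each parent row once per matching child, so counting rows counts the duplicates too:

```python
import sqlite3

# two Rs, one of them tagged twice: the join yields 3 rows for 2 objects,
# so count(1) says 3 while count(DISTINCT r.id) says 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (id INTEGER PRIMARY KEY);
    CREATE TABLE tag (r_id INTEGER, name TEXT);
    INSERT INTO r VALUES (1), (2);
    INSERT INTO tag VALUES (1, 'x'), (1, 'y'), (2, 'x');
""")
naive = conn.execute(
    "SELECT count(1) FROM r JOIN tag ON tag.r_id = r.id").fetchone()[0]
exact = conn.execute(
    "SELECT count(DISTINCT r.id) FROM r JOIN tag ON tag.r_id = r.id"
).fetchone()[0]
print(naive, exact)  # 3 2
```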