On Thursday 28 August 2008 18:00:08 Michael Bayer wrote:
> On Aug 28, 2008, at 10:54 AM, Cecil Westerhof wrote:
> >> On the Query side, the basic job is to formulate joins to the
> >> parent,
> >
> > Would it not be better to the child? Otherwise you need to
> > traverse all records, which would be inefficient -I think- when
> > for example only 1% of the records are in the group. Or am I
> > overlooking something?
>
> its all joins between X and Y so at that level its the same thing.
> The optimizations you'd be looking for here would involve
> additional tables that store information such as a full path for
> each node, or a materialized path view of some kind (or even,
> nested sets, something I should eventually create an ORM example
> for since it's tricky).

Cecil, think well about which point of view matters more in terms 
of speed vs. number-of-records vs. needed-queries, and only then 
apply specific techniques to denormalize the tree.
e.g. nested sets (where an item knows all its children recursively - 
like someclass.__subclasses__() ) might be good for answering "which 
Rs belong to a group", while path-to-the-root (like someclass.mro()) 
might be better in other cases - e.g. which groups some R belongs to 
(these are just guesses, don't rely on them being 100% correct).
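To make the nested-sets idea concrete, here is a rough illustration (the table and node names are made up for the example, not from the thread): each node stores a (lft, rgt) interval, and "which Rs belong to a group" becomes a single range scan with no recursion.

```python
import sqlite3

# Hypothetical tree stored as nested sets: a descendant is any node
# whose (lft, rgt) interval lies strictly inside the ancestor's.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id TEXT, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)", [
    ("root",   1, 8),
    ("groupA", 2, 5),
    ("r1",     3, 4),
    ("r2",     6, 7),
])

# "which Rs belong to groupA" -- one query, no tree walking:
rows = conn.execute("""
    SELECT child.id FROM node AS child, node AS parent
    WHERE parent.id = 'groupA'
      AND child.lft > parent.lft AND child.rgt < parent.rgt
""").fetchall()
print([r[0] for r in rows])   # ['r1']
```

The price is on the write side: inserting or moving a node means renumbering the intervals of everything to its right, which is why this layout favors read-heavy trees.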

as Mike said, there are two sides of the equation: one is the object 
graph as such, the other is how you represent it in the db and query 
over that. the representation can be twisted heavily towards some 
sort of queries - e.g. one extreme would be to store/"cache" all 
inherited-result groups for each R, and refresh the "cache" on any 
change anywhere on the path-to-root ("cache" here is actually some 
records inside the db)
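A minimal sketch of that "cache" extreme, in plain Python (the node names and parent mapping are invented for illustration): each node's full group list is precomputed by one walk to the root, and would have to be rebuilt whenever any ancestor link changes.

```python
# Hypothetical child -> parent links; None marks the root.
parent = {"r1": "groupA", "groupA": "root", "root": None}

def rebuild_cache(node):
    """Walk the path to the root once and store it as the node's 'cache'."""
    groups = []
    p = parent[node]
    while p is not None:
        groups.append(p)
        p = parent[p]
    return groups

# Refresh-on-change would redo this for every node below the changed link.
cache = {n: rebuild_cache(n) for n in parent}
print(cache["r1"])   # ['groupA', 'root']
```

Reads become trivial lookups; the cost is that a single re-parenting near the root invalidates (and forces recomputation of) the cache of every descendant.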

in my case i've decided that getting all objects + their assoc.values 
in one go/query and then doing the "inheritance" in python is better 
than anything else, like the above "cache" or walking the tree in 
python and making many small queries to get values, for two reasons: 
there are thousands of rows, and the order/priority of inheritance is 
changeable. i have not decided how to derecurse the tree yet; for now 
i'm generating an N-level query. The results so far are that for a 
self-recursive query (A1-A2-A3), 4 levels are enough to kill postgres 
(2 for sqlite), while for a non-self-recursive one (A1-B1-C3), 3, 4, 
5 or 6 levels make little difference.
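The "fetch everything in one query, inherit in python" approach could look something like this sketch (the row shape and names are assumptions, not the actual schema from the thread): rows come back flat, and each object's effective values are resolved in memory with nearest-definition-wins.

```python
# Flat result set: (object_id, parent_id, key, value); one round-trip to the db.
rows = [
    ("root",   None,     "color", "red"),
    ("groupA", "root",   "size",  "L"),
    ("r1",     "groupA", None,    None),   # r1 defines nothing of its own
]

parent = {oid: pid for oid, pid, _, _ in rows}
own = {}
for oid, _, key, value in rows:
    if key is not None:
        own.setdefault(oid, {})[key] = value

def effective(oid):
    """Nearest definition wins: start at the node, fall back to ancestors."""
    result = {}
    while oid is not None:
        for k, v in own.get(oid, {}).items():
            result.setdefault(k, v)   # keep the closer definition
        oid = parent[oid]
    return result

print(effective("r1"))   # {'size': 'L', 'color': 'red'}
```

Because the priority order lives in the python loop rather than in SQL, changing the inheritance rules costs nothing on the query side, which fits the "order is changeable" requirement above.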

also, if you need a count over the above, query.count() may not be 
correct; use this instead (i don't know why, but count(1) is not 
correct over a filtered cartesian product):

def exact_count(query):
    from sqlalchemy.sql import func
    m = getattr(query, 'mapper', None)      # SA 0.4
    if m is None:
        m = query._mapper_zero()            # SA 0.5
    # see Query.count(); count distinct primary keys instead of count(1)
    return query._col_aggregate(m.primary_key[0],
                lambda x: func.count(func.distinct(x)))
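To see why count(1) overcounts here, a small standalone demonstration (table names invented): a join that fans out duplicates the parent row, so counting result rows is not the same as counting distinct entities.

```python
import sqlite3

# One group joined to three of its members: the cartesian/filtered join
# yields 3 rows, but only 1 distinct group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grp (id INTEGER)")
conn.execute("CREATE TABLE member (grp_id INTEGER)")
conn.execute("INSERT INTO grp VALUES (1)")
conn.executemany("INSERT INTO member VALUES (?)", [(1,), (1,), (1,)])

rowcount = conn.execute(
    "SELECT count(1) FROM grp, member WHERE grp.id = member.grp_id"
).fetchone()[0]
distinct = conn.execute(
    "SELECT count(distinct grp.id) FROM grp, member WHERE grp.id = member.grp_id"
).fetchone()[0]
print(rowcount, distinct)   # 3 1
```

count(distinct pk), as in exact_count() above, collapses the fan-out back to one count per entity.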

mike, why not have some query._count_expr() method that defaults to 
count(1) but can easily be substituted with something else, e.g. the 
above count(distinct(x.dbid))?
as it is now i have to subclass Query and replace the whole .count() 
with the above, which is partially copy-pasted from the original 
Query.count() (and probably touches too much internal stuff)

have fun
svil
