[sqlalchemy] Selecting from a self-referential mapper: recursive joins?

Eric Ongerth Mon, 24 Nov 2008 23:51:49 -0800

Below, I have attached a working testcase.  It works, yes -- but my
question is that I need to make an improved version of a particular
method on one of my classes.  The following model will probably
explain itself for the most part.  I'll let you read it first, then
offer a few explanatory notes afterward just in case.  Finally, at the
end, I will describe the difference between what the method in
question does now, and what I would like it to do.


The nature of the response I am seeking is: a description of what I
need to do to build a better version of the method I'm speaking of,
including any further insight on the practice of joining at multiple
levels of a recursive / self-referential (but loop-free) graph.


---snip---


from sqlalchemy import *
from sqlalchemy.sql import *
from sqlalchemy.orm import *

engine = create_engine('sqlite://')

metadata = MetaData(bind=engine)

itemtypes = Table('itemtypes', metadata,
    Column('name', Text, primary_key=True))

itemtype_inheritance = Table('itemtype_inheritance', metadata,
    Column('itemtype_name', Text, ForeignKey('itemtypes.name'),
primary_key=True),
    Column('parent_name', Text, ForeignKey('itemtypes.name'),
primary_key=True))

features = Table('features', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', Text),
    Column('root_itemtype_name', Text, ForeignKey('itemtypes.name')))

feature_dependencies = Table('feature_dependencies', metadata,
    Column('dependent_id', Integer, ForeignKey('features.id'),
primary_key=True),
    Column('determinant_id', Integer, ForeignKey('features.id'),
primary_key=True))

metadata.drop_all()
metadata.create_all()

itemtypes.insert().execute([
    {'name': 'Product'},
    {'name': 'Footwear'},
    {'name': 'Boot'},
    {'name': 'Ski'}
])

itemtype_inheritance.insert().execute([
    {'itemtype_name': 'Footwear', 'parent_name': 'Product'},
    {'itemtype_name': 'Boot', 'parent_name': 'Footwear'},
    {'itemtype_name': 'Ski', 'parent_name': 'Product'}
])

features.insert().execute([
    {'id': 1, 'name': 'Manufacturer',
'root_itemtype_name':'Product' },
    {'id': 2, 'name': 'Model', 'root_itemtype_name':'Product' },
    {'id': 3, 'name': 'Year', 'root_itemtype_name':'Product' },
    {'id': 4, 'name': 'Gender', 'root_itemtype_name':'Footwear' },
    {'id': 5, 'name': 'US Shoe Size',
'root_itemtype_name':'Footwear' },
    {'id': 6, 'name': 'Length', 'root_itemtype_name':'Ski' },
    {'id': 7, 'name': 'Weight', 'root_itemtype_name':'Product' }

])

feature_dependencies.insert().execute([
    {'dependent_id': 7, 'determinant_id': 1},
    {'dependent_id': 7, 'determinant_id': 2},
    {'dependent_id': 7, 'determinant_id': 3},
    {'dependent_id': 7, 'determinant_id': 4},
    {'dependent_id': 7, 'determinant_id': 5},
    {'dependent_id': 7, 'determinant_id': 6}
])


class Itemtype(object):

    def __repr__(self):
        return 'Itemtype: %s' % (self.name)

    @property
    def inherited_features(self):
        return reduce(list.extend,
                      [base_itemtype.features for base_itemtype in
self.inherits],
                      [])

    @property
    def features(self):
        return self.own_features.extend(self.inherited_features)

    @property
    def dependent_features(self):
        return [f for f in self.features if f.determinants]

    @property
    def independent_features(self):
        return [f for f in self.features if not f.determinants]


class Feature(object):

    def __repr__(self):
        return '%s %s' % (self.root_itemtype_name, self.name)

    def determinants_in_scope_of(self, itemtype):
        return (session.query(Feature)
                .join(FeatureDependency.determinant)
                .join(Feature.root_itemtype)
                .filter(and_(FeatureDependency.dependent_id==self.id,
                             Itemtype.name==itemtype.name))).all()


class FeatureDependency(object):

    def __repr__(self):
        return "F_D: %s depends on %s" % (self.dependent.name,
                                          self.determinant.name)



mapper(Itemtype, itemtypes, properties={
    'inherits':relation(Itemtype,
        secondary=itemtype_inheritance,
        primaryjoin=
(itemtypes.c.name==itemtype_inheritance.c.itemtype_name),
        secondaryjoin=
(itemtype_inheritance.c.parent_name==itemtypes.c.name),
        backref='progeny'),
    'own_features':relation(Feature,
        primaryjoin=(features.c.root_itemtype_name==itemtypes.c.name),
        backref=backref('root_itemtype', uselist=False))
    })

mapper(Feature, features, properties={
    'dependents':relation(Feature,
        secondary=feature_dependencies,
        primaryjoin=
(feature_dependencies.c.determinant_id==features.c.id),
        secondaryjoin=
(feature_dependencies.c.dependent_id==features.c.id),
        backref=backref('determinants'))
    })

mapper(FeatureDependency, feature_dependencies, properties={
    'dependent':relation(Feature,
        uselist=False,
        primaryjoin=
(feature_dependencies.c.dependent_id==features.c.id),
        backref='feature_dependencies_as_dependent'),
    'determinant':relation(Feature,
        uselist=False,
        primaryjoin=
(feature_dependencies.c.determinant_id==features.c.id),
        backref='feature_dependencies_as_determinant')
    })



Session = sessionmaker(bind=engine)
session = Session()

Product = session.query(Itemtype).filter_by(name='Product').one()
Ski = session.query(Itemtype).filter_by(name='Ski').one()
Footwear = session.query(Itemtype).filter_by(name='Footwear').one()
Boot = session.query(Itemtype).filter_by(name='Boot').one()
Weight = session.query(Feature).filter_by(name='Weight').one()
Gender = session.query(Feature).filter_by(name='Gender').one()
Year = session.query(Feature).filter_by(name='Year').one()


---snip---


(to anyone who wants to play with this and doesn't know how, save the
above code as testcase.py, and then run it from shell with python -i
testcase.py; you then have a Python interpreter with all of the above
objects still active, including the sqlalchemy session, with the
ability to query and experiment).

Notes on what this is supposed to model: Here are some of the facts in
the problem domain.

Each Itemtype inherits from others above it, except for root itemtypes
(the only one of which in this example is Product).  The reason for
using an inheritance paradigm here is so that you can set Features on
any Itemtype and have them inherited by all of the subcategories (sub-
Itemtypes) of that Itemtype.  In other words, once you give the
itemtype called 'Product' a feature called 'Manufacturer' and a
feature called 'Model', now *everything* that inherits from Product
(in fact the whole rest of the graph, here) has those Features.

What you can't see here is that there is a similar structure holding
all the possible values of these features, and mapping combinations
thereof onto product codes and actual physical items.  This is
essentially a knowledge base for possible point-of-sale and inventory
management systems.  I'm leaving the "values" part out for the sake of
this simple example.

The reason for feature_dependencies: D.R.Y.  meaning: in the real
world if the value of one variable depends on a couple of others, you
don't need to re-record the dependent variable unless its determinants
(i.e. the variables upon which it depends) themselves change.  The
testcase structure above encodes dependency links between Features in
order to model the fact that, for example, the Weight of a Product is
a function of (is determined by) that Product's Model, Year, Size, and
so forth.  Better yet, these dependency relationships can easily be
seen as "cascading" or inheriting along the same paths as Itemtype
inheritance, so that feature dependencies need not be separately
entered/recorded at every node of the graph.  It is sufficient to
declare a feature dependency at its salient point -- meaning, at the
highest node at which the dependent feature and its determinants first
make contact -- and then simply allow the functional relationship
(feature dependency) to propagate usefully downward through the graph
to more and more specific Itemtypes.  This cascading process can pick
up additional feature dependencies as needed along the way, until
ultimately it reaches "terminal" itemtypes (i.e. leaves of the tree)
which are considerd fully described.  The production version will
include multiple inheritance because the real problem domain most
definitely does NOT resolve to a simple hierarchical tree; it's a more
general graph, though at least it has no loops (whew).

In this example case I have left all but one of the Features as
independent variables and set only one Feature, 'Weight', as a
dependent of several of the rest.  In actual production my terminal
itemtypes will have hundreds of Features storing a rich spread of data
about the Items that will instantiate those Itemtypes, with the bulk
of those features inherited from Itemtypes higher in the hierarchy --
the 'abstract base classes' in this system, if you will.

Anyway, just having Weight, alone, depend on several of the other
Features is enough to keep this test case simple and show what it
needs to show.

So, please observe the following results in a shell, which are as
expected/desired, and comments thereunder.

>>>Weight.determinants
[Product Manufacturer, Product Model, Product Year, Footwear Gender,
Footwear US Shoe Size, Ski Length]

comment: that's great, there we see all of the variables that
determine the Weight of something.  But it's a little confusing in
that form, because the Gender of and item of Footwear has nothing to
do with the Length of a Ski.  So we have a method on Feature to tune
in closer.

>>>Weight.determinants_in_scope_of(Product)
[Product Manufacturer, Product Model, Product Year]

comment: now we're getting somewhere.  Those, as requested, are the
determining factors for the Weight of an item, but only the ones that
were inherited from the abstract Product itemtype.

>>>Weight.determinants_in_scope_of(Footwear)
[Footwear Gender, Footwear US Shoe Size]

comment: that's great too.  We see that at the Footwear level, every
item that belongs to the Footwear branch of the tree (including Boot,
and its future descendants Ski Boot and Downhill Ski Boot, and so
forth) will have their instances' weights be a function of gender and
size -- without having to repeat myself by recording that fact over
and over again in the instances or even in those specific terminal
itemtypes.

However, here's where I'm coming up short, and the reason for this
whole post.
>>>Weight.determinants_in_scope_of(Boot)
[]

comment: That is as expected.  But it's not what I'm really looking
for.  What I would really rather have returned by this query is the
same determinant features reported for Footwear, because Footwear is
the parent of Boot and Boot should inherit them.  Furthermore, the
response to Weight.determinants_in_scope_of(Boot) should also include
the responses from Weight.determinants_in_scope_of(Product) because
Footwear inherits from Product, therefore so does Boot.

>>>Weight.determinants_in_scope_of(Ski)
[Ski Length]

comment: This is as expected.  But what I would really like to get
back is a result that reflects the inheritance of feature dependencies
from Product, the parent itemtype of Ski.  In other words, I would
rather have the answer to this query be [Ski Length, Product
Manufacturer, Product Model, Product Year].  I would NOT want
'Footwear Gender' to show up because that is on a different branch of
the tree; Ski does not have itemtype inheritance from Footwear, only
from Product. Nor, conversely, should Ski Length influence the Weight
of a Boot.

So that's my question, then.  How do I build a method similar to
determinants_in_scope_of() that follows on up the chain of itemtype
inheritance and returns the features on which a given Feature instance
is dependent (a.k.a. its 'determinants' in the language of this
particular model), all the way up to reach any root itemtypes no
matter how many levels of inheritance are involved?

I have been experimenting with the joins and filters in this method so
far...  Where I keep failing is when I try to figure out how to have
the query climb up through an arbitrary number of levels in the
Itemtype inheritance structure.

Thank you for your patience with this long post; If you've read this
far, I hope it has been interesting.  Hopefully Mike or others can
suggest a better set of joins or other techniques to tweak this
structure toward the behavior I'm aiming for.

Eric

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Selecting from a self-referential mapper: recursive joins?

Reply via email to