[sqlalchemy] Introducing SQLAlchemy-ORM-tree: a generic API for hierarchical data

Mark Friedenbach Tue, 28 Feb 2012 19:05:26 -0800

I'd like to introduce a package I've been working on for a little
while now: SQLAlchemy-ORM-tree, “an implementation for SQLAlchemy-
based applications of the nested-sets/modified-pre-order-tree-
traversal technique for storing hierarchical data in a relational
database.” It's gone through a couple of semi-public releases already,
but now for the first time the API is complete and semi-stable, and I
would like to get some feedback from developers on this list who have
used or would like to use hierarchical data in an RDBMS through
SQLAlchemy. Some example code to show it in action:



from sqlalchemy_tree import TreeManager
node_table = Table('node', metadata,
  Column('id', Integer, primary_key=True),
  Column('parent_id', Integer, ForeignKey('node.id')))
class Node(object):
  tree = TreeManager(node_table)
mapper(Node, node_table, properties={
  'parent': relationship(Node,
    backref     = backref('children'),
    remote_side = node_table.c.id),
})
Node.tree.register()

root = Node()
session.add(root)
session.flush()
child = Node()
child.parent = root
session.commit()

>>> child.tree.is_leaf_node
True
>>> child.tree.query_ancestors().one() == root
True
>>> child2 = Node()
>>> Node.tree.insert(child2, root, Node.tree.POSITION_LAST_CHILD)
>>> session.flush()
>>> root.tree.query_leaf_nodes() \
             .order_by(Node.tree) \
             .all() == [child, child2]
True
>>> child2.parent = None
>>> session.flush()
>>> child2.tree.is_root_node
True
>>>


The PyPI page:

http://pypi.python.org/pypi/SQLAlchemy-ORM-tree

and github (pull requests accepted):

https://github.com/rokusigma/sqlalchemy-orm-tree

Some things to note about the project as it is now: I am operating
under the principles of “1) make it work; 2) make it beautiful; then
finally 3) make it fast”. It works, and to prove it I've written over
300 unit tests in 3.4kLOC covering the expected behavior of all API
entry points. Now I'm soliciting the community's input in making it
“beautiful”--making sure the API is both functional and elegant,
refactoring areas of the code base that need it, making algorithmic
changes, and generally ensuring that the SQLAlchemy-ORM-tree package
is useful to anyone and everyone working with hierarchical data. I'll
then concern myself with low-level modifications for speed.

Some ugly warts I'm aware of (and reasons it's not yet 1.0):

* Making any changes to the structure of the tree requires flushing
before the various tree properties are updated (tree_id, left, right,
depth, etc.), as these updates are handled by a mapper before_flush
event handler and session (before/after)_(insert/update/delete) event
handlers. More subtlety, the introspective APIs `is_leaf_node`,
`is_descendant_of()`, etc. may possibly return incorrect results until
the session is flushed as they rely upon these tree property values.
I'm considering having `insert()` make changes to the database
directly and immediately without waiting for a flush event.

* There are no bulk insert, update, or delete operations. Within the
mapper event handlers each update, insert, and delete is handled in
turn, with all session objects updated accordingly at each stage. I
could use some input as to how best to structure a bulk-load/bulk-
update API from the user's perspective, as I lack experience in this
area.

* The tree manager needs access to the table at initialization time...
which in the declarative style means that the tree manager has to be
monkey-patched into the class object after it is defined. Suggestions
for improving this would be appreciated.

* The API for both inserting nodes AND updating the location of
existing nodes is `insert()`, the analogy being inserting into a
specific position in a list. It's confusing however as one uses it for
updates as well as insertions, and it also has nothing to do with the
SQL INSERT statement. `update()` or `move()` wouldn't be any better
for the same or similar reasons. Nomenclature suggestions would be
much appreciated.

* I just today learned about Tropashko's nested interval tree encoding
using continued fractions. I admit that I don't (yet) fully understand
the underlying maths, but appears that switching to continued
fractions over integers for the left/right fields could lead to
significant algorithmic gains in performance. Thankfully this could be
a transparent switch with no external API changes.

Happy hacking,
Mark

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

[sqlalchemy] Introducing SQLAlchemy-ORM-tree: a generic API for hierarchical data

Reply via email to