[sqlalchemy] Re: Hierachical data

Michael Bayer Fri, 03 Aug 2007 08:25:42 -0700

First of all, for your example here, youre just storing small  
subtrees of hierarchical data, and youre not trying to query complex  
set operations over the entire set of nodes.  Nested sets is  
completely disadvantageous in this case for its complexity, its  
incompatibility with row-based ORM tools like SQLAlchemy,  and its  
inability to handle modifications/inserts without table-spanning,  
concurrency-busting, UPDATE statements.  Nested sets does not support  
concurrent write operations at all.


The XML format you have is shallow so you can literally load the  
entire thing by self-joining a few times.  But in fact, since you are  
storing small sets of XML nodes which are keyed to a master ID for a  
particular set of nodes, you can just load the whole list of nodes  
based on your "playlist id".   This is a model for which nested sets  
is completely inappropriate; when you want to store many many small  
trees each of which you typically retrieve into memory at once, not a  
single giant tree from which you only ever view a partial.

The recipe for storing XML with SQLAlchemy, which allows searching  
for documents based on simplistic XPath expressions too (using a  
single query for the entire search/load operation), is here:

http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/examples/ 
elementtree/optimized_al.py

theres also a version that uses "traditional" self joins to load, not  
as efficient but slightly simpler configuration-wise, which is here:

http://www.sqlalchemy.org/trac/browser/sqlalchemy/trunk/examples/ 
elementtree/adjacency_list.py

Note in 0.4 we can set "lazy=False" and "join_depth=3" or so, so that  
it self-joins to load three levels of nodes in one query.

There are lots of articles about "nested sets" and how they are  
"better" than adjacency list, but in practice ive never seen anyone  
actually *use* nested sets.  Also no discussion of nested sets is  
complete without what I feel is a much better recipe (but sadly,  
completely Oracle specific) which stores nodes within fractional  
ranges so that the mass UPDATEs arent needed:

http://www.dbazine.com/oracle/or-articles/tropashko4



On Aug 3, 2007, at 5:44 AM, Alexandre CONRAD wrote:

>
> Hello,
>
> this is more a general database question rather than a SA one, but  
> maybe
> someone has already achieve this with SA. I know there's a small  
> chapter
> about this already in the SA docs (1), but as far as I understood,  
> this
> is about the "Adjacency List" model and not about the "Nested Set"
> model. This article (2) explains about both approches and the "Nested
> Set" model would suite better my needs I think.
>
> Maybe this should need some attention to implement in SA some API to
> handle nodes (insert, move, remove) of herachical trees in SA the
> "Nested Set" way.
>
> Right now I have a playlist management application that is XML- 
> based. So
> a playlist has many nodes that either represent media elements  
> ("video",
> "image") or classification elements ("group"):
>
> <main>
>    <video>intro.avi</video>
>    <group datein="20070701" dateout="20070731">
>      <video>foo.avi</video>
>      <video>bar.avi</video>
>    </group>
>    <group>
>      <video>weather_intro.avi</video>
>      <image duration="5" start="0000" stop="1159">morning.jpg</image>
>      <image duration="5" start="1200" stop="2359">afternoon.jpg</ 
> image>
>      <video>brought_to_you_by.avi</video>
>      <media_list pick_style="random" pick_many="1">
>        <video>brand_A.avi</video>
>        <video>brand_B.avi</video>
>        <video dateout="20071231">brand_C.avi</video>
>      </media_list>
>    </group>
>    <video>outro.avi</video>
> </main>
>
> You can understand that the playlist has the ability to store  
> different
> types of medias (images, videos), some group elements used for
> classification, can apply some scheduling data to any elements as
> attributes.
>
> So I'm looking foward to integrate this hierarchical tree in a flat
> database.
>
> Basicly, I'd have a "playlist" to hold the main playlist:
>
> playlist_table = Table('playlists', meta,
>      Column('id', Integer, primary_key=True),
>      Column('name', Unicode(20), nullable=False),
>      Column('description', Unicode(40)),
> )
>
> Then an "nodes" table to store every type of nodes (group, video,  
> image)
> and some scheduling attributes.
>
> nodes_table = Table('nodes', meta,
>      Column('id', Integer, primary_key=True),
>      Column('typeof', Unicode(20), nullable=False),
>      Column('foreign_id', Integer),
>      Column('lft', Integer, nullable=False), # For "Nested Set" model
>      Column('rgt', Integer, nullable=False), # For "Nested Set" model
>      Column('scheduled_from', DateTime),
>      Column('scheduled_until', DateTime),
>      Column('id_playlist', None, ForeignKey('playlists.id')),
> )
>
> Every row would store "group", "video" or "image" in the 'typeof'  
> field
> to identify the type of node. And the id of the media would be in the
> 'foreign_id' field. 'typeof' and 'foreign_id' are the two field I  
> don't
> feel clean about.
>
> Lets finish with the "videos" table for the example:
>
> videos_table = Table('videos', meta,
>      Column('id', Integer, primary_key=True),
>      Column('file_name', Unicode(255), nullable=False),
>      Column('category', Unicode(255), nullable=False),
>      Column('duration', Integer),
>      Column('id_codec', None, ForeignKey('codecs.id')),
> )
>
> As the "nodes" table would need to point to different tables to  
> find the
> corresponding media, I couldn't have a foreign key for a natural
> relation. Or I would need to have as many foreign keys in the "nodes"
> table (id_video, id_image, ...) and only fill in the nedded column.  
> But
> this is not efficient.
>
> The whole idea is about browing down my tree through python objects
> attributes like SA already does so well when two tables are related to
> each other, but returning a different type of object depending on the
> kind of node I hit.
>
> class Node(object):
>      ...
> class Video(Node):
>      ...
> class Image(Node):
>      ...
>
> Any ideas one how I could deal with that would be greatly  
> appreciated. I
> hope this is clear enough to not scare people from reading it. At  
> least
> if you got there, it's a good point. :)
>
> ps: I just felt on the docs about "Concrete Inheritance, Polymorphic"
> (3), maybe this could be something to dig for ?
>
> (1)
> http://www.sqlalchemy.org/docs/ 
> documentation.html#advdatamapping_selfreferential
> (2)
> http://www.vbmysql.com/articles/database-design/managing- 
> hierarchical-data-in-mysql/
> (3)http://www.sqlalchemy.org/docs/ 
> documentation.html#advdatamapping_inheritance_concrete
>
> Regards,
> -- 
> Alexandre CONRAD
>
>
> >


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Re: Hierachical data

Reply via email to