Storing Hierarchical Data

Andreas Pavlogiannis Sun, 15 Nov 2009 19:14:47 -0800

Greetings,

I recently started exploring the capabilities of CouchDB, and although I find it really interesting and flexible, I am experiencing some difficulties:

Is there any recommended way to store hierarchical data? Consider, for example, the case of a file system with multiple directories. I can think of some possible scenarios, each with different capabilities and limitations:

   * Each file and each folder is represented by a single document, with each folder document containing a "contents" list that has the ids of the subdocuments under the specific folder (the usual tree structure). In this case, deleting a file would require updating more than one document (the file for deletion and the parent folder for the "contents" attribute), which seems dangerous considering the absence of transactional operations (what about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow" would require a conventional pathname translation, which adds overhead (cut the pathname into chunks, request the "foo" folder, retrieve the ids of its contents, find which one corresponds to the "bar" folder, etc.).

   * Each file and each folder is represented by a single document, with each file having an attribute "parent id" that contains the id of its parent folder (reverse tree structure). In this case deleting the file requires only one operation and seems more robust. However, pathname translation gets fuzzy and seems to add a lot of overhead (retrieve the id of the folder, find documents having this "parent id" attribute, find the one you want among them...).

   * Each file is represented by a single document that has a "path" attribute indicating the directory it is stored in. This gives the advantage of avoiding conventional pathname translation and retrieving the correct document immediately. However, operations such as renaming a folder require updating many documents and should be avoided.

   * Keep the whole file system in a single document. Ouch!
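
To make the trade-offs concrete, here is a minimal sketch of the first three layouts, with plain Python dicts standing in for CouchDB documents. The ids and the field names ("type", "name", "contents", "parent_id", "path") are illustrative assumptions, and the linear scans stand in for whatever query mechanism the database would actually provide.

```python
# In-memory stand-ins for documents; all ids and field names are hypothetical.

# Layout 1: each folder holds a "contents" list of child ids.
tree_docs = {
    "root": {"type": "folder", "name": "", "contents": ["d1"]},
    "d1": {"type": "folder", "name": "foo", "contents": ["d2"]},
    "d2": {"type": "folder", "name": "bar", "contents": ["f1"]},
    "f1": {"type": "file", "name": "cow"},
}

def resolve_tree(docs, path, root_id="root"):
    """Walk down from the root, reading a folder and its children per component."""
    current = root_id
    for part in path.strip("/").split("/"):
        children = docs[current].get("contents", [])
        current = next((c for c in children if docs[c]["name"] == part), None)
        if current is None:
            return None
    return current

# Layout 2: each document points at its parent through "parent_id".
parent_docs = {
    "root": {"type": "folder", "name": "", "parent_id": None},
    "d1": {"type": "folder", "name": "foo", "parent_id": "root"},
    "d2": {"type": "folder", "name": "bar", "parent_id": "d1"},
    "f1": {"type": "file", "name": "cow", "parent_id": "d2"},
}

def resolve_parent(docs, path, root_id="root"):
    """One lookup by (parent id, name) per path component."""
    current = root_id
    for part in path.strip("/").split("/"):
        matches = [doc_id for doc_id, doc in docs.items()
                   if doc["parent_id"] == current and doc["name"] == part]
        if not matches:
            return None
        current = matches[0]
    return current

# Layout 3: each file stores the full path of its directory.
path_docs = {
    "f1": {"type": "file", "name": "cow", "path": "foo/bar"},
    "f2": {"type": "file", "name": "hen", "path": "foo/bar"},
    "f3": {"type": "file", "name": "pig", "path": "foo"},
}

def resolve_path(docs, path):
    """A single lookup by (directory path, name); no walk is needed."""
    folder, _, name = path.strip("/").rpartition("/")
    for doc_id, doc in docs.items():
        if doc["path"] == folder and doc["name"] == name:
            return doc_id
    return None

def rename_folder(docs, old, new):
    """Rewrite every document at or below `old`; return the ids touched."""
    touched = []
    for doc_id, doc in docs.items():
        if doc["path"] == old or doc["path"].startswith(old + "/"):
            doc["path"] = new + doc["path"][len(old):]
            touched.append(doc_id)
    return touched
```

Resolving "foo/bar/cow" takes one step per path component in the first two layouts and a single lookup in the third, while `rename_folder` shows the cost on the other side: renaming "foo" rewrites every document stored beneath it.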

I am aware of the bulk update technique with the "all or nothing" attribute, but it is my understanding that it should be avoided, especially when dealing with clustering and replication. In addition, things seem to get more obscure when considering file sharing possibilities between the users of the file system.
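
For reference, this is roughly what such a bulk update would look like for the first layout, sketched in Python. It only builds the request body; as far as I understand, the body would be POSTed to the database's `_bulk_docs` resource, and the "all_or_nothing" flag is the one offered by the CouchDB releases current at the time of writing. The ids, revision strings and field names are made up for illustration.

```python
import json

def delete_file_payload(file_doc, folder_doc):
    """Build a bulk-update body that deletes a file and updates its parent folder."""
    # Copy the folder so the caller's document is left untouched.
    updated_folder = dict(folder_doc)
    updated_folder["contents"] = [
        child for child in folder_doc["contents"] if child != file_doc["_id"]
    ]
    return {
        "all_or_nothing": True,  # request that both changes are stored together
        "docs": [
            # A deletion is expressed as a new revision marked "_deleted".
            {"_id": file_doc["_id"], "_rev": file_doc["_rev"], "_deleted": True},
            updated_folder,
        ],
    }

# Hypothetical documents; the "_rev" values are placeholders.
file_doc = {"_id": "f1", "_rev": "1-abc", "name": "cow"}
folder_doc = {"_id": "d2", "_rev": "3-def", "name": "bar", "contents": ["f1", "f9"]}

body = json.dumps(delete_file_payload(file_doc, folder_doc))
```

My understanding is that in this mode the documents are stored even when that creates conflicting revisions, which would explain why it is discouraged once replication is involved.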

I would be glad if you could provide me with some pointers on how to circumvent the disadvantages of each of the methods above.

In general, do you think that, since dealing with documents is so flexible and given the absence of transactional operations, one should try to organize one's data to be as decoupled as possible?


Thank you for your time,

Andreas
