Greetings,
I recently started exploring the capabilities of CouchDB and, although I
find it really interesting and flexible, I am experiencing some
difficulties:
Is there any recommended way to store hierarchical data? Consider for
example the case of a file system with multiple directories. I can think
of some possible scenarios, each with different capabilities and limitations:
* Each file and each folder is represented by a single document,
with each folder document containing a "contents" list that has the ids
of the subdocuments under the specific folder (the usual tree
structure). In this case, deleting a file would require updating more
than one document (the file for deletion and the parent folder for the
"contents" attribute) which seems dangerous considering the absence of
transactional operations (what about deleting a whole folder?).
Moreover, accessing the file "foo/bar/cow" would require a conventional
pathname translation, which adds overhead (split the pathname into
components, request the "foo" folder, retrieve the ids of its contents,
find which one corresponds to the "bar" folder, etc.).
* Each file and each folder is represented by a single document,
with each file having an attribute "parent id" that contains the id of
its parent folder (reverse tree structure). In this case, deleting the
file requires only one operation and seems more robust. However,
pathname translation becomes awkward and seems to add a lot of overhead
(retrieve the id of the folder, find the documents having this
"parent id" attribute, find the one you want among them...).
* Each file is represented by a single document that has a "path"
attribute that indicates the directory in which it is stored. This
gives the advantage of avoiding conventional pathname translation and
retrieving the correct document immediately. However, operations such as
renaming a folder require updating many documents and should be avoided.
* Keep the whole file system in a single document. Ouch!
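To make the first three layouts concrete, here is a minimal Python sketch
that uses plain dictionaries in place of CouchDB documents. The ids, the
field names ("contents", "parent_id", "path") and the helper functions are
my own assumptions for illustration and not part of any CouchDB API. The
(parent_id, name) index only approximates what a view with such a key
might give.

```python
# Hypothetical in-memory stand-ins for CouchDB documents (illustration only).

# Layout 1: each folder lists the ids of its children in "contents".
tree_docs = {
    "root": {"type": "folder", "name": "", "contents": ["d1"]},
    "d1": {"type": "folder", "name": "foo", "contents": ["d2"]},
    "d2": {"type": "folder", "name": "bar", "contents": ["f1"]},
    "f1": {"type": "file", "name": "cow"},
}

def resolve_by_contents(docs, path, root_id="root"):
    """Walk down from the root, inspecting the children at every level."""
    current = root_id
    for part in path.strip("/").split("/"):
        children = docs[current].get("contents", [])
        current = next((c for c in children if docs[c]["name"] == part), None)
        if current is None:
            return None
    return current

# Layout 2: each document points at its parent through "parent_id".
parent_docs = {
    "root": {"type": "folder", "name": "", "parent_id": None},
    "d1": {"type": "folder", "name": "foo", "parent_id": "root"},
    "d2": {"type": "folder", "name": "bar", "parent_id": "d1"},
    "f1": {"type": "file", "name": "cow", "parent_id": "d2"},
}

def resolve_by_parent(docs, path, root_id="root"):
    """One (parent_id, name) lookup per path component."""
    index = {(d["parent_id"], d["name"]): doc_id for doc_id, d in docs.items()}
    current = root_id
    for part in path.strip("/").split("/"):
        current = index.get((current, part))
        if current is None:
            return None
    return current

# Layout 3: each file stores the directory it lives in under "path".
path_docs = {
    "f1": {"type": "file", "name": "cow", "path": "foo/bar"},
    "f2": {"type": "file", "name": "hen", "path": "foo/bar"},
    "f3": {"type": "file", "name": "pig", "path": "foo"},
}

def resolve_by_path(docs, path):
    """Match directory and file name directly; no tree walk is needed."""
    folder, _, name = path.strip("/").rpartition("/")
    for doc_id, d in docs.items():
        if d["path"] == folder and d["name"] == name:
            return doc_id
    return None

def rename_folder(docs, old, new):
    """Rewrite "path" on every file under `old`; return how many changed."""
    touched = 0
    for d in docs.values():
        if d["path"] == old or d["path"].startswith(old + "/"):
            d["path"] = new + d["path"][len(old):]
            touched += 1
    return touched
```

In this toy data set, renaming "foo" in the third layout rewrites every
file document under it, whereas in the first two layouts only the "name"
of the one folder document would change.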
I am aware of the bulk update technique with the "all or nothing"
attribute, but my understanding is that it should be avoided,
especially when dealing with clustering and replication. In addition,
things seem to get more obscure when considering the possibility of
sharing files between the users of the file system.
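For reference, this is roughly the kind of bulk request I have in mind
for deleting a folder together with its file in the first layout. It is
only a sketch: the ids and revisions are placeholders, the helper name is
my own, and I am assuming the body would be POSTed to the database's
_bulk_docs resource. Whether "all_or_nothing" is honoured depends on the
CouchDB version.

```python
import json

def build_bulk_delete(docs):
    """Build a bulk-update body that marks every given document as deleted."""
    return {
        "all_or_nothing": True,
        "docs": [
            {"_id": d["_id"], "_rev": d["_rev"], "_deleted": True}
            for d in docs
        ],
    }

# Placeholder documents; real revisions would come from the database.
folder = {"_id": "d2", "_rev": "1-aaa", "name": "bar", "contents": ["f1"]}
cow = {"_id": "f1", "_rev": "1-bbb", "name": "cow"}

body = build_bulk_delete([folder, cow])
print(json.dumps(body, indent=2))
```

The parent folder's "contents" list would still need to be updated in the
same request, which is exactly the coupling that worries me.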
I would be glad if you could give me some pointers on how to
circumvent the disadvantages of each of the methods above.
In general, do you think that, since dealing with documents is so
flexible and given the absence of transactional operations, one should
try to keep one's data as decoupled as possible?
Thank you for your time,
Andreas