On 16 Nov 2009, at 05:28, Adam Wolff wrote:

> There isn't a great way to store hierarchical data in couch. If you want to
> actually move stuff around, the full pathname is a no-go, since there are no
> bulk updates. The only other trick here, if you have meaningful roots or
> branch points, is to store a reference to those in addition to the specific
> parent node in the graph.

It is not a no-go, renames just can't be atomic :)

Cheers
Jan
--

> In any case, it seems better to me to store references from child to parent,
> rather than the other way around. The child document makes a more natural
> concurrency boundary.
>
> A
>
> On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis <[email protected]> wrote:
>
>> Greetings,
>>
>> I recently started exploring the capabilities of couchdb and although I
>> find it really interesting and flexible, I am experiencing some
>> difficulties:
>>
>> Is there any recommended way to store hierarchical data? Consider for
>> example the case of a file system with multiple directories. I can think of
>> some possible scenarios each with different capabilities and limitations:
>> * Each file and each folder is represented by a single document, with
>>   each folder document containing a "contents" list that has the ids of the
>>   subdocuments under the specific folder (the usual tree structure). In this
>>   case, deleting a file would require updating more than one document (the
>>   file for deletion and the parent folder for the "contents" attribute) which
>>   seems dangerous considering the absence of transactional operations (what
>>   about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow"
>>   would require a conventional pathname translation which adds overhead (cut
>>   the pathname in chunks, request the "foo" folder, retrieve the ids of its
>>   contents, find which one corresponds to the "bar" folder etc..)
>> * Each file and each folder is represented by a single document, with
>>   each file having an attribute "parent id" that contains the id of its parent
>>   folder (reverse tree structure). In this case deleting the file requires only
>>   one operation and seems more robust. However pathname translation gets
>>   fuzzy and seems to add a lot of overhead (retrieve id of folder, find
>>   documents having this "parent id" attribute, find the one you want among
>>   them...)
>> * Each file is represented by a single document that has a "path"
>>   attribute that indicates the directory it is stored in. This gives
>>   the advantage of avoiding conventional pathname translation and retrieving
>>   the correct document immediately. However, operations such as renaming a
>>   folder require updating many documents and should be avoided.
>> * Keep the whole file system in a single document. Ouch!
>>
>> I am aware of the bulk update technique with the "all or nothing"
>> attribute, but it is my understanding that it should be avoided,
>> especially when dealing with clustering and replication. In addition, things
>> seem to get more obscure when considering file sharing possibilities between
>> the users of the file system.
>>
>> I would be glad if you could provide me some pointers on how to circumvent
>> the disadvantages of each of the methods above.
>>
>> In general, do you think that since dealing with documents is so flexible
>> and provided the absence of transactional operations one should try to
>> organize his data as decoupled as possible?
>>
>> Thank you for your time,
>>
>> Andreas
>>
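
To make the trade-offs discussed in this thread concrete, here is a minimal
sketch in Python. It does not talk to CouchDB at all: a plain dict stands in
for the database, and every field and function name ("parent_id", "path",
resolve, rename_prefix, ...) is an assumption made up for illustration. It
shows path lookup with child-to-parent references (one lookup per level),
single-document deletes, and why a folder rename under the full-path scheme
touches many documents one at a time.

```python
# In-memory stand-in for a document store: maps _id -> document (a dict).
# All field names ("type", "name", "parent_id", "path") are hypothetical.
docs = {
    "d1": {"_id": "d1", "type": "folder", "name": "foo", "parent_id": None},
    "d2": {"_id": "d2", "type": "folder", "name": "bar", "parent_id": "d1"},
    "d3": {"_id": "d3", "type": "file", "name": "cow", "parent_id": "d2"},
    "d4": {"_id": "d4", "type": "file", "name": "hen", "parent_id": "d2"},
}


def children_index(docs):
    """Index keyed by (parent_id, name); roughly what a view keyed on
    [parent_id, name] could provide."""
    return {(d["parent_id"], d["name"]): d["_id"] for d in docs.values()}


def resolve(path, index):
    """Translate a pathname to a document id, one index lookup per segment."""
    current = None  # None marks the root level
    for segment in path.strip("/").split("/"):
        current = index.get((current, segment))
        if current is None:
            return None  # some segment does not exist
    return current


def delete_file(docs, doc_id):
    """With child-to-parent references, deleting a file touches one document."""
    del docs[doc_id]


def full_path(docs, doc_id):
    """Build the full pathname by following parent references up to the root."""
    parts = []
    while doc_id is not None:
        doc = docs[doc_id]
        parts.append(doc["name"])
        doc_id = doc["parent_id"]
    return "/".join(reversed(parts))


def rename_prefix(path_docs, old, new):
    """Full-path scheme: renaming a folder rewrites every document under it.
    Each document is updated separately, so the rename as a whole is not
    atomic; a reader may see a mix of old and new paths in between."""
    changed = []
    for d in path_docs.values():
        if d["path"] == old or d["path"].startswith(old + "/"):
            d["path"] = new + d["path"][len(old):]
            changed.append(d["_id"])
    return sorted(changed)


if __name__ == "__main__":
    index = children_index(docs)
    print(resolve("foo/bar/cow", index))
    path_docs = {i: {"_id": i, "path": full_path(docs, i)} for i in docs}
    print(rename_prefix(path_docs, "foo/bar", "foo/baz"))
```

In this sketch the parent-reference scheme pays for lookups (three index hits
for "foo/bar/cow") but keeps every write local to one document, while the
full-path scheme resolves a file in one step and pays at rename time, which
matches Jan's point: the rename works, it just can't be atomic.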
