Ok, for some apps it's a no-go. If this is a highly concurrent server app, you'll orphan data if you start two rename updates at the same time.
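To make that concrete, here is a minimal sketch in Python. It is not real CouchDB client code: `FakeDB`, `Conflict`, and `move` are made-up names, and the revision check only imitates the rejection you would get for writing with a stale `_rev`. Two renames of /foo each read the same snapshot and then write documents one at a time:

```python
# Hypothetical in-memory stand-in for a document store with per-document
# revision checks. Nothing here talks to CouchDB; Conflict only imitates
# the rejection you get when writing with a stale _rev.

class Conflict(Exception):
    pass


class FakeDB:
    def __init__(self):
        self.docs = {}

    def put(self, doc):
        # Accept the write only if the caller holds the current revision.
        current = self.docs.get(doc["_id"])
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        rev = current["_rev"] + 1 if current else 1
        self.docs[doc["_id"]] = dict(doc, _rev=rev)

    def by_prefix(self, folder):
        # Copies, sorted by _id, of every doc whose path is under folder.
        hits = [dict(d) for d in self.docs.values()
                if d["path"].startswith(folder + "/")]
        return sorted(hits, key=lambda d: d["_id"])


def move(db, doc, old, new):
    # Rewrite one document's path; report whether the write was accepted.
    moved = dict(doc, path=new + doc["path"][len(old):])
    try:
        db.put(moved)
        return True
    except Conflict:
        return False


db = FakeDB()
db.put({"_id": "1", "path": "/foo/a.txt"})
db.put({"_id": "2", "path": "/foo/b.txt"})

# Both renames read the folder before either one has written anything.
snap_a = db.by_prefix("/foo")   # rename A: /foo -> /bar
snap_b = db.by_prefix("/foo")   # rename B: /foo -> /baz

# One possible interleaving of the single-document writes.
results = [
    move(db, snap_a[0], "/foo", "/bar"),  # A wins doc 1
    move(db, snap_b[0], "/foo", "/baz"),  # B is rejected on doc 1
    move(db, snap_b[1], "/foo", "/baz"),  # B wins doc 2
    move(db, snap_a[1], "/foo", "/bar"),  # A is rejected on doc 2
]

paths = sorted(d["path"] for d in db.docs.values())
print(results)
print(paths)
```

With this interleaving each rename moves one file and is refused on the other, so the folder ends up split between /bar and /baz and neither rename finished. The app would have to re-read and retry, or pick a winner some other way.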

a

On Monday, November 16, 2009, Jan Lehnardt <[email protected]> wrote:
>
> On 16 Nov 2009, at 05:28, Adam Wolff wrote:
>
>> There isn't a great way to store hierarchical data in couch. If you want to
>> actually move stuff around, the full pathname is a no-go, since there are no
>> bulk updates. The only other trick here, if you have meaningful roots or
>> branch points, is to store a reference to those in addition to the specific
>> parent node in the graph.
>
> It is not a no-go, renames just can't be atomic :)
>
> Cheers
> Jan
> --
>
>> In any case, it seems better to me to store references from child to parent,
>> rather than the other way around. The child document makes a more natural
>> concurrency boundary.
>>
>> A
>>
>> On Sun, Nov 15, 2009 at 7:14 PM, Andreas Pavlogiannis <[email protected]> wrote:
>>
>>> Greetings,
>>>
>>> I recently started exploring the capabilities of couchdb and although I
>>> find it really interesting and flexible, I am experiencing some
>>> difficulties:
>>>
>>> Is there any recommended way to store hierarchical data? Consider for
>>> example the case of a file system with multiple directories. I can think of
>>> some possible scenarios each with different capabilities and limitations:
>>>
>>> * Each file and each folder is represented by a single document, with
>>>   each folder document containing a "contents" list that has the ids of the
>>>   subdocuments under the specific folder (the usual tree structure). In this
>>>   case, deleting a file would require updating more than one document (the
>>>   file for deletion and the parent folder for the "contents" attribute) which
>>>   seems dangerous considering the absence of transactional operations (what
>>>   about deleting a whole folder?). Moreover, accessing the file "foo/bar/cow"
>>>   would require a conventional pathname translation which adds overhead (cut
>>>   the pathname in chunks, request the "foo" folder, retrieve the ids of its
>>>   contents, find which one corresponds to the "bar" folder etc.)
>>> * Each file and each folder is represented by a single document, with
>>>   each file having an attribute "parent id" that contains the id of its parent
>>>   folder (reverse tree structure). In this case deleting the file requires only
>>>   one operation and seems more robust. However pathname translation gets
>>>   fuzzy and seems to add a lot of overhead (retrieve id of folder, find
>>>   documents having this "parent id" attribute, find the one you want among
>>>   them...)
>>> * Each file is represented by a single document that has a "path"
>>>   attribute that indicates the directory that is being stored to. This gives
>>>   the advantage of avoiding conventional pathname translation and retrieving
>>>   the correct document immediately. However, operations such as renaming a
>>>   folder require updating many documents and should be avoided.
>>> * Keep the whole file system in a single document. Ouch!
>>>
>>> I am aware of the bulk update technique with the "all or nothing"
>>> attribute, but it is to my understanding that it should be avoided,
>>> especially when dealing with clustering and replication. In addition, things
>>> seem to get more obscure when considering file sharing possibilities between
>>> the users of the file system.
>>>
>>> I would be glad if you could provide me some pointers on how to circumvent
>>> the disadvantages of each of the methods above.
>>>
>>> In general, do you think that since dealing with documents is so flexible
>>> and provided the absence of transactional operations one should try to
>>> organize his data as decoupled as possible?
>>>
>>> Thank you for your time,
>>>
>>> Andreas
