On Sun, Apr 19, 2009 at 09:35:40AM -0400, Jan Lehnardt wrote:
>> Does this really mean that if we have, for one customer, 100,000
>> CouchDB "documents", each consisting of minimal meta-data but each
>> carrying an attachment that is on average a 3 MB PDF file, that this
>> is all stored in one single 300 GB file?
>>
>> 1a.
>> If yes, is that not uncomfortable/scary?
>
> Yes, this is correct.
>
>> (I mean, even nowadays, moving a 300 GB file is not the easiest
>> practical thing to do).
It needs some testing. Last time I tried rsync with a 2GB file it didn't
handle it very well, but that was several years ago. Since the file is
append-only (apart from the first 4KB changing), in principle it should
be possible to copy it incrementally fairly easily, writing a small tool
to do so if necessary (a rough sketch of such a tool follows after my
signature).

> Yes, if you get this use-case, you might not only want a DB per user
> but a DB per user per day.

However, at that point you lose the ability to have a single view which
indexes all of the customer's documents. That is, after 365 days you
would need to do 365 queries just to locate a document by its metadata.
I wouldn't recommend that.

As others have suggested, you could instead just store a pointer (e.g. a
filename) to where the file is stored on some other filesystem. Maybe
store the files by SHA1, where the filename is xx/xx/xxxxxxxxxxxxxxxx
(also sketched below).

As an extension of this idea, you could have 257 CouchDB databases: one
for the metadata indexes, and the others each storing 1/256th of the
SHA1 space. That probably doesn't buy you much over just using the
filesystem natively, though.

Regards,

Brian.
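P.S. For what it's worth, here is a rough Python sketch of what that
incremental copy tool might look like. It assumes the file only ever
changes in its first 4KB and otherwise grows by appending, so a
compacted file would have to be recopied in full; it also ignores the
question of catching the source in a consistent state. Untested, and
all the names are made up:

    import os
    import sys

    HEADER_SIZE = 4096  # the only region assumed to change in place

    def incremental_copy(src_path, dst_path):
        src_size = os.path.getsize(src_path)
        dst_size = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
        if dst_size > src_size:
            raise RuntimeError("destination longer than source; was the "
                               "source compacted? recopy it in full")

        with open(src_path, "rb") as src, \
             open(dst_path, "r+b" if dst_size else "wb") as dst:
            # Always rewrite the mutable 4KB header.
            dst.seek(0)
            dst.write(src.read(HEADER_SIZE))

            # Then append only the bytes the source has grown by.
            start = max(dst_size, HEADER_SIZE)
            src.seek(start)
            dst.seek(start)
            while True:
                chunk = src.read(1 << 20)  # 1MB at a time
                if not chunk:
                    break
                dst.write(chunk)

    if __name__ == "__main__":
        incremental_copy(sys.argv[1], sys.argv[2])

Run repeatedly, each invocation only transfers the header plus whatever
has been appended since the last copy.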

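And a sketch of the SHA1 pointer scheme, including how the 1/256th
split of the SHA1 space could be picked. Again, the function names and
the document layout are only illustrative, not anything CouchDB itself
provides:

    import hashlib
    import os

    def store_by_sha1(root, data):
        """Write data to root/xx/xx/xxxx... and return (path, digest)."""
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(root, digest[0:2], digest[2:4], digest[4:])
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return path, digest

    def shard_index(digest):
        """Pick one of the 256 attachment databases by the first SHA1
        byte; the 257th database would hold the metadata index."""
        return int(digest[0:2], 16)  # 0..255

The CouchDB document itself would then carry only the metadata plus the
pointer, something like:

    {"customer": "acme", "filename": "invoice.pdf",
     "sha1": digest, "path": path}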