On 19 Apr 2009, at 08:51, André Warnier wrote:
Brian Candler wrote:
[...]
Brian, many thanks for the caveats you provided.
I'd rather get into this with eyes wide open.
Sorry for asking this, but since I don't yet know who's who here...
1. The couchdb database is a single append-only file. Your filesystem needs
to support huge files.
Can someone else comment on the above?
Does this really mean that if we have, for one customer, 100,000
CouchDB "documents", each consisting of minimal metadata but each
with an attachment that is on average a 3 MB PDF file, all of this is
stored in one single 300 GB file?
1a. If yes, is that not uncomfortable/scary?
Yes, this is correct.
(I mean, even nowadays, moving a 300 GB file is not the easiest
practical thing to do).
Yes, if you get this use-case, you might not only want a DB per user
but a DB per user per day.
2a. Is there anything that allows one to control this?
see above.
Cheers
Jan
--
2. Once you get up to terabytes of documents, it may become impractical
and/or too slow to compact the database, which involves reading the entire
database from start to end and writing a completely new copy.
Agreed.
In your case it sounds like you normally just append documents and leave
them there forever.
Not 100%, but overwhelmingly so.
However, suppose you have a customer who leaves, and is
no longer paying you for the half terabyte of storage they are using?
Good practical observation. Due to our excellent service, that does
not happen very often of course. But we do have the occasional
real-estate broker or bank among our customers...
;-)
Or another who, for legal reasons, requires a document to be purged?
(Deleting a document in couchdb just marks it as deleted; it can still be
retrieved until a compaction has been done.)
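
The two steps described above (delete the document, then compact the database) can be sketched against CouchDB's HTTP interface roughly as follows. This is an illustration only: the server address, database name, and helper names are assumptions, and the functions merely build the requests without sending them.

```python
import urllib.parse
import urllib.request

COUCH = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def delete_doc_request(db: str, doc_id: str, rev: str) -> urllib.request.Request:
    """Build (not send) the request that marks one document as deleted."""
    path = "/" + urllib.parse.quote(db, safe="") + "/" + urllib.parse.quote(doc_id, safe="")
    # CouchDB needs the current revision of the document to accept the delete.
    query = urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(f"{COUCH}{path}?{query}", method="DELETE")


def compact_request(db: str) -> urllib.request.Request:
    """Build (not send) the request that starts compaction of one database."""
    url = f"{COUCH}/{urllib.parse.quote(db, safe='')}/_compact"
    return urllib.request.Request(
        url,
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request would be sent with urllib.request.urlopen(req). Compaction rewrites the whole database file, so the cost of the second step grows with the size of that one database.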
In fact this rather mimics our current system. Customers tend to
not remember when they delete a document themselves, and tend to
accuse the computers of losing it.
I would suggest that the easiest way round these problems - and also a good
way to improve security - is to have a separate couchdb database for each
customer.
That also mimics our current layout.
This still only requires running a single instance of the couchdb server.
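
As a rough sketch of what "one database per customer on a single server" could look like over HTTP: creating a database is a PUT on its name and dropping it is a DELETE, so a departing customer's storage can be reclaimed without compacting anyone else's data. The server address and the "customer_" naming prefix below are assumptions for illustration, and the helpers only build the requests.

```python
import urllib.parse
import urllib.request

COUCH = "http://localhost:5984"  # assumption: one CouchDB instance for all customers


def customer_db_url(customer: str) -> str:
    """Return the URL of the (hypothetically named) database for one customer."""
    return f"{COUCH}/{urllib.parse.quote('customer_' + customer, safe='')}"


def create_db_request(customer: str) -> urllib.request.Request:
    """Build (not send) the request that creates the customer's database."""
    return urllib.request.Request(customer_db_url(customer), method="PUT")


def drop_db_request(customer: str) -> urllib.request.Request:
    """Build (not send) the request that removes the customer's whole database."""
    return urllib.request.Request(customer_db_url(customer), method="DELETE")
```

All customers share the same server address here; only the path differs.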
Good to know.
Brian, I really appreciate this information. I would probably have
found this out as I investigate CouchDB further, but it would have
taken me a lot more time.