Brian Candler wrote:
> [...]
Brian, thanks very much for the caveats you provided.
I'd rather go into this with my eyes wide open.
Sorry for asking, but since I don't yet know who's who here...
> 1. The couchdb database is a single append-only file. Your filesystem needs
> to support huge files.
Can someone else comment on the above?
Does this really mean that if we have, for one customer, 100,000 CouchDB
"documents", each consisting of minimal metadata plus an attachment that is
on average a 3 MB PDF file, all of this is stored in one single 300 GB file?
1a.
If so, isn't that uncomfortable, even scary?
(I mean, even nowadays, moving a 300 GB file around is not the easiest
thing to do in practice.)
1b.
Is there any way to control this?
> 2. Once you get up to terabytes of documents, it may become impractical
> and/or too slow to compact the database, which involves reading the entire
> database from start to end and writing a completely new copy.
Agreed.
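To put rough numbers on this, here is a small Python sketch (my own assumptions, not measured CouchDB figures) that estimates the single file for the 100,000 x 3 MB example and the peak disk space while compacting it. It assumes decimal units, that the attachments dominate the file size, and that compaction writes a full new copy next to the old file before the old one goes away; `estimate_storage_gb` is a hypothetical helper.

```python
def estimate_storage_gb(num_docs: int, avg_attachment_mb: float,
                        overhead_fraction: float = 0.0) -> tuple[float, float]:
    """Rough size of one CouchDB database file and peak disk use while compacting.

    Assumptions (not measured): decimal units (1 GB = 1000 MB), attachments
    dominate the size, and compaction copies all live data into a new file
    that coexists with the old one until it finishes.
    """
    # Size of the append-only file: every attachment plus an optional overhead factor.
    file_gb = num_docs * avg_attachment_mb * (1 + overhead_fraction) / 1000
    # Worst case during compaction: old file plus a new copy of about the same size.
    peak_gb = 2 * file_gb
    return file_gb, peak_gb


file_gb, peak_gb = estimate_storage_gb(100_000, 3)
print(f"file ~{file_gb:.0f} GB, peak during compaction ~{peak_gb:.0f} GB")
```

With those assumptions the example gives about 300 GB for the file and roughly twice that in free space needed at the moment compaction finishes, which is where the "too slow / impractical" concern starts to bite.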
> In your case it sounds like you normally just append documents and leave
> them there forever.
Not 100%, but overwhelmingly so.
> However, suppose you have a customer who leaves, and is
> no longer paying you for the half terabyte of storage they are using?
Good practical observation. Thanks to our excellent service, that does not
happen very often, of course. But we do have the occasional real-estate
broker or bank among our customers...
;-)
> Or another who, for legal reasons, requires a document to be purged? (Deleting
> a document in couchdb just marks it as deleted; it can still be retrieved
> until a compaction has been done.)
In fact, this rather mimics our current system. Customers tend not to
remember when they have deleted a document themselves, and tend to accuse
the computers of losing it.
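On the deletion point, a minimal sketch of the two HTTP calls involved, built with Python's standard `urllib` but not sent. Deleting a document requires its current revision (`DELETE /{db}/{docid}?rev=...`) and leaves a deleted revision behind; the earlier revision bodies only leave the file once the database is compacted (`POST /{db}/_compact`). The server address, database name, document id and revision are placeholders, and actually sending these with `urllib.request.urlopen` would also need admin credentials on a secured server. Even after compaction a small deletion stub remains so that the deletion can replicate; removing that as well is what CouchDB's separate `_purge` endpoint is for.

```python
import urllib.parse
import urllib.request

COUCH_URL = "http://localhost:5984"  # placeholder server address (assumption)


def _db_url(db: str) -> str:
    # Database names may contain characters such as "/" that must be escaped in the URL path.
    return f"{COUCH_URL}/{urllib.parse.quote(db, safe='')}"


def delete_doc_request(db: str, doc_id: str, rev: str) -> urllib.request.Request:
    # Deleting needs the current revision; CouchDB records a new, deleted revision
    # rather than erasing the document's earlier data from the file.
    url = (f"{_db_url(db)}/{urllib.parse.quote(doc_id, safe='')}?"
           + urllib.parse.urlencode({"rev": rev}))
    return urllib.request.Request(url, method="DELETE")


def compact_request(db: str) -> urllib.request.Request:
    # Compaction rewrites the database into a new file, dropping old revision bodies.
    req = urllib.request.Request(f"{_db_url(db)}/_compact", method="POST")
    req.add_header("Content-Type", "application/json")  # CouchDB expects this header here
    return req


for req in (delete_doc_request("customer_acme", "invoice-17", "1-abc"),
            compact_request("customer_acme")):
    print(req.get_method(), req.full_url)
```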
> I would suggest that the easiest way round these problems, and also a good
> way to improve security, is to have a separate couchdb database for each
> customer.
That also mimics our current layout.
> This still only requires running a single instance of the couchdb
> server.
Good to know.
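A minimal sketch of the one-database-per-customer layout on a single CouchDB server, again only building the requests with Python's standard library. `PUT /{db}` creates a database and `DELETE /{db}` removes it, which is how a departing customer's storage could be reclaimed without compacting anyone else's data. The server address, the `customer_` prefix and the character replacement in `customer_db_name` are a hypothetical naming scheme; the validity check follows CouchDB's documented rule that a database name starts with a lowercase letter and contains only a-z, 0-9 and `_$()+-/`.

```python
import re
import urllib.parse
import urllib.request

COUCH_URL = "http://localhost:5984"  # placeholder server address (assumption)
# CouchDB's documented database-name rule.
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def customer_db_name(customer_id: str) -> str:
    # Hypothetical scheme: prefix + lowercased id, other characters mapped to "_".
    slug = re.sub(r"[^a-z0-9_-]", "_", customer_id.lower())
    name = f"customer_{slug}"
    if not VALID_DB_NAME.match(name):
        raise ValueError(f"invalid database name: {name!r}")
    return name


def _db_request(customer_id: str, method: str) -> urllib.request.Request:
    db = urllib.parse.quote(customer_db_name(customer_id), safe="")
    return urllib.request.Request(f"{COUCH_URL}/{db}", method=method)


def create_customer_db(customer_id: str) -> urllib.request.Request:
    return _db_request(customer_id, "PUT")     # new, empty database for this customer


def drop_customer_db(customer_id: str) -> urllib.request.Request:
    return _db_request(customer_id, "DELETE")  # removes that customer's database only


print(create_customer_db("ACME-42").full_url)
print(drop_customer_db("Bank.Ltd").get_method(), drop_customer_db("Bank.Ltd").full_url)
```

Keeping one file per customer also keeps each file closer to that customer's own data volume, rather than one ever-growing file for everybody.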
Brian, I really appreciate this information. I would probably have
found this out eventually as I investigated CouchDB further, but it would
have taken me a lot longer.