I mapped out a set of web 2.0 level questions:
>>> a) How many of the update headers are actually useful?  Is it just the
>>> last successfully written one, or perhaps just the last few?

Jason Smith tried reducing them to my 0.2 level:
>> Excellent point. That's just it, isn't it?
>> How useful are the lower rungs of a ladder? How useful is the food you
>> ate a year ago?

Jens Alfke tried reducing them even further:
> How useful was the accident insurance policy you paid for last year and then
> didn’t have an accident? :-)
> The update header is largely there for disaster recovery.

But neither of you even tried to answer my question of whether only
the last written header, or perhaps the last few, are ever used.

> The update header is largely there for disaster recovery.

I was also under the impression that the update headers (pointers to
the root of the B-tree) are somehow used for read consistency.  If so,
this would suggest that a database could be rolled back to some
earlier point in time.  How far back, and how practical that is, is
another question.
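
To make the question concrete, here is a toy model in Python of what I
am assuming happens.  It is NOT CouchDB's actual file format (which uses
append-only B-trees), and the class and method names are made up: every
commit appends new document bodies followed by a header, and nothing is
ever overwritten.  Under that assumption, reading through an older
header naturally yields an older, consistent snapshot, which is why I
suspect some kind of rollback might be possible, presumably only until
a compaction rewrites the file.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyAppendOnlyStore:
    # Hypothetical model only: entries are ("doc", id, body) or ("header", index).
    log: list = field(default_factory=list)

    def header_offsets(self) -> list[int]:
        """Offsets of every header ever written, oldest first."""
        return [i for i, entry in enumerate(self.log) if entry[0] == "header"]

    def commit(self, changes: dict[str, Any]) -> int:
        """Append new document bodies, then a header; nothing is overwritten."""
        offsets = self.header_offsets()
        # Copy the newest index so that older headers keep describing older states.
        index = dict(self.log[offsets[-1]][1]) if offsets else {}
        for doc_id, body in changes.items():
            self.log.append(("doc", doc_id, body))
            index[doc_id] = len(self.log) - 1
        self.log.append(("header", index))
        return len(self.log) - 1  # offset of the header just written

    def read(self, doc_id: str, header_offset: int | None = None) -> Any:
        """Read a document as of the given header (default: the newest one)."""
        if header_offset is None:
            header_offset = self.header_offsets()[-1]
        index = self.log[header_offset][1]
        return self.log[index[doc_id]][2]


store = ToyAppendOnlyStore()
first = store.commit({"a": "v1"})
store.commit({"a": "v2"})
print(store.read("a"))         # v2: read through the newest header
print(store.read("a", first))  # v1: read through the older header, i.e. a "rollback"
```

In this model only the newest header is needed for normal reads; the
older ones matter only if you want an older snapshot or need to recover
from a torn write at the end of the file.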

In any case, this whole thread is about me trying to understand what
is happening under the hood, strictly so that I can understand what
CouchDB is best used for.  However, since we are being a bit flippant
about it, let me give you my 0.2 impression of CouchDB.

On the surface, that RESTful API is so darn appealing, mostly
because many portals have gone back and bolted such RESTful APIs onto
whatever gunk they had beforehand; YouTube is an excellent example.
So the most appealing thing to me is that by using CouchDB, a
publicly exposable API to my own application ends up baked in by
CouchDB itself, without my having to do any additional work. It's a
whole other kettle of fish whether I actually need to expose such an
API, or, even worse, whether doing so could be detrimental to me, and
whether it is even possible not to.

However, much of CouchDB is like an onion: you cannot be all that
certain what you will find underneath until you have carefully peeled
off a layer or two and tried it yourself.  A case in point is not just
the issue of disk usage but the number of concurrently open files,
which can run into the tens or hundreds of thousands. There is just
something about numbers like that which does not feel natural, mostly
because it raises the question of how the heck relational databases
ever managed with merely dozens of tables within a single database.

Then there is the whole issue of servers (where much of this makes
sense because of scalability) vs. clients (where very little of it
makes any sense because there are no multi-user concurrency issues).
On top of that, when the same issues are raised with regard to mobile
devices, the prevalent answer is "oh, those clients only need a much
smaller subset".  So, when it comes to CouchDB on client devices, one
is left with the impression that it is not really a database for
storing hundreds of thousands, or even tens of thousands, of records,
but a mere few dozen or few hundred key/value pairs.  Never mind the
fact that a good number of apps need SQLite-like querying
capabilities, right down to full-text search and table joins.

But when all is said and done, while we are discussing what seem to be
quite large disk usage inefficiencies, I cannot help thinking about
those wonderful attachment capabilities.  What exactly are they for,
if not for attaching "things" that can themselves be of appreciable
size, especially multimedia?  So, what happens when one of those
attachments changes?  Does the old revision, with the old attachment,
become dead while a new revision is written out with the new
attachment?  Does all of this not magnify the overall disk "wastage"
problem by the sheer size of such attachments?
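
A back-of-the-envelope sketch of that concern, under the loudly stated
assumption that every attachment change appends a full new copy and the
superseded copy stays in the file until compaction.  The function name
is hypothetical, and header and B-tree node overhead are ignored.

```python
def estimate_sizes(doc_bytes: int, attachment_bytes: int, replacements: int) -> tuple[int, int]:
    """Hypothetical model: return (bytes on disk before compaction, live bytes after).

    Assumes each revision appends a fresh copy of the document body and of its
    attachment, and that compaction keeps only the newest revision.
    Header and B-tree node overhead are ignored.
    """
    revisions = replacements + 1  # the original write plus each replacement
    before = revisions * (doc_bytes + attachment_bytes)
    after = doc_bytes + attachment_bytes
    return before, after


before, after = estimate_sizes(doc_bytes=2 * 1024, attachment_bytes=5 * 1024 * 1024, replacements=9)
print(before, after)  # 52449280 5244928
```

With a 2 KB document and a 5 MB attachment replaced nine times, this
model gives roughly 50 MB on disk for about 5 MB of live data, i.e. the
waste grows with attachment size times the number of replacements.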

Sincerely,
teslan
