I'd be very interested to know the performance impact of that optimization as well. What is the overhead or bottleneck with large view values? Estimating 100 bytes for each of the roughly 20 fields in each of the million documents, that's about 2GB of raw data, which should write to a laptop disk within 2 minutes.
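
To make the arithmetic behind that estimate explicit, here is a rough sketch. The 100-byte field size and the 2-minute target are guesses from this message, the 20 fields and the 90 minutes come from Manju's report quoted below, and the `estimate` helper is made up purely for illustration.

```javascript
// Back-of-envelope numbers only; every input is an assumption from the thread.
function estimate(docs, fieldsPerDoc, bytesPerField, seconds) {
  const totalBytes = docs * fieldsPerDoc * bytesPerField;
  // Sustained throughput needed to move totalBytes in the given time.
  const mbPerSec = totalBytes / seconds / 1e6;
  return { totalBytes, mbPerSec };
}

const target = estimate(1e6, 20, 100, 2 * 60);    // the 2-minute guess
const observed = estimate(1e6, 20, 100, 90 * 60); // the reported 90 minutes

console.log((target.totalBytes / 1e9) + " GB of emitted values");
console.log(target.mbPerSec.toFixed(1) + " MB/s needed for 2 minutes");
console.log(observed.mbPerSec.toFixed(2) + " MB/s implied by 90 minutes");
```

Under these assumptions the 90-minute build implies well under 1 MB/s of effective throughput, which is why raw disk bandwidth does not look like the limit to me.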

I'm wondering whether it matters how large the view values are, since they would seem not to be involved in the view processing very much--only written to disk in the order defined by the keys.

Of course, that goes against the common wisdom that the fastest thing to do is emit(key, null), but that could affect the application significantly, since you then have to query again for the documents. (I'm also unsure whether include_docs carries a performance penalty of its own.)
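
For a feel of how much extra data the value side adds, here is a small simulation that runs under plain Node, not inside CouchDB. The `runMap` harness, the `makeDoc` sample documents, and passing `emit` as a second argument are all assumptions for the sketch (in a real view, `emit` is a global), and JSON length is only a stand-in for whatever the view file actually stores.

```javascript
// Stand-in for the view server: calls the map function for each document and
// collects the emitted rows, tagging each with the document id.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
  }
  return rows;
}

const mapWithDoc = (doc, emit) => emit(doc.field, doc);   // as in the original post
const mapWithNull = (doc, emit) => emit(doc.field, null); // the usual recommendation

// Hypothetical document: one indexed field plus 19 filler fields.
function makeDoc(i) {
  const doc = { _id: "doc" + i, field: "k" + i };
  for (let f = 0; f < 19; f++) doc["f" + f] = "x".repeat(90);
  return doc;
}

const docs = Array.from({ length: 1000 }, (_, i) => makeDoc(i));
const sizeOf = (rows) => JSON.stringify(rows).length;

const bigBytes = sizeOf(runMap(mapWithDoc, docs));
const smallBytes = sizeOf(runMap(mapWithNull, docs));
console.log("emit(doc.field, doc):  " + bigBytes + " bytes");
console.log("emit(doc.field, null): " + smallBytes + " bytes");
```

This only shows the difference in data volume. It says nothing about where the time goes during indexing, which is the part I'm asking about.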

I guess what I'm asking is, why does the value side of views impact performance so greatly?

kowsik wrote:
I would highly recommend that you do emit(doc.field, null) so that the
key space doesn't get unwieldy and large. Since the id of the document
is part of the map results, you can always fetch it using
include_docs=true.
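
A minimal sketch of what such a query could look like from a client. The host, database, design document, and view names are placeholders, the `viewUrl` helper is hypothetical, and the path layout is assumed to be the `_design/.../_view/...` form; only the `key` and `include_docs` query parameters are taken from the discussion.

```javascript
// Builds a view query URL; all names passed in below are made-up examples.
function viewUrl(base, db, ddoc, view, key) {
  const url = new URL(base + "/" + db + "/_design/" + ddoc + "/_view/" + view);
  // View keys are sent as JSON, so a string key keeps its quotes.
  url.searchParams.set("key", JSON.stringify(key));
  // Ask the server to attach the full document to each row.
  url.searchParams.set("include_docs", "true");
  return url.toString();
}

const exampleUrl = viewUrl("http://localhost:5984", "mydb", "app", "by_field", "abc");
console.log(exampleUrl);
```

With a map function of emit(doc.field, null), each returned row would then carry the document alongside the null value.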

K.

On Wed, Apr 1, 2009 at 10:12 AM, Manjunath Somashekhar
<[email protected]> wrote:
Hi all,

We have been using CouchDB (built from trunk) for prototyping an idea and
would like to thank and congratulate you folks for a simple and usable
schema-free db.

We plan to store a few million documents in CouchDB and would like to create
a couple of views to fetch the data appropriately. We have inserted a million
documents (each containing about 20 fields). We are creating a view on a
particular field of the document. The map function of the view is a simple,
straightforward emit: emit(doc.field, doc). It takes about 90 minutes to build
the required B-tree index the first time. All subsequent queries perform
extremely well (millisecond responses). Can anything be done to reduce the
90 minutes taken to build the B-tree index the first time?

Environment details:
Couchdb - 0.9.0a757326
Erlang - 5.6.5
Linux kernel - 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686 GNU/Linux
Ubuntu distribution
Centrino Dual core, 4GB RAM laptop

Thanks
Manju





--
Jason Smith
Proven Corporation
Bangkok, Thailand
http://www.proven-corporation.com
