Hi Tito,

Can you explain where you're getting the "total count" from? Is this the total 
number of rows emitted by each view after all views have finished processing?

What do you mean by "Genesis case" - do you mean building a view for the first 
time?

Thanks,
Joan

----- Original Message -----
From: "Tito Ciuro" <tci...@mac.com>
To: user@couchdb.apache.org
Sent: Tuesday, October 28, 2014 1:32:37 PM
Subject: How does indexing really work?

Hello,

I’m a bit confused about how CouchDB really works. I just launched Futon and 
see that the indexer is busy working on a design document. I have almost a 
million documents.

A few minutes later, I see three more tasks appearing, all belonging to 
different design documents. No problem, except that the total counts are all 
different:

- design doc 1: ~950,000
- design doc 2: ~450,000
- design doc 3: ~313,000
- design doc 4: ~85,000

Why are the total counts different? My understanding is/was that a database 
holds N documents. Each indexing function is passed a document, which it then 
checks against the doc_type it expects:

function(doc) {
    if (doc.Type == "customer") {
        emit(doc._id, {LastName: doc.LastName, FirstName: doc.FirstName});
    }
}
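
As a minimal sketch of that understanding (not CouchDB's actual indexer), the 
function above can be run against a few made-up documents. The indexDocs 
harness, the stand-in emit, and the sample documents are all assumptions for 
illustration; in CouchDB, emit is provided by the view server.

```javascript
// Hypothetical harness: visits every document, but only matching ones emit.
function indexDocs(docs) {
  const rows = [];
  // Stand-in for the emit() that CouchDB's view server normally provides.
  const emit = (key, value) => rows.push({ key, value });

  // The map function from the message, unchanged apart from being named.
  const map = function (doc) {
    if (doc.Type == "customer") {
      emit(doc._id, { LastName: doc.LastName, FirstName: doc.FirstName });
    }
  };

  let visited = 0;
  for (const doc of docs) {
    visited += 1; // every document is passed to the map function
    map(doc);
  }
  return { visited, rows };
}

// Made-up sample data: two customers and one order.
const sample = [
  { _id: "c1", Type: "customer", LastName: "Doe", FirstName: "Ann" },
  { _id: "o1", Type: "order", Total: 12 },
  { _id: "c2", Type: "customer", LastName: "Roe", FirstName: "Bob" },
];

const result = indexDocs(sample);
console.log(result.visited, result.rows.length);
```

In this sketch the number of documents visited (3) differs from the number of 
rows emitted (2), which is why it matters which of the two a "total count" 
refers to.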

In the Genesis case, I was assuming that each view would have to go through 
each document across the database and index its own doc_type. Basically, one 
round for each design document for N total documents. For example, if the 
database contains 100,000 documents and two design documents, there would be 
two active tasks listed:

- _design/customers => index 100,000 documents
- _design/orders => index 100,000 documents

Later on, the indexing would be partial and the delta (say 9,000 docs) would 
have to be reindexed by each view:

- _design/customers => index 9,000 documents
- _design/orders => index 9,000 documents
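
The model assumed above can be written down as a small sketch. It is not 
CouchDB's implementation: the function names are hypothetical, and it assumes 
each design document remembers the last database sequence it indexed and that 
every document accounts for exactly one change.

```javascript
// Hypothetical model: work left for one design document is the gap between
// the database's current sequence and the last sequence that index has seen.
function pendingChanges(dbUpdateSeq, lastIndexedSeq) {
  return Math.max(0, dbUpdateSeq - lastIndexedSeq);
}

// Compute the pending work for every design document in a name -> seq map.
function workFor(dbUpdateSeq, indexedSeqs) {
  const work = {};
  for (const [name, seq] of Object.entries(indexedSeqs)) {
    work[name] = pendingChanges(dbUpdateSeq, seq);
  }
  return work;
}

// First build: nothing indexed yet, 100,000 documents in the database.
const firstBuild = workFor(100000, {
  "_design/customers": 0,
  "_design/orders": 0,
});

// Later: 9,000 more changes arrive after both indexes caught up.
const laterUpdate = workFor(109000, {
  "_design/customers": 100000,
  "_design/orders": 100000,
});

console.log(firstBuild, laterUpdate);
```

Under these assumptions both design documents always report the same amount 
of work (100,000, then 9,000). The totals in this model would only differ if 
the design documents had last been indexed at different sequences.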

This doesn’t seem to be the case. I’d love to know how indexing really works.

Thanks!

— Tito
