Re: reduce/rereduce confusion

Jan Lehnardt Wed, 21 Jan 2009 06:11:56 -0800


On 21 Jan 2009, at 05:05, Adam Wolff wrote:

After looking at this more, let me restate. I would totally get all of this if the signature of reduce was:

reduce: function(key, values, rereduce)

What I don't get is: why does reduce get called with an arbitrarily long list of keys? I thought reduce was precisely for reducing all of the mapped inputs that are indexed under the *same* key. I think if I can get that, the rest will become clear.
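Compare this with the single-key signature above. The JavaScript view server in this era passes `keys` (plural). Below is a minimal sketch of a counting reducer. It assumes that on a first pass `keys` is an array of `[key, docid]` pairs and `values` are the emitted values. It also assumes that on a rereduce `keys` is `null` and `values` are the results of earlier calls. The name `countReduce` is hypothetical; in a design document the function is anonymous.

```javascript
// Counting reducer for the (keys, values, rereduce) signature.
// First pass: values are emitted map values, possibly for several keys.
// Rereduce: values are counts returned by earlier calls, so they are summed.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((acc, n) => acc + n, 0);
  }
  return values.length;
}

// Two first-pass calls over different slices of the rows, then one rereduce.
const firstPass = countReduce([["a", "doc1"], ["a", "doc2"], ["b", "doc3"]], ["x", "y", "z"], false);
const secondPass = countReduce([["c", "doc4"]], ["w"], false);
const combined = countReduce(null, [firstPass, secondPass], true);
console.log(firstPass, secondPass, combined); // 3 1 4
```

The two branches differ because a first-pass call counts rows, while a rereduce call adds up counts that have already been computed.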


See "Query processing" on 
http://horicky.blogspot.com/2008/10/couchdb-implementation.html

Cheers
Jan
--

Thanks again,
A

On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff wrote:
Thanks for the reply!
I'd seen all of this, though I re-read the Wikipedia entry carefully. Damien's blog entries don't appear to match the APIs in the version I'm running, which is 0.8.1.
The Wikipedia entry suggests that reduce is called only with values that match a single key. Using the log() function in CouchDB, I can see that's not the case for its reduce function -- it's called with multiple different keys, though the input values do appear to be *ordered* by key.
Anyway, I totally get how re-reduce (or "combine") works in conventional map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to understand the answer to #1, but I'm really unclear on #2 (how and why rereduce is run).
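Adam's log observation fits how CouchDB stores views, as described in the posts linked in this thread. View rows sit in a B-tree sorted by key, and reduce results are stored with the tree's nodes. A leaf holds roughly as many rows as fit, whatever their keys. The toy model below assumes fixed-size chunks in place of real B-tree nodes and only one level of combining. All names are hypothetical, not CouchDB APIs.

```javascript
// Toy model, not CouchDB source: rows are sorted by key and reduced in
// fixed-size chunks (standing in for B-tree leaf nodes), so one first-pass
// call can receive rows for several different keys. The per-chunk results
// are then combined with a rereduce call, as an inner node would combine them.
// CHUNK_SIZE, sumValues and reduceInChunks are hypothetical names.
const CHUNK_SIZE = 3;

// Summing is the same operation in both modes, so `rereduce` can be ignored.
function sumValues(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0);
}

function reduceInChunks(rows, reduceFn, callLog) {
  // Sort by key only; the stable sort keeps same-key rows in input order.
  const sorted = [...rows].sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  const partials = [];
  for (let i = 0; i < sorted.length; i += CHUNK_SIZE) {
    const chunk = sorted.slice(i, i + CHUNK_SIZE);
    callLog.push({ rereduce: false, keys: chunk.map((r) => r.key) });
    partials.push(reduceFn(chunk.map((r) => [r.key, r.id]), chunk.map((r) => r.value), false));
  }
  // Combine the per-chunk results in one rereduce call.
  callLog.push({ rereduce: true, count: partials.length });
  return reduceFn(null, partials, true);
}

const sampleRows = [
  { key: "b", id: "d1", value: 2 },
  { key: "a", id: "d2", value: 1 },
  { key: "c", id: "d3", value: 5 },
  { key: "a", id: "d4", value: 4 },
  { key: "b", id: "d5", value: 3 },
];
const chunkLog = [];
const chunkTotal = reduceInChunks(sampleRows, sumValues, chunkLog);
console.log(chunkTotal); // 15
console.log(chunkLog);
```

The first logged call receives keys `a, a, b`, because the chunk boundary falls inside key `b`'s rows. That is why a first-pass call is not "one key at a time", even though its input is ordered by key.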

Thanks again,
A
On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T wrote:
On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff wrote:
Hi everyone, I'm really excited about CouchDB and I've started playing with it. I get all of it, except for reduce, and especially re-reduce.
My first question is: how does CouchDB maintain all the separate output for a given key from the map function? I mean: given a simple reduce that just sums results, how does Couch maintain separate results for each possible key/key range that can be given as input to that view?
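Jan's linked article covers question 1 under "Query processing". Roughly, reductions are not stored per key or per key range. They are stored per B-tree node, and a range query is assembled from two parts: stored node reductions, plus fresh reductions of the rows at the range edges. The parts are then combined with rereduce. The sketch below models that with a single level of hypothetical "leaves". It is a simplification, not CouchDB's code. A grouped query can be thought of as one such range per distinct key.

```javascript
// Toy model, not CouchDB source, of answering a key-range reduce query from
// reductions computed once at index time. Each hypothetical "leaf" holds a
// slice of the key-sorted rows plus the reduction of all of those rows.
function sumRows(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0);
}

function buildLeaves(sortedRows, leafSize, reduceFn) {
  const leaves = [];
  for (let i = 0; i < sortedRows.length; i += leafSize) {
    const rows = sortedRows.slice(i, i + leafSize);
    leaves.push({
      rows,
      firstKey: rows[0].key,
      lastKey: rows[rows.length - 1].key,
      // Stored reduction: computed once, reused by any query covering the whole leaf.
      reduction: reduceFn(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value), false),
    });
  }
  return leaves;
}

// Reduces all rows with startKey <= key <= endKey.
function rangeReduce(leaves, startKey, endKey, reduceFn) {
  const partials = [];
  let reused = 0;
  for (const leaf of leaves) {
    if (leaf.lastKey < startKey || leaf.firstKey > endKey) continue; // no overlap
    if (leaf.firstKey >= startKey && leaf.lastKey <= endKey) {
      partials.push(leaf.reduction); // leaf fully inside the range
      reused += 1;
    } else {
      // Leaf straddles a range boundary: reduce just its in-range rows now.
      const inRange = leaf.rows.filter((r) => r.key >= startKey && r.key <= endKey);
      if (inRange.length > 0) {
        partials.push(reduceFn(inRange.map((r) => [r.key, r.id]), inRange.map((r) => r.value), false));
      }
    }
  }
  // Combine stored and freshly computed partial results.
  return { value: reduceFn(null, partials, true), reused };
}

const sortedRows = ["a", "b", "c", "d", "e", "f", "g", "h"].map((key, i) => ({
  key,
  id: "doc" + (i + 1),
  value: i + 1,
}));
const rangeLeaves = buildLeaves(sortedRows, 3, sumRows);
console.log(rangeReduce(rangeLeaves, "b", "g", sumRows)); // { value: 27, reused: 1 }
console.log(rangeReduce(rangeLeaves, "d", "f", sumRows)); // { value: 15, reused: 1 }
```

Only the leaves that straddle a boundary need new work. A whole leaf inside the range contributes its precomputed value, which is what lets arbitrary key ranges stay cheap.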

My second question: when and why does rereduce get called? Is this simply to allow the server to chunk the processing, or is there semantic meaning to it? I had assumed the former -- it's just a way of limiting the size of the input to the reduce function -- but then this really confused me: if I log each time my reduce function gets called, I see that the last time it's called, it's with rereduce=false. How is this possible? Don't all the results have to be funneled through rereduce to produce a single result value?
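On the second question, rereduce carries semantic weight whatever the exact call order. CouchDB decides how rows are split between first-pass and rereduce calls, so a reduce function must return the same result for any split. It also should not assume a rereduce call always comes last. Below is a hypothetical check for that property, limited to two-way splits with exact equality. It is applied to a correct count reducer and to a naive averaging reducer that fails it.

```javascript
// Hypothetical property check: does reducing all values in one call give the
// same result as reducing a left part and a right part separately and then
// combining the two results with rereduce? Tries every two-way split point.
function consistentUnderSplits(reduceFn, keys, values) {
  const whole = reduceFn(keys, values, false);
  for (let i = 1; i < values.length; i++) {
    const left = reduceFn(keys.slice(0, i), values.slice(0, i), false);
    const right = reduceFn(keys.slice(i), values.slice(i), false);
    if (reduceFn(null, [left, right], true) !== whole) return false;
  }
  return true;
}

// Passes: counts rows on the first pass, sums partial counts on rereduce.
function countRows(keys, values, rereduce) {
  return rereduce ? values.reduce((acc, n) => acc + n, 0) : values.length;
}

// Fails: ignores rereduce and averages whatever it is given, so on rereduce it
// averages averages, which is wrong when the parts have different sizes.
function naiveAverage(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0) / values.length;
}

const splitKeys = [["a", "d1"], ["a", "d2"], ["b", "d3"]];
console.log(consistentUnderSplits(countRows, splitKeys, [1, 2, 6])); // true
console.log(consistentUnderSplits(naiveAverage, splitKeys, [1, 2, 6])); // false
```

For `[1, 2, 6]`, the naive average gives 3 in one pass. Split after the first value, it gives the average of 1 and 4, which is 2.5. A common way to make averaging safe is to carry `[sum, count]` pairs through both modes and divide only when reading the final result.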
Any help here would be much appreciated. If there's a resource on the web I should look at, please send it my way. Thanks!

A
Since I just went through the learning process on reduce myself, I'll point you to the "Reduce Functions" section of
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
as a good place to start.
The mailing list is also an excellent resource:

http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e

along with:
http://en.wikipedia.org/wiki/MapReduce
http://labs.google.com/papers/mapreduce.html
and
http://damienkatz.net/2008/02/incremental_map.html

Regards,

Jeff
