On 21 Jan 2009, at 05:05, Adam Wolff wrote:

After looking at this more, let me restate. I would totally get all of this
if the signature of reduce was:

reduce: function(key, values, rereduce)

What I don't get is: why does reduce get called with an arbitrarily long list of keys? I thought reduce was precisely for reducing all of the mapped inputs that are indexed under the *same* key. I think if I can get that, the
rest will come clear.

See "Query processing" on 
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
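
A rough sketch of why one reduce function can serve any key range, assuming
the reduce signature is function(keys, values, rereduce): on a first pass,
keys is an array of [key, docid] pairs with their emitted values, and on a
rereduce pass keys is null and values holds earlier reduce results. The names
sumReduce, reduceRows and rows are made up for illustration, and the driver
is a toy model, not CouchDB's actual code.

```javascript
// Sketch only: a sum reduce in the function(keys, values, rereduce) shape.
// Assumption: first pass gets [key, docid] pairs plus emitted values;
// rereduce pass gets keys = null and values = earlier reduce results.
function sumReduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical rows as a map function might emit them, sorted by key.
var rows = [
  { key: "a", id: "doc1", value: 1 },
  { key: "a", id: "doc2", value: 2 },
  { key: "b", id: "doc3", value: 5 },
  { key: "c", id: "doc4", value: 7 }
];

// Toy driver (not CouchDB's real code): reduce one slice of rows in one call.
function reduceRows(slice) {
  var keys = slice.map(function (r) { return [r.key, r.id]; });
  var values = slice.map(function (r) { return r.value; });
  return sumReduce(keys, values, false);
}

// Two chunks reduced separately, then combined with rereduce = true.
var left = reduceRows(rows.slice(0, 3));  // rows keyed "a", "a", "b"
var right = reduceRows(rows.slice(3));    // row keyed "c"
var combined = sumReduce(null, [left, right], true);

// A single chunk needs no rereduce step at all.
var direct = reduceRows(rows);
```

In this toy model a slice can span several different keys, and when all the
rows fit in one slice the only call made is with rereduce = false. That may
be what the log shows for a small view, though that is a guess.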

Cheers
Jan
--


Thanks again,
A

On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <[email protected]> wrote:

Thanks for the reply!
I'd seen all of this, though I re-read the Wikipedia entry carefully.
Damien's blog entries don't appear to match the APIs in the version I'm
running, which is 0.8.1.
The Wikipedia entry suggests that reduce is called only with values that
match a single key. Using the log() function in CouchDB, I can see that's
not the case for its reduce function -- it's called with multiple different
keys, though it does appear that the input values are *ordered* by matching
keys.

Anyway, I totally get how re-reduce (or "combine") works in conventional
map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
is run).

Thanks again,
A


On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T <[email protected]> wrote:

On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <[email protected]> wrote:
Hi everyone, I'm really excited about CouchDB and I've started playing with
it. I get all of it, except for reduce, and especially re-reduce.

My first question is: how does CouchDB maintain all the separate output for
a given key from the map function? I mean: given a simple reduce that just
sums results, how does couch maintain separate results for each possible
key/key range that can be given as input to that view?

My second question: when and why does rereduce get called? Is this simply to
allow the server to chunk the processing, or is there semantic meaning to
it? I had assumed the former -- it's just a way of limiting the size of the
input to the reduce function -- but then this really confused me: if I log
each time my reduce function gets called, I see that the last time it's
called, it's with rereduce=false. How is this possible? Don't all the
results have to be funneled through rereduce to produce a single result
value?

Any help here would be much appreciated. If there's a resource on the web I
should look at, please send it my way. Thanks!

A

Being that I just went through the learning process on reduce, I'll
point you here:
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
"Reduce Functions"

As a good place to start.
Also, the mailing list is an excellent resource.

http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e

along with:
http://en.wikipedia.org/wiki/MapReduce
http://labs.google.com/papers/mapreduce.html
and
http://damienkatz.net/2008/02/incremental_map.html

Regards,

Jeff



