On 21 Jan 2009, at 05:05, Adam Wolff wrote:
After looking at this more, let me restate. I would totally get all of this
if the signature of reduce was:
reduce: function(key, values, rereduce)
What I don't get is: why does reduce get called with an arbitrarily long
list of keys? I thought reduce was precisely for reducing all of the mapped
inputs that are indexed under the *same* key. I think if I can get that, the
rest will come clear.
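For comparison, here is a minimal sketch of the (keys, values, rereduce)
shape being asked about. It assumes the behaviour described in the CouchDB
view documentation: when rereduce is false, the first argument is an array
of [key, docid] pairs, one per mapped row, and when rereduce is true it is
null and values holds earlier reduce results. The sample rows and the direct
calls are made up for illustration; in CouchDB the view server makes these
calls, not user code.

```javascript
// Sum reduce written against the (keys, values, rereduce) signature.
function sumReduce(keys, values, rereduce) {
  // For a plain sum both passes do the same arithmetic:
  //  - rereduce === false: values are emitted map values,
  //    keys is an array of [key, docid] pairs (assumed, see lead-in)
  //  - rereduce === true:  values are earlier reduce results, keys is null
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical first-pass call: rows for several different keys arrive
// together, ordered by key.
var firstPass = sumReduce(
  [["apple", "doc1"], ["apple", "doc2"], ["pear", "doc3"]],
  [1, 2, 5],
  false
);

// Hypothetical rereduce call combining two partial results.
var combined = sumReduce(null, [firstPass, 4], true);
```

Because the function never looks at keys, it gives the same total whether
the rows it receives share one key or span many.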
See "Query processing" on
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
Cheers
Jan
--
Thanks again,
A
On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <[email protected]> wrote:
Thanks for the reply!
I'd seen all of this, though I re-read the Wikipedia entry carefully.
Damien's blog entries don't appear to match the APIs in the version I'm
running, which is 0.8.1.
The Wikipedia entry suggests that reduce is called only with values that
match a single key. Using the log() function in CouchDB, I can see that's
not the case for its reduce function -- it's called with multiple different
keys, though it does appear that the input values are *ordered* by matching
keys.
Anyway, I totally get how re-reduce (or "combine") works in conventional
map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
is run).
Thanks again,
A
On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T <[email protected]> wrote:
On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <[email protected]> wrote:
Hi everyone,
I'm really excited about CouchDB and I've started playing with it. I get
all of it, except for reduce, and especially re-reduce.
My first question is: how does CouchDB maintain all the separate output for
a given key from the map function? I mean: given a simple reduce that just
sums results, how does couch maintain separate results for each possible
key/key range that can be given as input to that view?
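One way to picture the first question is a toy model in which nothing is
stored per key or per key range at all: the mapped rows are kept sorted by
key, and a result is computed for whatever slice of rows a query selects.
The sketch below is only that toy model, not CouchDB's implementation (it
recomputes from the rows every time and ignores any cached partial
results). The row data and the helper names reduceRange and reduceGrouped
are made up for illustration.

```javascript
// Map output as a sorted list of rows (sample data, made up).
var rows = [
  { key: "apple", id: "doc1", value: 1 },
  { key: "apple", id: "doc2", value: 2 },
  { key: "pear",  id: "doc3", value: 5 },
  { key: "plum",  id: "doc4", value: 7 }
];

function sumReduce(keys, values, rereduce) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Reduce every row whose key lies in [startkey, endkey]
// (toy stand-in for a key-range query).
function reduceRange(rows, startkey, endkey) {
  var hits = rows.filter(function (r) {
    return r.key >= startkey && r.key <= endkey;
  });
  return sumReduce(
    hits.map(function (r) { return [r.key, r.id]; }),
    hits.map(function (r) { return r.value; }),
    false
  );
}

// One result per distinct key (toy stand-in for a grouped query).
// Relies on rows being sorted, so equal keys are adjacent.
function reduceGrouped(rows) {
  var groups = [];
  rows.forEach(function (r) {
    var last = groups[groups.length - 1];
    if (last && last.key === r.key) {
      last.rows.push(r);
    } else {
      groups.push({ key: r.key, rows: [r] });
    }
  });
  return groups.map(function (g) {
    return { key: g.key, value: reduceRange(g.rows, g.key, g.key) };
  });
}
```

In this model the same reduce function serves a whole-view total, a key
range, or a per-key breakdown; only the slice of sorted rows handed to it
changes.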
My second question: when and why does rereduce get called? Is this simply to
allow the server to chunk the processing, or is there semantic meaning to
it? I had assumed the former -- it's just a way of limiting the size of the
input to the reduce function -- but then this really confused me: if I log
each time my reduce function gets called, I see that the last time it's
called, it's with rereduce=false. How is this possible? Don't all the
results have to be funneled through rereduce to produce a single result
value?
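The "chunking" reading of rereduce can be sketched as a toy model. This is
not CouchDB's actual scheduling; CHUNK_SIZE, reduceAll and the calls log are
made-up names, and the model simply assumes that rows are reduced in
fixed-size chunks and that partial results are combined only when there is
more than one of them.

```javascript
// Toy model of chunked reduction. CHUNK_SIZE is a made-up number.
var CHUNK_SIZE = 3;
var calls = []; // records the rereduce flag of every call, in order

function sumReduce(keys, values, rereduce) {
  calls.push(rereduce);
  return values.reduce(function (a, b) { return a + b; }, 0);
}

function reduceAll(rows) {
  var partials = [];
  for (var i = 0; i < rows.length; i += CHUNK_SIZE) {
    var chunk = rows.slice(i, i + CHUNK_SIZE);
    // First pass over raw map rows: rereduce is false.
    partials.push(sumReduce(
      chunk.map(function (r) { return [r.key, r.id]; }),
      chunk.map(function (r) { return r.value; }),
      false
    ));
  }
  // A single chunk needs no combining step, so rereduce never runs.
  if (partials.length === 1) {
    return partials[0];
  }
  // Several chunks: combine the partial results with rereduce set to true.
  return sumReduce(null, partials, true);
}
```

In this model a small input produces only rereduce=false calls, which is one
possible reading of the log output described above. Whether that is what
0.8.1 actually does is not something this sketch establishes.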
Any help here would be much appreciated. If there's a resource on the web I
should look at, please send it my way. Thanks!
A
Having just gone through the learning process on reduce myself, I'll point
you to the "Reduce Functions" section of
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
as a good place to start.
Also, the mailing list is an excellent resource:
http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e
along with:
http://en.wikipedia.org/wiki/MapReduce
http://labs.google.com/papers/mapreduce.html
and
http://damienkatz.net/2008/02/incremental_map.html
Regards,
Jeff