Re: reduce/rereduce confusion

Paul Davis Tue, 20 Jan 2009 20:20:30 -0800

On Tue, Jan 20, 2009 at 11:05 PM, Adam Wolff wrote:
> After looking at this more, let me restate. I would totally get all of this
> if the signature of reduce was: reduce: function(key, values, rereduce)
>
> What I don't get is: why does reduce get called with an arbitrarily long
> list of keys? I thought reduce was precisely for reducing all of the mapped
> inputs that are indexed under the *same* key. I think if I can get that, the
> rest will become clear.
>


This is a slight deviation from classic MapReduce in CouchDB's
implementation. If you specify the query parameter `group=true`, you
will get the behavior you expect. By default, CouchDB reduces all of the
key/value pairs in the queried range to a single reduce output;
`group=true` gives the 'standard' behavior of one output per unique
key.
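To make the difference concrete, here is a small Python simulation (not CouchDB code) of what a sum-style reduce sees with and without `group=true`. It assumes the view rows are already sorted by key, as CouchDB keeps them, and mimics the shape of the real reduce arguments: in CouchDB's JavaScript view server the first argument is an array of [key, docid] pairs, not a single key. The names `sum_reduce`, `reduce_rows` and `query`, and the sample rows, are hypothetical.

```python
from itertools import groupby

# Hypothetical map output: (key, docid, value) rows, already sorted by key
# the way a CouchDB view index keeps them.
ROWS = [
    ("apple", "doc1", 1),
    ("apple", "doc4", 2),
    ("banana", "doc2", 5),
    ("cherry", "doc3", 3),
    ("cherry", "doc5", 4),
]


def sum_reduce(keys, values, rereduce):
    # Same argument shape as a CouchDB reduce function: keys is a list of
    # [key, docid] pairs here. A plain sum can ignore keys and rereduce.
    return sum(values)


def reduce_rows(rows):
    # First-pass reduce over a run of rows (rereduce=False).
    keys = [[key, docid] for key, docid, _ in rows]
    values = [value for _, _, value in rows]
    return sum_reduce(keys, values, False)


def query(rows, group=False):
    """Simulate reading a reduce view with or without group=true."""
    if not group:
        # Default: all rows feed a single reduce call, so the keys list
        # mixes several distinct keys (in sorted order).
        return [(None, reduce_rows(rows))]
    # group=true: one reduce output per unique key.
    return [(key, reduce_rows(list(run)))
            for key, run in groupby(rows, key=lambda row: row[0])]


print(query(ROWS))              # [(None, 15)]
print(query(ROWS, group=True))  # [('apple', 3), ('banana', 5), ('cherry', 7)]
```

The simulation leaves out rereduce: on a larger view CouchDB does not push every row through one call but combines partial results, which is what the third argument is for.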

> Thanks again,
> A
>
> On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff wrote:
>
>> Thanks for the reply!
>> I'd seen all of this, though I re-read the Wikipedia entry carefully.
>> Damien's blog entries don't appear to match the APIs in the version I'm
>> running, which is 0.8.1.
>> The Wikipedia entry suggests that reduce is called only with values that
>> match a single key. Using the log() function in CouchDB, I can see that's
>> not the case for its reduce function -- it's called with multiple different
>> keys, though it does appear that the input values are *ordered* by matching
>> keys.
>>
>> Anyway, I totally get how re-reduce (or "combine") works in conventional
>> map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
>> understand the answer to #1, but I'm really unclear on #2 (how and why
>> rereduce is run).
>>
>> Thanks again,
>> A
>>
>>
>> On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T wrote:
>>
>>> On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff wrote:
>>> > Hi everyone, I'm really excited about CouchDB and I've started playing
>>> > with it. I get all of it, except for reduce, and especially re-reduce.
>>> >
>>> > My first question is: how does CouchDB maintain all the separate output
>>> > for a given key from the map function? I mean: given a simple reduce
>>> > that just sums results, how does couch maintain separate results for
>>> > each possible key/key range that can be given as input to that view?
>>> >
>>> > My second question: when and why does rereduce get called? Is this
>>> > simply to allow the server to chunk the processing, or is there
>>> > semantic meaning to it? I had assumed the former -- it's just a way of
>>> > limiting the size of the input to the reduce function -- but then this
>>> > really confused me: if I log each time my reduce function gets called,
>>> > I see that the last time it's called, it's with rereduce=false. How is
>>> > this possible? Don't all the results have to be funneled through
>>> > rereduce to produce a single result value?
>>> >
>>> > Any help here would be much appreciated. If there's a resource on the
>>> > web I should look at, please send it my way. Thanks!
>>> >
>>> > A
>>>
>>> Since I just went through the learning process on reduce, I'll point
>>> you to the "Reduce Functions" section of
>>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>>> as a good place to start.
>>> The mailing list is also an excellent resource:
>>>
>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e
>>>
>>> along with:
>>> http://en.wikipedia.org/wiki/MapReduce
>>> http://labs.google.com/papers/mapreduce.html
>>> and
>>> http://damienkatz.net/2008/02/incremental_map.html
>>>
>>> Regards,
>>>
>>> Jeff
>>>
>>
>>
>
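On Adam's question #2, here is a rough Python simulation (again, not CouchDB's own code) of where `rereduce` comes from. It assumes rows are split into fixed-size chunks; CouchDB instead keeps a partial reduction in the nodes of its B-tree index, which is also how it can answer arbitrary key ranges without re-reducing every row (Damien's incremental map/reduce post linked above describes this). Each chunk is reduced with rereduce=False and the partial results are combined with rereduce=True, so the flag does carry meaning: the function must know whether it is looking at raw map values or at its own earlier outputs. The names `count_reduce`, `naive_count` and `reduce_in_chunks`, the chunk size, and the sample rows are hypothetical.

```python
# Hypothetical map output: seven (key, docid, value) rows.
ROWS = [("k%d" % i, "doc%d" % i, i * 10) for i in range(7)]


def count_reduce(keys, values, rereduce):
    # First pass: values are raw map values, so count them.
    # Rereduce: values are earlier counts, so add them up.
    if rereduce:
        return sum(values)
    return len(values)


def naive_count(keys, values, rereduce):
    # Ignores the flag, so on rereduce it counts partial results, not rows.
    return len(values)


def reduce_in_chunks(rows, reduce_fn, chunk_size=3):
    """Reduce fixed-size chunks, then rereduce the partial results.

    The fixed chunk size is a simplifying assumption; CouchDB groups rows
    by B-tree node instead. Returns (result, log of (rereduce, n_values)).
    """
    calls = []
    partials = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        keys = [[key, docid] for key, docid, _ in chunk]
        values = [value for _, _, value in chunk]
        calls.append((False, len(values)))
        partials.append(reduce_fn(keys, values, False))
    if len(partials) == 1:
        # Everything fit in one chunk, so no rereduce call happens at all.
        return partials[0], calls
    # On rereduce there are no keys (null in JavaScript, None here).
    calls.append((True, len(partials)))
    return reduce_fn(None, partials, True), calls


total, calls = reduce_in_chunks(ROWS, count_reduce)
print(total)  # 7
print(calls)  # [(False, 3), (False, 3), (False, 1), (True, 3)]
print(reduce_in_chunks(ROWS, naive_count)[0])    # 3: counts chunks, not rows
print(reduce_in_chunks(ROWS[:2], count_reduce))  # (2, [(False, 2)])
```

In this simplified model a view small enough to fit in one chunk never triggers a rereduce call, and a reduce function that ignores the flag silently returns the wrong answer once partial results start being combined.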
