On Tue, Jan 20, 2009 at 11:05 PM, Adam Wolff <[email protected]> wrote:
> After looking at this more, let me restate. I would totally get all of this
> if the signature of reduce was:
> reduce: function(key, values, rereduce)
>
> What I don't get is: why does reduce get called with an arbitrarily long
> list of keys? I thought reduce was precisely for reducing all of the mapped
> inputs that are indexed under the *same* key. I think if I can get that, the
> rest will come clear.
>
This is a slight deviation in CouchDB's implementation from conventional
map/reduce. By default, CouchDB reduces all of the key/value pairs to a
single reduce output. If you specify the query parameter `group=true`, you
get the behavior you expect: the 'standard' one output per unique key.

> Thanks again,
> A
>
> On Tue, Jan 20, 2009 at 7:52 PM, Adam Wolff <[email protected]> wrote:
>
>> Thanks for the reply!
>> I'd seen all of this, though I re-read the wikipedia entry carefully.
>> Damien's blog entries don't appear to match the APIs in the version I'm
>> running, which is 0.8.1
>> The wikipedia entry suggests that reduce is called only with values that
>> match a single key. Using the log() function in CouchDB, I can see that's
>> not the case for its reduce function -- it's called with multiple different
>> keys, though it does appear that the input values are *ordered* by matching
>> keys.
>>
>> Anyway, I totally get how re-reduce (or "combine") works in conventional
>> map/reduce, but I'm hazy on the details w/r/t CouchDB. I'm starting to
>> understand the answer to #1, but I'm really unclear on #2 (how/why rereduce
>> is run.)
>>
>> Thanks again,
>> A
>>
>>
>> On Tue, Jan 20, 2009 at 6:50 PM, Jeff Hinrichs - DM&T
>> <[email protected]> wrote:
>>
>>> On Tue, Jan 20, 2009 at 7:47 PM, Adam Wolff <[email protected]> wrote:
>>> > Hi everyone, I'm really excited about CouchDB and I've started playing with
>>> > it. I get all of it, except for reduce, and especially re-reduce.
>>> >
>>> > My first question is: how does CouchDB maintain all the separate output for
>>> > a given key from the map function? I mean: given a simple reduce that just
>>> > sums results, how does couch maintain separate results for each possible
>>> > key/key range that can be given as input to that view?
>>> >
>>> > My second question: when and why does rereduce get called? Is this simply to
>>> > allow the server to chunk the processing, or is there semantic meaning to
>>> > it? I had assumed the former -- it's just a way of limiting the size of the
>>> > input to the reduce function -- but then this really confused me: if I log
>>> > each time my reduce function gets called, I see that the last time it's
>>> > called, it's with rereduce=false. How is this possible? Don't all the
>>> > results have to be funneled through rereduce to produce a single result
>>> > value?
>>> >
>>> > Any help here would be much appreciated. If there's a resource on the web I
>>> > should look at, please send it my way. Thanks!
>>> >
>>> > A
>>> Being that I just went through the learning process on reduce, I'll
>>> point you here:
>>> http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>>> "Reduce Functions"
>>>
>>> As a good place to start.
>>> Also, the mailing list is an excellent resource.
>>>
>>> http://mail-archives.apache.org/mod_mbox/couchdb-user/200901.mbox/%[email protected]%3e
>>>
>>> along with:
>>> http://en.wikipedia.org/wiki/MapReduce
>>> http://labs.google.com/papers/mapreduce.html
>>> and
>>> http://damienkatz.net/2008/02/incremental_map.html
>>>
>>> Regards,
>>>
>>> Jeff
>>>
>>
>>
>
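To make the default-versus-`group=true` distinction and the `rereduce` flag in this thread concrete, here is a rough JavaScript sketch. It is a toy model, not CouchDB's implementation (CouchDB stores reductions in its index structure, and its real call pattern may differ). Only the `function(keys, values, rereduce)` calling convention follows CouchDB; `countReduce`, `reduceRows`, `queryUngrouped`, `queryGrouped`, `chunkSize`, and the example rows are hypothetical names and data made up for illustration.

```javascript
// Toy model of how a CouchDB-style view might call a reduce function.
// NOT CouchDB's implementation; it only mimics the
// reduce(keys, values, rereduce) calling convention.

// Count rows, in the style of a CouchDB reduce function.
// - rereduce === false: `keys` holds [key, docId] pairs and `values` holds
//   the raw values emitted by map(), possibly under several different keys.
// - rereduce === true: `keys` is null and `values` holds earlier results of
//   this same function, so they must be summed rather than counted.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce((acc, v) => acc + v, 0);
  }
  return values.length;
}

// Hypothetical helper: reduce sorted [key, docId, value] rows in chunks of
// at most `chunkSize` (an illustrative parameter), then rereduce the
// partial results.
function reduceRows(reduceFn, rows, chunkSize) {
  const partials = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    const chunk = rows.slice(i, i + chunkSize);
    const keys = chunk.map(([key, docId]) => [key, docId]);
    const values = chunk.map(([, , value]) => value);
    partials.push(reduceFn(keys, values, false));
  }
  // If everything fit in one chunk, the first-pass result is already final
  // and rereduce is never called.
  if (partials.length === 1) {
    return partials[0];
  }
  return reduceFn(null, partials, true);
}

// Default query: all rows collapse to a single value, even though they
// carry several different keys.
function queryUngrouped(reduceFn, rows, chunkSize) {
  return reduceRows(reduceFn, rows, chunkSize);
}

// group=true: one value per distinct key. Assumes rows are sorted by key
// and keys are primitives (compared with !==).
function queryGrouped(reduceFn, rows, chunkSize) {
  const out = [];
  let start = 0;
  for (let i = 1; i <= rows.length; i++) {
    if (i === rows.length || rows[i][0] !== rows[start][0]) {
      const value = reduceRows(reduceFn, rows.slice(start, i), chunkSize);
      out.push({ key: rows[start][0], value });
      start = i;
    }
  }
  return out;
}

// Made-up example rows, as if map() had emitted (fruit, null) per document.
const rows = [
  ["apple", "doc1", null],
  ["apple", "doc2", null],
  ["apple", "doc3", null],
  ["banana", "doc4", null],
  ["cherry", "doc5", null],
  ["cherry", "doc6", null],
];

console.log(queryUngrouped(countReduce, rows, 4)); // → 6
console.log(JSON.stringify(queryGrouped(countReduce, rows, 2)));
// → [{"key":"apple","value":3},{"key":"banana","value":1},{"key":"cherry","value":2}]
```

A count is used on purpose: on the first pass the function counts `values`, but on rereduce it must sum earlier counts, which is why the `rereduce` flag matters. In this toy model, any group that fits in one chunk (like "cherry" above, processed last) is finished without a rereduce, so the last logged call can have `rereduce=false`.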
