Thanks very much, Brian, for the quick reply. I should say my description wasn't very clear: there is some complex business logic that needs to be implemented within the reduce.
Let me describe the application scenario more carefully. When a user learns from a dialog, they start a session (sessionid). Studying each line of the dialog generates one CouchDB document (it holds uid/dialogid/sessionid, plus wordCount/weightedScore/grade for that line). The user can re-study the same dialog some days later, which starts a new session for the same dialog.

We want every user's average grade from their study results. The dialog is the unit, so we need to sum over each session; but when a dialog has several sessions we only want to use the session with the highest grade, not all of them. This seems too difficult to implement with one view. In an RDBMS we would build the SQL query on a subquery (or on a DB view). Is it appropriate to implement this with a CouchDB view?

You are right, Brian: wordCount and weightedScore can simply be summed, and average grade = weightedScore / wordCount. I have pasted my previous reduce function below (the code may already be too complex). By the way, when you said "root node with *all* the uids": I don't clearly understand the view's internal storage structure, and I can't find it in the wiki.

function(keys, values, rereduce) {
  var wordCount = 0;
  var weightedScore = 0;
  var i;

  if (!rereduce) {
    // Reduce phase: we are reducing over values emitted by the map function.
    // Each entry of keys is [emittedKey, docId], where emittedKey is
    // [uid, dialogid, sessionid].

    // Calculate the totals for every session (one session spans several
    // CouchDB documents, one per dialog line).
    var sessions = {};
    for (i = 0; i < keys.length; i++) {
      var key = keys[i][0];
      key = key ? key.join('_') : key;
      if (!sessions[key]) {
        // Copy the value so the input objects are not modified.
        sessions[key] = {
          wordCount: values[i].wordCount,
          weightedScore: values[i].weightedScore,
          grade: values[i].grade
        };
      } else {
        sessions[key].wordCount += values[i].wordCount;
        sessions[key].weightedScore += values[i].weightedScore;
        sessions[key].grade = sessions[key].weightedScore / sessions[key].wordCount;
      }
    }

    // Pick the top session for each uid_dialogid pair. (Keying on the
    // dialogid alone would mix up sessions of different users.)
    var dialogSessions = {};
    for (var sk in sessions) {
      var dialogKey = sk ? sk.split('_').slice(0, 2).join('_') : sk;
      if (!dialogSessions[dialogKey]) {
        dialogSessions[dialogKey] = sessions[sk];
      } else if (dialogSessions[dialogKey].grade < sessions[sk].grade) {
        dialogSessions[dialogKey] = sessions[sk];
      }
    }

    // Calculate the result.
    for (var ds in dialogSessions) {
      wordCount += dialogSessions[ds].wordCount;
      weightedScore += dialogSessions[ds].weightedScore;
    }
  } else {
    // Rereduce phase: we are re-reducing previously reduced values.
    // Note: this only sums, so the top-session choice above is made
    // separately within each reduce call.
    for (i = 0; i < values.length; i++) {
      wordCount += values[i].wordCount;
      weightedScore += values[i].weightedScore;
    }
  }

  return {
    "wordCount": wordCount,
    "weightedScore": weightedScore,
    // Guard against dividing by zero when there is nothing to sum.
    "grade": wordCount ? weightedScore / wordCount : 0
  };
}

On Wed, Jun 24, 2009 at 11:43 PM, Brian Candler <b.cand...@pobox.com> wrote:

> On Wed, Jun 24, 2009 at 06:35:56PM +0800, hhsuper wrote:
> >    map function emit structure (key cols refer to uid/dialogid/sessionid):
> >    emit( ["86", "10380", "4172"], {wordCount: 20, weightedScore: 1380,
> >    grade: 69})
> >    reduce function return: {wordCount: 20, weightedScore: 1380, grade: 69}
> >    the reduce function's logic: first calculate the sum value for every
> >    unique uid_dialogid_sessionid key, then get the max value for every
> >    unique uid_dialogid key, at last sum the values for the key uid; these
> >    are calculated on wordCount/weightedScore/grade
>
> Code would probably speak clearer than words here. Since I don't understand
> your algorithm from that description, I can only talk in generalities.
>
> Assuming that you have some uid and some calculated values against that uid
> (and the same uid appears in multiple documents), then one option would be a
> reduce function which emits
>
> {
>   uid1: {wordCount: 20, weightedScore: 1380, grade: 69},
>   uid2: {...etc}
> }
>
> Then the rereduce function performs the same logic for all the uids seen in
> the input. However the output of such a reduce function will grow without
> bounds, and the root node will include the information for *all* the uids.
> This is not good.
>
> A better reduce function would output null if it has multiple uids in its
> input. If it sees only a single uid across all its inputs, it can output
>
> {uid: 1234, wordCount: 20, weightedScore: 1380, grade: 69}
>
> Then the re-reduce function would do the same: if all its inputs have the
> same uid then it calculates the relevant values, otherwise outputs null.
> This obviously reduces to null, except when you do a query where the key
> range covers documents with only one uid (or you group by uid), in which
> case you'll get the info you're looking for.
>
> All this depends on the logic by which wordCount, weightedScore and grade
> from multiple documents may be combined, and whether the intermediate
> results can also be combined. I mean, I imagine the wordCount's can simply
> be summed, but can the other values be combined similarly?
>
> But in any case: reduce functions are not suitable for all purposes. If you
> can't get the answer you need from a reduce function, then you need to
> perform the calculation client-side. Sorry, that's how it is.
>
> Regards,
>
> Brian.
>

--
Yours sincerely

Jack Su
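
P.S. For comparison, here is a minimal sketch of the client-side route Brian mentions: sum per session, keep the best session per dialog, then total per user. It is only an illustration under assumptions. The helper name bestSessionTotals is made up, and the row shape {key: [uid, dialogid, sessionid], value: {wordCount, weightedScore}} is assumed from the emit shown in the quoted message.

```javascript
// Sketch only: bestSessionTotals is a hypothetical helper, not a CouchDB API.
// rows: un-reduced view rows, each assumed to look like
//   {key: [uid, dialogid, sessionid], value: {wordCount, weightedScore}}
function bestSessionTotals(rows) {
  // Step 1: sum wordCount and weightedScore per (uid, dialog, session).
  var sessions = {};
  rows.forEach(function (row) {
    var id = JSON.stringify(row.key);
    var s = sessions[id] || (sessions[id] = {
      uid: row.key[0], dialogId: row.key[1], wordCount: 0, weightedScore: 0
    });
    s.wordCount += row.value.wordCount;
    s.weightedScore += row.value.weightedScore;
  });

  // Step 2: keep the session with the highest grade per (uid, dialog).
  var best = {};
  Object.keys(sessions).forEach(function (id) {
    var s = sessions[id];
    s.grade = s.wordCount ? s.weightedScore / s.wordCount : 0;
    var dialogKey = JSON.stringify([s.uid, s.dialogId]);
    if (!best[dialogKey] || best[dialogKey].grade < s.grade) {
      best[dialogKey] = s;
    }
  });

  // Step 3: sum the winning sessions per uid and derive the average grade.
  var totals = {};
  Object.keys(best).forEach(function (dialogKey) {
    var s = best[dialogKey];
    var t = totals[s.uid] || (totals[s.uid] = { wordCount: 0, weightedScore: 0 });
    t.wordCount += s.wordCount;
    t.weightedScore += s.weightedScore;
  });
  Object.keys(totals).forEach(function (uid) {
    var t = totals[uid];
    t.grade = t.wordCount ? t.weightedScore / t.wordCount : 0;
  });
  return totals;
}
```

The rows would presumably come from querying the view with reduce=false, restricted to one uid by key range if only one user is needed. Because every row of a session is seen in one pass, the best-session choice does not depend on how rows happen to be split between reduce calls.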