Thanks Brian, that makes perfect sense.

- Tom

On 30-Mar-09, at 2:47 AM, Brian Candler wrote:

On Sat, Mar 28, 2009 at 07:38:24PM -0600, Tom McNulty wrote:
my map function produces output like:

[X, Y, 0]  -> Object_A
[X, Y, 1]  -> Object_B1
[X, Y, 1]  -> Object_B1
[X, Y, 1]  -> Object_B1
[Z, Q, 0] ....

Here I apply group_level=2, and use a ranged query ([X, 0] to [X, []])
since Y >= 0

Aside: you can use [X,null] to [X,{}], and then the value of Y doesn't
matter
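
A rough sketch of building that range on the client side, in Python. It
relies on CouchDB's view collation, where null sorts before every other
value and an object sorts after arrays. The helper name range_params is
made up for illustration; startkey, endkey and group_level are the usual
view query parameters, and each value is sent as JSON.

```python
import json
from urllib.parse import urlencode


def range_params(x, group_level=2):
    """Hypothetical helper: query parameters covering every key that starts with x."""
    return {
        "startkey": json.dumps([x, None]),  # null collates before any value of Y
        "endkey": json.dumps([x, {}]),      # {} collates after any number, string or array
        "group_level": group_level,
    }


params = range_params("X")
query = urlencode(params)  # append this to the view URL after "?"
print(query)
```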

Now during the reduce phase, I combine together Object_A's and
associated Object_B's. Can I assume that the first of the values sent to
'reduce' is Object_A?

I think not, because on a large database the objects to be reduced will be
sent to your reduce function in batches, and these batches will be broken up
on Btree node boundaries, which may occur in arbitrary places. E.g. your
reduce function may receive

  [Object_A, Object_B1]

and then in a separate invocation

  [Object_B1, Object_B1]
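
To make the problem concrete, here is a small Python simulation (not
CouchDB code). naive_reduce is a hypothetical reduce that assumes its
first value is Object_A, and the split point between the two batches is
an arbitrary choice standing in for a node boundary.

```python
def naive_reduce(values):
    """Hypothetical reduce that (wrongly) treats values[0] as the parent object."""
    return {"parent": values[0], "children": values[1:]}


rows = ["Object_A", "Object_B1", "Object_B1", "Object_B1"]

# If all four rows arrive in one call, the assumption happens to hold.
whole = naive_reduce(rows)

# If the rows are split at an arbitrary boundary, the second batch
# starts with an Object_B1, which is then mistaken for the parent.
batches = [rows[:2], rows[2:]]
partials = [naive_reduce(batch) for batch in batches]

print(whole["parent"])
print([p["parent"] for p in partials])
```

The second batch reports Object_B1 as its "parent", which is exactly the
failure described above.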

Furthermore: due to reduce optimisations, you may only receive some of the
blocks to be reduced. Example: take these three Btree nodes:

    [a b c d e f g] [h i j k l m n] [o p q r s t u]
           R1              R2              R3

The reduce value of all the items in each Btree node is stored within each
node, e.g. [a b c d e f g] reduces to R1. Now suppose someone asks for a
reduce value across a key range:

                     key range
             <----------------------------->
    [a b c d e f g] [h i j k l m n] [o p q r s t u]

As I understand it, CouchDB will call your reduce function to calculate a
value for [e f g] and for [o p q r], but will use the existing
stored/calculated value of R2 across the middle block.
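
A toy Python model of that behaviour, under my reading above. This is not
how CouchDB is actually implemented: the node layout, the reduce_fn
signature (a simplified version of the real keys/values/rereduce one) and
range_reduce are all assumptions for illustration. The reduce here just
counts items.

```python
nodes = [list("abcdefg"), list("hijklmn"), list("opqrstu")]
calls = []  # records every invocation of the reduce function


def reduce_fn(values, rereduce=False):
    """Counting reduce: counts raw values, sums partial counts on rereduce."""
    calls.append((list(values), rereduce))
    return sum(values) if rereduce else len(values)


# R1, R2, R3: the reductions stored alongside each node.
stored = [reduce_fn(node) for node in nodes]
calls.clear()  # only track calls made while answering the range query


def range_reduce(lo, hi):
    partials = []
    for node, stored_value in zip(nodes, stored):
        inside = [k for k in node if lo <= k <= hi]
        if not inside:
            continue
        if len(inside) == len(node):
            partials.append(stored_value)       # whole node covered: reuse R
        else:
            partials.append(reduce_fn(inside))  # edge node: reduce the slice
    return reduce_fn(partials, rereduce=True)


total = range_reduce("e", "r")
print(total)
for values, rereduce in calls:
    print(values, rereduce)
```

In this model the reduce function never sees h..n during the query; it
only gets [e f g], [o p q r], and then the three partial results in a
rereduce call.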

Therefore, it is wrong to attempt to maintain any sort of state in your
reduce function between invocations. And because the Btree node boundaries
can appear in any place, it is wrong to attempt to cross-reference adjacent
documents too.

So I believe this sort of processing needs to take place in the client, not
in a reduce function.
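
One way the client-side version might look, as a Python sketch. It
assumes the view is queried without reduce, that rows come back sorted by
key (as view rows are), and that the third key element is 0 for Object_A
and 1 for its Object_B's, as in the map output quoted above. The function
name combine and the output shape are made up.

```python
from itertools import groupby


def combine(rows):
    """Group sorted view rows by their [X, Y] prefix and attach children to the parent."""
    combined = []
    for prefix, group in groupby(rows, key=lambda row: tuple(row["key"][:2])):
        parent, children = None, []
        for row in group:
            if row["key"][2] == 0:
                parent = row["value"]          # the Object_A row
            else:
                children.append(row["value"])  # the Object_B rows
        combined.append(
            {"key": list(prefix), "parent": parent, "children": children}
        )
    return combined


rows = [
    {"key": ["X", "Y", 0], "value": "Object_A"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["Z", "Q", 0], "value": "Object_A2"},
]
result = combine(rows)
print(result)
```

Because the grouping runs over the complete, ordered result set, it does
not depend on where any node boundary falls.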

Regards,

Brian.
