On 30 Mar 2009, at 19:34, Adam Kocoloski wrote:
Wow, very nice exposition. Cheers,
Yeah, good job Brian, this is almost worth putting on the wiki (well,
not even almost...)!
Cheers
Jan
--
Adam
On Mar 30, 2009, at 4:47 AM, Brian Candler wrote:
On Sat, Mar 28, 2009 at 07:38:24PM -0600, Tom McNulty wrote:
my map function produces output like:
[X, Y, 0] -> Object_A
[X, Y, 1] -> Object_B1
[X, Y, 1] -> Object_B1
[X, Y, 1] -> Object_B1
[Z, Q, 0] ....
Here I apply group_level=2, and use a ranged query ([X, 0] to [X, []])
since Y >= 0.
Aside: you can use [X,null] to [X,{}], and then the value of Y doesn't
matter.
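
As a rough illustration of that range, here is a minimal Python sketch
that builds the query string. The helper name range_query is made up for
this example; startkey, endkey and group_level are the usual view query
parameters, and their values have to be JSON-encoded. The sketch relies
on CouchDB's view collation, in which null sorts before every other
value and an object sorts after every other value.

```python
import json
from urllib.parse import urlencode


def range_query(x):
    """Build view query parameters covering every key of the form [x, Y, ...].

    range_query is a hypothetical helper name used only for this sketch.
    """
    params = {
        "group_level": 2,
        # null collates before all other JSON values ...
        "startkey": json.dumps([x, None]),
        # ... and an object collates after all other JSON values,
        # so the range holds whatever type or sign Y has.
        "endkey": json.dumps([x, {}]),
    }
    return urlencode(params)


# Append the result to the view URL after a "?".
print(range_query("X"))
```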
Now during the reduce phase, I combine together Object_A's and
associated Object_B's. Can I assume that the first of the values sent
to 'reduce' is Object_A?
I think not, because on a large database the objects to be reduced will
be sent to your reduce function in batches, and these batches will be
broken up on B-tree node boundaries, which may occur in arbitrary
places. For example, your reduce function may receive
[Object_A, Object_B1]
and then in a separate invocation
[Object_B1, Object_B1]
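
To make the failure concrete, here is a small Python simulation (not
CouchDB code). naive_reduce is a hypothetical reduce function that
assumes its first value is Object_A, and the split point is picked by
hand to stand in for an arbitrary node boundary.

```python
def naive_reduce(values):
    """Hypothetical reduce that wrongly assumes values[0] is Object_A."""
    return {"parent": values[0], "children": values[1:]}


rows = ["Object_A", "Object_B1", "Object_B1", "Object_B1"]

# All four values in one batch: the assumption happens to hold.
whole = naive_reduce(rows)

# The same values split at an arbitrary boundary, as in the example above.
batches = [rows[:2], rows[2:]]
partials = [naive_reduce(batch) for batch in batches]

print(whole["parent"])
for partial in partials:
    # The second batch reports an Object_B1 as its "parent".
    print(partial["parent"])
```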
Furthermore, due to reduce optimisations, you may only receive some of
the blocks to be reduced. For example, take these three B-tree nodes:
[a b c d e f g] [h i j k l m n] [o p q r s t u]
       R1              R2              R3
The reduce value of all the items in each B-tree node is stored within
each node, e.g. [a b c d e f g] reduces to R1. Now suppose someone asks
for a reduce value across a key range:
                    key range
         <----------------------------->
[a b c d e f g] [h i j k l m n] [o p q r s t u]
As I understand it, CouchDB will call your reduce function to calculate
a value for [e f g] and for [o p q r], but will use the existing
stored/calculated value of R2 across the middle block.
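
The behaviour described above can be modelled with a toy Python
simulation. This is only a sketch of the idea, not how CouchDB is
actually implemented: the nodes list, the stored list and range_reduce
are all made up for illustration, and the reduce function simply counts
items. The seen list records which raw values the reduce function is
handed while the query runs.

```python
seen = []  # raw batches handed to the reduce function during the query


def count_reduce(values, rereduce=False):
    """Toy reduce: count items, or add up partial counts on rereduce."""
    if rereduce:
        return sum(values)
    seen.append(list(values))
    return len(values)


nodes = [list("abcdefg"), list("hijklmn"), list("opqrstu")]
# Stand-ins for R1, R2, R3, assumed to have been computed at index time.
stored = [len(node) for node in nodes]


def range_reduce(start, end):
    """Reduce over the inclusive key range start..end (hypothetical helper)."""
    partials = []
    for node, stored_value in zip(nodes, stored):
        inside = [key for key in node if start <= key <= end]
        if len(inside) == len(node):
            partials.append(stored_value)  # whole node in range: reuse it
        elif inside:
            partials.append(count_reduce(inside))  # edge node: reduce raw keys
    return count_reduce(partials, rereduce=True)


total = range_reduce("e", "r")
print(total)
print(seen)  # the middle node's keys never appear here
```

In this model the reduce function never sees h to n at query time, so
any logic that depends on having seen every value would silently miss
them.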
Therefore, it is wrong to attempt to maintain any sort of state in your
reduce function between invocations. And because the B-tree node
boundaries can appear in any place, it is wrong to attempt to
cross-reference adjacent documents too.
So I believe this sort of processing needs to take place in the client,
not in a reduce function.
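
One way the client-side version might look, as a minimal Python sketch.
It assumes the view is queried without reduce, that rows arrive sorted
by key in the usual {"key": ..., "value": ...} shape, and that a third
key element of 0 marks the parent and 1 marks a child, as in the map
output above. The combine function and the second group in the sample
data are made up for illustration.

```python
from itertools import groupby


def combine(rows):
    """Group key-sorted view rows by their first two key elements."""
    result = []
    for prefix, group in groupby(rows, key=lambda r: tuple(r["key"][:2])):
        group = list(group)
        parents = [r["value"] for r in group if r["key"][2] == 0]
        children = [r["value"] for r in group if r["key"][2] == 1]
        result.append({
            "key": list(prefix),
            "parent": parents[0] if parents else None,
            "children": children,
        })
    return result


# Sample rows; the ["X", "Y2"] group is invented for this example.
rows = [
    {"key": ["X", "Y", 0], "value": "Object_A"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["X", "Y", 1], "value": "Object_B1"},
    {"key": ["X", "Y2", 0], "value": "Other_A"},
]

combined = combine(rows)
for entry in combined:
    print(entry)
```

Because the client sees every row in key order, it does not matter
where the B-tree node boundaries fall.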
Regards,
Brian.