Oh, I forgot to mention that you should also change your partitioner so that
all keys of the form cat,* go to the same reducer, but it seems Jeremy was
much faster than me :)
-Jim
On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky wrote:
I'm not sure if this is exactly what you want, but can you emit map records
as:
cat, doc5 -> 3
cat, doc1 -> 1
cat, doc5 -> 1
and so on..
This way, your reducers will get the intermediate key/value pairs as:
cat, doc5 -> 3
cat, doc5 -> 1
cat, doc1 -> 1
Then you can split the keys (cat, doc*).
I'm not familiar with setOutputValueGroupingComparator.
What about adding the doc# to the key and using your own
hashing/Partitioner?
So, doing something like:
cat_doc5-> 1
cat_doc1-> 1
cat_doc5-> 3
The hashing method would hash only the part of the key before "_".
the shuffling would still put t
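To make the partitioning idea concrete, here is a minimal sketch of hashing only the part of the key before the separator. It is plain Java with no Hadoop dependency, so it runs on its own. The class name PrefixPartitioner and the method prefixOf are made up for illustration. In a real job the same logic would presumably go into a class implementing Hadoop's Partitioner interface, whose getPartition also takes the value, and be registered with the job's partitioner setting.

```java
public class PrefixPartitioner {
    // Assumed separator between the word and the document id, as in "cat_doc5".
    private final String separator;

    public PrefixPartitioner(String separator) {
        this.separator = separator;
    }

    // Returns the part of the key before the first separator,
    // or the whole key if the separator does not occur.
    public String prefixOf(String key) {
        int idx = key.indexOf(separator);
        return idx < 0 ? key : key.substring(0, idx);
    }

    // Hypothetical stand-in for Partitioner.getPartition(key, value, numPartitions);
    // the value is left out because it plays no part in the decision.
    public int getPartition(String key, int numPartitions) {
        // Mask the sign bit so a negative hashCode cannot give a negative partition.
        return (prefixOf(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        PrefixPartitioner p = new PrefixPartitioner("_");
        int reducers = 4;
        for (String key : new String[] {"cat_doc5", "cat_doc1", "dog_doc2"}) {
            System.out.println(key + " -> reducer " + p.getPartition(key, reducers));
        }
    }
}
```

Since only the prefix is hashed, cat_doc5 and cat_doc1 always land on the same reducer, while the full key (doc# included) is still what gets sorted. Passing "," as the separator would cover the cat,* form of the key as well.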