Re: Grouping Values for Reducer Input

2009-04-13 Thread Jim Twensky
Oh, I forgot to mention that you should change your partitioner to send all keys of the form cat,* to the same reducer, but it seems Jeremy was much faster than me :)

-Jim

On Mon, Apr 13, 2009 at 5:24 PM, Jim Twensky wrote:
> I'm not sure if this is exactly what you want but, can y

Re: Grouping Values for Reducer Input

2009-04-13 Thread Jim Twensky
I'm not sure if this is exactly what you want, but can you emit map records as:

    cat, doc5 -> 3
    cat, doc1 -> 1
    cat, doc5 -> 1

and so on. This way, your reducers will get the intermediate key,value pairs as

    cat, doc5 -> 3
    cat, doc5 -> 1
    cat, doc1 -> 1

then you can split the keys (cat, doc*)
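
To make the composite-key idea concrete, here is a minimal plain-Java sketch with no Hadoop dependency. It groups values under the full "word, doc" key and then splits a key back into its two parts. The class and method names (CompositeKeySketch, groupByCompositeKey, splitKey) are hypothetical and used only for illustration, and because the message above is cut off, this shows only the part of the suggestion that is visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Hadoop code.
final class CompositeKeySketch {
    private CompositeKeySketch() {}

    // Collects the values seen for each composite key, in arrival order.
    // Each record is a two-element array: {compositeKey, value}.
    static Map<String, List<Integer>> groupByCompositeKey(List<String[]> records) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (String[] record : records) {
            grouped.computeIfAbsent(record[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(record[1]));
        }
        return grouped;
    }

    // Splits "cat, doc5" into {"cat", "doc5"} at the first comma.
    static String[] splitKey(String compositeKey) {
        String[] parts = compositeKey.split(",", 2);
        return new String[] { parts[0].trim(), parts[1].trim() };
    }
}
```

With the three records from the message, the key "cat, doc5" ends up holding the values 3 and 1, and "cat, doc1" holds 1.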

RE: Grouping Values for Reducer Input

2009-04-13 Thread jeremy.huylebroeck
I'm not familiar with setOutputValueGroupingComparator. What about adding the doc# to the key and using your own hashing/Partitioner? So, doing something like:

    cat_doc5 -> 1
    cat_doc1 -> 1
    cat_doc5 -> 3

The hashing method would hash only the part of the key before "_". The shuffling would still put t
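
The prefix-hashing idea can be sketched in plain Java as below. This is an assumption-laden illustration, not a real Hadoop Partitioner: the class PrefixPartitioner and its methods are hypothetical names, and keys are plain strings. The sign-bit mask follows the same pattern as Hadoop's default HashPartitioner, which keeps the result non-negative.

```java
// Hypothetical stand-in for a custom partitioner; plain Java, no Hadoop dependency.
final class PrefixPartitioner {
    private PrefixPartitioner() {}

    // Returns the part of the key before the first '_' (the whole key if there is none).
    static String prefixOf(String compositeKey) {
        int idx = compositeKey.indexOf('_');
        return idx < 0 ? compositeKey : compositeKey.substring(0, idx);
    }

    // Picks a reducer index from the prefix only, so cat_doc1 and cat_doc5
    // are always routed to the same reducer.
    static int getPartition(String compositeKey, int numReducers) {
        // Mask the sign bit so a negative hashCode cannot yield a negative index.
        return (prefixOf(compositeKey).hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```

In a real job the same logic would live in the getPartition method of a Partitioner subclass operating on the job's actual key type.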