Hi Apurv, cool implementation. It also solves the problem of the usual WordCount example, which emits every word with frequency 1 and so causes a large communication overhead between the map and reduce stages. I would use a Guava Multiset instead of the Java HashMap because it has the handy count and auto-increment feature.
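To make the point about local counting concrete, here is a rough sketch of the idea, not the code from your WordCount.java. The class and method names (LocalWordCount, countWords) are made up for illustration, and it uses only the JDK so it runs without extra jars. With Guava on the classpath, the map could be replaced by a Multiset created via HashMultiset.create(), where add(word) increments and count(word) reads the frequency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hama or of the original WordCount.java.
class LocalWordCount {

    // Aggregate counts locally, so that one (word, count) pair per distinct
    // word is sent to the next stage instead of one (word, 1) pair per occurrence.
    static Map<String, Integer> countWords(List<String> lines) {
        // TreeMap keeps the words sorted; a HashMap works too if order is irrelevant.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // Inserts 1 for a new word, otherwise adds 1 to the stored count.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "to do");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Each peer would then send only the entries of this map as messages, which is where the saving in communication comes from.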
Why take on the overhead of merging and sorting yourself? You could use the sorted message queue in Hama 0.5.0. It isn't disk based, so you won't get the scalability you are aiming for, but it would drastically reduce the complexity of your code. If you are working on it anyway, you could create a disk-based sorted queue that merges the messages implicitly.

2012/7/5 Praveen Sripati <[email protected]>

> Apurv,
>
> Not sure if you have seen this paper or not, but it concludes that
> effectively all MR jobs can be expressed as BSP jobs and the other way
> around. It also mentions when to go for BSP vs MR.
>
> http://arxiv.org/abs/1203.2081
>
> Thanks,
> Praveen
>
>
> On Thu, Jul 5, 2012 at 1:43 AM, Apurv Verma <[email protected]> wrote:
>
> > Hello,
> > Here is a simplistic WordCount example I wrote with hama. There are a few
> > TODOs left but it works fine. It's fully scalable when all TODOs are
> > complete.
> >
> > http://code.google.com/p/anahad/source/browse/trunk/src/main/java/org/anahata/bsp/WordCount.java
> >
> > Comments welcome :)
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> > India
