Re: Expressing MapReduce with BSP

Thomas Jungblut Thu, 05 Jul 2012 22:13:08 -0700

Hi Apurv,

Cool implementation. It also solves a problem with the usual wordcount
example, which emits every word with frequency 1 and so causes a large
communication overhead between the map and reduce stages.
I would use the Guava Multiset instead of the Java HashMap because it has
the cool count and auto-increment feature.
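
To make the local counting idea concrete, here is a rough sketch using only
the JDK. The class and method names are made up for illustration and this is
not taken from your WordCount.java. With Guava on the classpath,
HashMultiset's add() and count() would replace the merge() call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hama or of the linked WordCount.java.
class LocalWordCount {
    // Counts words in a chunk of text locally, so that each distinct word
    // can be sent once with its total instead of once per occurrence.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the old count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("to be or not to be");
        // Each entry would become one message to the peer owning that word.
        counts.forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

The number of messages then depends on the number of distinct words per
peer, not on the total number of words.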

Why take on the overhead of merging and sorting yourself? You could use
the sorted message queue in Hama 0.5.0. It isn't disk based, so you will
not get the scalability you are targeting, but it would drastically reduce
the complexity of your code.
If you are working on it anyway, you could create a disk-based sorted
queue which does this merging of the messages implicitly.
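
As a rough idea of what such a queue would do when it is read, here is a
sketch of a k-way merge over already sorted runs of (word, count) pairs.
Everything in it is hypothetical: the class name, the entry() helper, and
the in-memory lists that stand in for spill files on disk. It does not use
any Hama API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch; in-memory lists stand in for sorted runs on disk.
class SortedRunMerger {
    // One cursor per run: the current head entry plus the remaining entries.
    private static final class Cursor {
        Map.Entry<String, Integer> head;
        final Iterator<Map.Entry<String, Integer>> rest;

        Cursor(Iterator<Map.Entry<String, Integer>> it) {
            rest = it;
            head = it.next();
        }

        boolean advance() {
            if (!rest.hasNext()) return false;
            head = rest.next();
            return true;
        }
    }

    static Map.Entry<String, Integer> entry(String word, int count) {
        return new AbstractMap.SimpleEntry<>(word, count);
    }

    // Merges runs sorted by word into one sorted list, summing the counts
    // of equal words as they meet.
    static List<Map.Entry<String, Integer>> merge(
            List<List<Map.Entry<String, Integer>>> runs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (a, b) -> a.head.getKey().compareTo(b.head.getKey()));
        for (List<Map.Entry<String, Integer>> run : runs) {
            if (!run.isEmpty()) heap.add(new Cursor(run.iterator()));
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            String word = c.head.getKey();
            int count = c.head.getValue();
            Map.Entry<String, Integer> last =
                    out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && last.getKey().equals(word)) {
                last.setValue(last.getValue() + count); // same word: combine
            } else {
                out.add(entry(word, count));
            }
            if (c.advance()) heap.add(c); // re-insert with its new head
        }
        return out;
    }
}
```

Only one entry per run has to be held in memory at a time, which is why
this shape would also work with runs that live on disk.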

2012/7/5 Praveen Sripati <[email protected]>

> Apurv,
>
> Not sure if you have seen this paper or not, but it concludes that
> effectively all MR jobs can be expressed as BSP jobs and the other way
> around. It also mentions when to go for BSP vs MR.
>
> http://arxiv.org/abs/1203.2081
>
> Thanks,
> Praveen
>
>
> On Thu, Jul 5, 2012 at 1:43 AM, Apurv Verma <[email protected]> wrote:
>
> > Hello,
> >  Here is a simplistic WordCount example I wrote with Hama. There are a few
> > TODOs left but it works fine. It will be fully scalable once all TODOs are
> > complete.
> >
> >
> > http://code.google.com/p/anahad/source/browse/trunk/src/main/java/org/anahata/bsp/WordCount.java
> >
> > Comments welcome :)
> >
> > --
> > thanks and regards,
> >
> > Apurv Verma
> > India
> >
>
