OK. Since this is already a work in progress by Apurv and it's not a high priority for the Hama team, I will not pursue it any further.
Leonidas
On Oct 11, 2012, at 12:57 PM, Thomas Jungblut wrote:
Thank you both for bringing up this discussion.
Personally, I have a very strong opinion on this: I think that building a MapReduce solution on top of BSP is useless.
We have had nearly ten years of development in this paradigm, and it has grown and become highly specialized.
You can express MapReduce in BSP; that's totally fine. But that does not mean that every MapReduce algorithm is automagically efficient on BSP. There has been (and still is) a lot of development on the MapReduce engine, and you can't keep up with that on a more abstract paradigm.
But of course there are cases where MapReduce is inefficient (iterative jobs, grouping, no explicit output caching).
Yes, grouping: grouping is actually the main part of reducing, but Hadoop solves it inefficiently. You are forced to sort, and (if I recall your paper correctly) that is also a drawback which led you to implement MRQL with BSP, because grouping by hash is much faster in several cases and sometimes also more efficient.
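A minimal in-memory sketch of the two grouping strategies, under the simplifying assumption that all pairs fit on one machine (a real shuffle also has to sort and merge spilled runs on disk). Both functions are hypothetical illustrations and make no claim about how Hadoop or MRQL implement grouping internally. They produce the same groups, but the sort-based one pays O(n log n) and needs ordered keys, while the hash-based one needs a single expected-O(n) pass and only hashable keys.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sort(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    # Sort-based: order all pairs by key (O(n log n)), then scan adjacent runs.
    ordered = sorted(pairs, key=itemgetter(0))
    return {key: [v for _, v in run] for key, run in groupby(ordered, key=itemgetter(0))}


def group_by_hash(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    # Hash-based: one pass into a hash table (expected O(n)); key order is irrelevant.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


pairs = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
print(group_by_sort(pairs))  # {'a': [2, 5], 'b': [1, 3], 'c': [4]}
print(group_by_hash(pairs))  # {'b': [1, 3], 'a': [2, 5], 'c': [4]}
```

The only observable difference is that the hash-based groups come out in first-seen order instead of key order, which a reducer that just aggregates per key does not need.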
It's funny, because the original paper [1] suggests that they have sort simply as a nice feature for building an inverted index and doing binary search on the tokens. So it's more of a nice side effect than the real design of the system.
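A small single-process sketch of that side effect, assuming the job emits (token, document id) pairs: once they come out sorted by token, building the postings lists is one linear scan, and a lookup is a binary search over the sorted tokens. The names `build_sorted_index` and `lookup` are hypothetical; this is an illustration of the idea, not code from the paper or from Hadoop.

```python
from __future__ import annotations

from bisect import bisect_left
from itertools import groupby
from operator import itemgetter


def build_sorted_index(docs: dict[int, str]) -> tuple[list[str], list[list[int]]]:
    # "Map": emit distinct (token, doc_id) pairs; "shuffle": sort them by token, then doc_id.
    pairs = sorted({(tok, doc_id) for doc_id, text in docs.items() for tok in text.split()})
    tokens: list[str] = []
    postings: list[list[int]] = []
    # "Reduce": adjacent pairs share a token, so each run becomes one postings list.
    for tok, run in groupby(pairs, key=itemgetter(0)):
        tokens.append(tok)
        postings.append([doc_id for _, doc_id in run])
    return tokens, postings


def lookup(tokens: list[str], postings: list[list[int]], term: str) -> list[int]:
    # The output is sorted by token, so binary search finds a term in O(log n).
    i = bisect_left(tokens, term)
    if i < len(tokens) and tokens[i] == term:
        return postings[i]
    return []


docs = {1: "map reduce", 2: "bulk synchronous parallel", 3: "map bsp"}
tokens, postings = build_sorted_index(docs)
print(lookup(tokens, postings, "map"))   # [1, 3]
print(lookup(tokens, postings, "hama"))  # []
```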
All in all, this does not mean that I am not interested in providing such functionality in Hama, but I'm sure that we should invest our time more carefully in features that bring value to users (improving message scalability, improving performance, providing more examples and algorithms, giving talks and presentations) rather than in coding a half-baked solution that is easily outperformed by plain MapReduce.
It was never my intention to "kill" Hadoop by developing Hama, but rather to improve certain use cases that cannot be handled efficiently in MapReduce.
So if it's just 1k lines and it is not a half-baked solution, feel free to contribute your stuff.
[1] http://research.google.com/archive/mapreduce.html