The difficulty with data transfer between tasks is handling synchronisation
and failure.
You may want to look at graph processing done on top of Hadoop (like
Giraph).
That's one way to do it, but whether it is relevant to you will
depend on your context.
Regards
Bertrand
On Wed, Sep 26,
Yes, Giraph seems like the best way to go - it is mainly a vertex
evaluation with message passing between vertices. Synchronization is
handled for you.
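To make the "vertex evaluation with message passing" idea concrete, here is a minimal single-process sketch of the bulk synchronous parallel (BSP) model in Python. It is not Giraph's API: the function name `run_supersteps`, the dict-based graph and the max-value propagation task are all assumptions chosen for illustration. Messages sent in one superstep only become visible in the next one, which is the synchronization a framework of this kind takes care of.

```python
# Toy simulation of vertex-centric BSP. All names are hypothetical;
# a real framework distributes this work and handles failures.

def run_supersteps(edges, values, max_supersteps=50):
    """Propagate the maximum vertex value through a graph.

    edges:  dict mapping each vertex id to a list of neighbour ids
            (every vertex must appear as a key)
    values: dict mapping each vertex id to its initial value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)  # work on a copy, leave the input untouched

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in edges}
    for v, neighbours in edges.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1

    # Keep going while any vertex still has pending messages.
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in edges}
        for v, messages in inbox.items():
            # A vertex only sees messages from the previous superstep.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in edges[v]:
                    outbox[n].append(values[v])
        # Barrier: swap mailboxes only after every vertex has run.
        inbox = outbox
        supersteps += 1

    return values, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final, steps = run_supersteps(graph, {"a": 3, "b": 6, "c": 2})
print(final, steps)
```

The per-vertex logic never touches another vertex's state directly. It only reads its inbox and writes to an outbox, which is why the barrier between supersteps is the only coordination point.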
On Wed, Sep 26, 2012 at 8:36 AM, Jane Wayne jane.wayne2...@gmail.com wrote:
hi,
i know that some algorithms cannot be parallelized and adapted
my problem is more general (than graph problems) and doesn't need to
have logic built around synchronization or failure. for example, when
a mapper is finished successfully, it just writes/persists to a
storage location (could be disk, could be database, could be memory,
etc...). when the next
Apache Giraph is a framework for graph processing. It currently runs over
MR (but is getting its own coordination via YARN soon):
http://giraph.apache.org.
You may also check out the generic BSP system (Giraph uses BSP too, if
I am not wrong, but doesn't use Hama; it works over MR instead), Apache
Hama:
The reason this is so rare is that map/reduce tasks are, by nature,
orthogonal, i.e. independent of one another: word count, batch image
recognition, TeraSort. All the things Hadoop is famous for are largely
orthogonal tasks.
It's much rarer (I think) to see people using Hadoop to do traffic
I wouldn't be so surprised. It takes time, energy and money to solve problems
and make solutions that are prod-ready. A few people would consider
that the NameNode/secondary SPOF is a barrier to putting Hadoop itself
into a critical production environment. (I am only quoting it and
Also read: http://arxiv.org/abs/1209.2191 ;-)
On Thu, Sep 27, 2012 at 12:24 AM, Bertrand Dechoux decho...@gmail.com wrote:
I wouldn't be so surprised. It takes time, energy and money to solve problems
and make solutions that would be prod-ready. A few people would consider
that the
thanks. those issues pointed out do cover the pain points i'm experiencing.
On Wed, Sep 26, 2012 at 3:11 PM, Harsh J ha...@cloudera.com wrote:
Also read: http://arxiv.org/abs/1209.2191 ;-)
On Thu, Sep 27, 2012 at 12:24 AM, Bertrand Dechoux decho...@gmail.com wrote:
I wouldn't be so surprised. It