Hi, here are a few thoughts.

Apache Spark is definitely better suited to ML (iterative algorithms) than 
legacy Hadoop because it preserves state and uses an optimized execution 
strategy (RDDs). However, its approach still follows a synchronous iterative 
communication pattern.

Apache Hama, by contrast, is a general-purpose, pure BSP framework. While I 
admit that synchronization costs are high, communication can be realized more 
efficiently with the message-passing BSP model. Moreover, BSP can provide 
virtual shared memory, among other benefits. Another convincing point, I 
think, is the ability to utilize modern acceleration hardware such as 
InfiniBand and GPUs. I'm sure that this feature will bring a completely new 
wave of big data. The only problem we face is a lack of interest in the BSP 
programming model. :-)
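
To make the superstep idea concrete, here is a minimal sketch of the BSP 
pattern (local computation, message passing, barrier synchronization). It is 
plain Python with threads, not Hama code; the function name bsp_ring_max and 
the ring topology are assumptions chosen purely for illustration.

```python
import threading


def bsp_ring_max(initial_values):
    """Toy BSP run: every peer ends up holding the global maximum.

    Hypothetical illustration only; this does not use the Hama API.
    """
    n = len(initial_values)
    if n == 0:
        return []
    values = list(initial_values)        # one local value per peer
    inboxes = [[] for _ in range(n)]     # one message queue per peer
    barrier = threading.Barrier(n)

    def peer(pid):
        # After n - 1 supersteps the maximum has travelled around the ring.
        for _ in range(n - 1):
            # Communication phase: send my value to my right neighbour.
            inboxes[(pid + 1) % n].append(values[pid])
            # Barrier: messages are only read after every peer has sent.
            barrier.wait()
            # Local computation phase: combine my value with what I received.
            values[pid] = max([values[pid]] + inboxes[pid])
            inboxes[pid].clear()
            # Second barrier so no peer sends into an inbox still being read.
            barrier.wait()

    threads = [threading.Thread(target=peer, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values
```

Calling bsp_ring_max([3, 9, 1, 4]) runs four peers for three supersteps. The 
two barrier waits per superstep are the synchronization cost mentioned above; 
a real framework would typically buffer messages so one barrier is enough.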

> 2) Do we have any recent benchmarks between the 2 systems ?

It's on my to-do list.

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:[email protected]]
Sent: Thursday, June 25, 2015 12:57 AM
To: [email protected]
Subject: Hama vs Spark

Hi,
A few days back, I started reading about Apache Spark. It is a pretty good
big data platform. But a question comes to mind: where does Hama stand in
comparison with Spark if we have to implement an iterative, compute-intensive
algorithm (machine learning or optimization)?

I found some resources online, but none of them answers my questions.

1) BSP vs MapReduce paper <http://arxiv.org/pdf/1203.2081v2.pdf>
2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
3) I actually found the following benchmark, but it is quite old:
http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results

Questions:
1) Is there any specific advantage in choosing the BSP model over the Spark
paradigm?
2) Do we have any recent benchmarks between the 2 systems ?
3) What is the main convincing point to use Hama over Spark ?
4) Is there any scientific paper that compares the two systems? (I was not
able to find any.)

Regards,
Behroz Sikander

