Just FYI, after reading this thread, one of my friends said, "If Amazon EC2 = MR or BSP, then Google App Engine = Spark." Maybe from the usability side.
On Thu, Jun 25, 2015 at 8:46 AM, Edward J. Yoon <[email protected]> wrote:
> Hi, here are a few of my thoughts.
>
> Apache Spark is definitely better suited to ML (iterative algorithms) than legacy Hadoop, because it preserves state and uses an optimized execution strategy (RDDs). However, its approach still follows a synchronous iterative communication pattern.
>
> Apache Hama, on the other hand, is a general-purpose, pure BSP framework. While I admit that synchronization costs are high, communication can be realized more efficiently with the message-passing BSP model. Moreover, BSP can provide virtual shared memory, among many other benefits. Another convincing point, I think, is the ability to exploit modern acceleration hardware such as InfiniBand and GPUs. I'm sure this feature will bring a completely new wave of big data. The only problem we have faced is a lack of interest in the BSP programming model. :-)
>
>> 2) Do we have any recent benchmarks between the 2 systems?
>
> It's on my to-do list.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:[email protected]]
> Sent: Thursday, June 25, 2015 12:57 AM
> To: [email protected]
> Subject: Hama vs Spark
>
> Hi,
> A few days back, I started reading about Apache Spark. It is a pretty good big data platform. But a question came to mind: where does Hama stand in comparison with Spark if we have to implement an iterative, compute-intensive algorithm (machine learning or optimization)?
>
> I found some resources online, but none of them answers my questions.
>
> 1) BSP vs MapReduce paper <http://arxiv.org/pdf/1203.2081v2.pdf>
> 2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
> 3) I also found the following benchmark, but it is quite old.
>
> http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results
>
> Questions:
> 1) Is there any specific advantage in choosing the BSP model instead of the Spark paradigm?
> 2) Do we have any recent benchmarks between the 2 systems?
> 3) What is the main convincing point for using Hama over Spark?
> 4) Is there any scientific paper that compares both systems? (I was not able to find any.)
>
> Regards,
> Behroz Sikander
>

--
Best Regards, Edward J. Yoon
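For readers weighing question 1 (the BSP model vs. the Spark paradigm), the message-passing superstep structure mentioned in the reply can be sketched roughly as follows. Each peer computes locally and sends messages, then all peers wait at a barrier; messages sent in one superstep become readable only in the next. This is a toy, single-process Python illustration under stated assumptions: the names `BSPRuntime`, `send`, `sync`, `messages`, and `peer_program` are hypothetical and are not Hama's (Java) API.

```python
import threading
from collections import defaultdict


class BSPRuntime:
    """Hypothetical toy BSP runtime: peers are threads, not Hama's API."""

    def __init__(self, num_peers):
        self.num_peers = num_peers
        self.barrier = threading.Barrier(num_peers)
        self.lock = threading.Lock()
        self.outbox = defaultdict(list)  # messages sent in the current superstep
        self.inbox = {}                  # messages delivered by the last barrier

    def send(self, dest, msg):
        # Buffer the message; it is not visible until after the next sync().
        with self.lock:
            self.outbox[dest].append(msg)

    def sync(self):
        # Barrier 1: every peer has finished computing and sending.
        if self.barrier.wait() == 0:
            # Exactly one peer swaps the buffers, delivering all messages.
            self.inbox = dict(self.outbox)
            self.outbox = defaultdict(list)
        # Barrier 2: the swap is visible to every peer before it continues.
        self.barrier.wait()

    def messages(self, peer_id):
        return list(self.inbox.get(peer_id, []))


def peer_program(rt, peer_id, value, results):
    # Superstep 1: local computation, then broadcast the local value.
    for dest in range(rt.num_peers):
        rt.send(dest, value)
    rt.sync()
    # Superstep 2: read the messages delivered at the barrier.
    results[peer_id] = max(rt.messages(peer_id))


def run(values):
    rt = BSPRuntime(len(values))
    results = [None] * len(values)
    threads = [
        threading.Thread(target=peer_program, args=(rt, i, v, results))
        for i, v in enumerate(values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run([3, 9, 4, 1]))  # every peer learns the global maximum: [9, 9, 9, 9]
```

The double barrier is only one simple way to make message delivery happen "between" supersteps in this sketch; a real BSP framework distributes peers across machines and handles message transport and fault tolerance itself.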
