In terms of comparison, I see it as follows:

Hadoop = Spark
Hama = GraphX (a Spark library based on the Pregel model) [1]

Spark has subsystems (MLlib, GraphX and others) inside its ecosystem (like Hadoop).

[1] https://spark.apache.org/graphx/ - see this page for a performance comparison of GraphX, GraphLab and Giraph.

2015-06-24 22:09 GMT-03:00 Edward J. Yoon <[email protected]>:

> Just FYI, one of my friends said after reading this thread, "if Amazon
> EC2 = MR or BSP, Google App Engine = Spark". Maybe usability side.
>
> On Thu, Jun 25, 2015 at 8:46 AM, Edward J. Yoon <[email protected]> wrote:
> > Hi, here's my few thoughts.
> >
> > Apache Spark is definitely more suited for ML (iterative algorithms) than
> > legacy Hadoop due to its preservation of state and optimized execution
> > strategy (RDDs). However, their approaches are still in synchronous iterative
> > communication pattern.
> >
> > In Apache Hama case, it's a general-purpose pure BSP framework. While I admit
> > that synchronization costs are high, the communication can be more efficiently
> > realized with the message-passing BSP model. Moreover, BSP can have virtual
> > shared memory and many more benefits. In addition, another one convincing
> > point I think can be a utilization ability of modern acceleration accessories
> > such as InfiniBand and GPUs. I'm sure that this feature will bring a
> > completely new wave of big data. The problem we faced is only a lack of
> > interest to BSP programming model. :-)
> >
> >> 2) Do we have any recent benchmarks between the 2 systems ?
> >
> > It's in my todo list.
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:[email protected]]
> > Sent: Thursday, June 25, 2015 12:57 AM
> > To: [email protected]
> > Subject: Hama vs Spark
> >
> > Hi,
> > A few days back, I started reading about Apache Spark. It is a pretty good
> > BigData platform. But a question arises to my mind that where Hama lies in
> > comparison with Spark if we have to implement an iterative algorithm which
> > is compute intensive (Machine learning or Optimization) ?
> >
> > I found some resources online but none answers my questions.
> >
> > 1) BSP vs MapReduce paper <http://arxiv.org/pdf/1203.2081v2.pdf>
> > 2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
> > 3) I actually found the following benchmark but it is quite old.
> > http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results
> >
> > Questions:
> > 1) Is there any specific advantage when we chose BSP model instead of SPARK
> > paradigm ?
> > 2) Do we have any recent benchmarks between the 2 systems ?
> > 3) What is the main convincing point to use Hama over Spark ?
> > 4) Any scientific paper that compares both systems ? (I was not able to
> > find any)
> >
> > Regards,
> > Behroz Sikander
>
> --
> Best Regards, Edward J. Yoon
