Re: Hama vs Spark

Júlio Pires Wed, 24 Jun 2015 18:33:14 -0700

As a rough comparison, I see it like this:

Hadoop = Spark
Hama = GraphX (a Spark library that follows the Pregel model [1])


Spark has subsystems (MLlib, GraphX, and others) inside its ecosystem, as
Hadoop does.

[1] https://spark.apache.org/graphx/
    See this page for the comparable performance of GraphX, GraphLab, and
    Giraph.
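
For readers who have not met the Pregel/BSP model mentioned above, here is a
minimal single-process sketch of its superstep loop. It is only an
illustration under my own assumptions: the function name pregel_max, the toy
graph, and the "propagate the maximum value" task are hypothetical and are
not taken from Hama, GraphX, or Giraph, and nothing here uses their APIs.

```python
from collections import defaultdict


def pregel_max(graph, values, max_supersteps=30):
    """Propagate the largest vertex value through a graph, BSP style.

    graph:  dict mapping vertex -> list of out-neighbours (toy input)
    values: dict mapping vertex -> initial numeric value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)  # work on a copy, leave the caller's dict alone

    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(values[vertex])

    supersteps = 1
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # Compute phase: each vertex only sees messages from the previous
        # superstep and only touches its own state.
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in graph.get(vertex, []):
                    outbox[neighbour].append(best)
        # Barrier: messages sent now become visible in the next superstep.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    ring = {"a": ["b"], "b": ["c"], "c": ["a"]}
    start = {"a": 3, "b": 6, "c": 2}
    print(pregel_max(ring, start))
```

The loop stops once no vertex sends a message, which plays the role of
"every vertex votes to halt" in Pregel. A real framework runs the compute
phase in parallel across workers and pays a synchronization cost at each
barrier.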


2015-06-24 22:09 GMT-03:00 Edward J. Yoon <[email protected]>:

> Just FYI, one of my friends said after reading this thread, "if Amazon
> EC2 = MR or BSP, Google App Engine = Spark". Maybe that is about the
> usability side.
>
> On Thu, Jun 25, 2015 at 8:46 AM, Edward J. Yoon <[email protected]>
> wrote:
> > Hi, here are a few thoughts of mine.
> >
> > Apache Spark is definitely better suited to ML (iterative algorithms)
> > than legacy Hadoop, thanks to its preservation of state and its
> > optimized execution strategy (RDDs). However, its approach still
> > follows a synchronous iterative communication pattern.
> >
> > Apache Hama, by contrast, is a general-purpose pure BSP framework.
> > While I admit that synchronization costs are high, communication can
> > be realized more efficiently with the message-passing BSP model.
> > Moreover, BSP can have virtual shared memory and many more benefits.
> > Another convincing point, I think, is its ability to utilize modern
> > acceleration hardware such as InfiniBand and GPUs. I'm sure that this
> > feature will bring a completely new wave of big data. The only problem
> > we face is a lack of interest in the BSP programming model. :-)
> >
> >> 2) Do we have any recent benchmarks between the 2 systems ?
> >
> > It's in my todo list.
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > -----Original Message-----
> > From: Behroz Sikander [mailto:[email protected]]
> > Sent: Thursday, June 25, 2015 12:57 AM
> > To: [email protected]
> > Subject: Hama vs Spark
> >
> > Hi,
> > A few days back, I started reading about Apache Spark. It is a pretty
> > good big data platform. But it raises a question for me: where does
> > Hama stand in comparison with Spark if we have to implement a
> > compute-intensive iterative algorithm (machine learning or
> > optimization)?
> >
> > I found some resources online but none answers my questions.
> >
> > 1) BSP vs MapReduce paper <http://arxiv.org/pdf/1203.2081v2.pdf>
> > 2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
> > 3) I actually found the following benchmark but it is quite old.
> >    http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results
> >
> > Questions:
> > 1) Is there any specific advantage to choosing the BSP model over the
> >    Spark paradigm?
> > 2) Do we have any recent benchmarks between the 2 systems?
> > 3) What is the main convincing point to use Hama over Spark?
> > 4) Is there any scientific paper that compares both systems? (I was
> >    not able to find any.)
> >
> > Regards,
> > Behroz Sikander
> >
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
>
