Hi all,
*>>Apache Spark is definitely more suited for ML (iterative algorithms) than
legacy Hadoop due to its preservation of state and optimized execution
strategy (RDDs). However, their approaches are still in synchronous
iterative communication pattern.*
So, Hama has a better communication model. That is a good point.
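
To check my own understanding of the message-passing BSP model, here is how I
picture a superstep: local computation, then message sending, then a barrier
after which the messages become visible. This is only a toy, single-process
Python sketch and not Hama's API; the function name run_bsp, the ring
topology and the "find the global maximum" task are my own assumptions for
illustration.

```python
from collections import defaultdict


def run_bsp(local_values, supersteps):
    """Simulate BSP peers on a ring that spread the global maximum.

    Hypothetical helper for illustration only; it does not use Hama.
    """
    num_peers = len(local_values)
    state = list(local_values)
    inbox = defaultdict(list)  # messages delivered at the previous barrier

    for _ in range(supersteps):
        outbox = defaultdict(list)
        for peer in range(num_peers):
            # 1) Local computation: fold in messages from the last superstep.
            state[peer] = max([state[peer]] + inbox[peer])
            # 2) Communication: send the current value to the next peer.
            neighbour = (peer + 1) % num_peers
            outbox[neighbour].append(state[peer])
        # 3) Barrier: messages sent in this superstep become visible only now.
        inbox = outbox
    return state


if __name__ == "__main__":
    print(run_bsp([3, 9, 1, 4], supersteps=4))
```

With four peers on a ring, the maximum needs four supersteps to reach every
peer, because a message sent in one superstep is only read in the next one.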

*>>Moreover, BSP can have virtual shared memory and many more benefits.*
I read somewhere that Spark has shared variables. Is BSP virtual shared
memory something different, or is it like the shared variables in Spark?

*>>In addition, another one convincing point I think can be a utilization
ability of modern acceleration accessories such as InfiniBand and GPUs*
Yes, it is a good point, but I found the following link. Apparently, Spark
is also capable of processing on GPUs.
https://spark-summit.org/east-2015/talk/heterospark-a-heterogeneous-cpugpu-spark-platform-for-deep-learning-algorithms-2

*>>I'm sure that this feature will bring a completely new wave of big data.
The problem we faced is only a lack of interest to BSP programming model. :-)*
My knowledge is quite limited, but I think you are right. With the rise of
IoT and stream processing, GPUs will become vital. I also do not understand
why BSP is not the programming model of choice nowadays. It has a strong
theoretical foundation that was proposed decades ago, and yet the
MapReduce/Spark models are the ones being used.


*>>Just FYI, one of my friends said after reading this thread, "if Amazon
EC2 = MR or BSP, Google App Engine = Spark". Maybe usability side.*
I have not written a Spark job before, but I have seen the code. BSP looks
more intuitive to me somehow.

*>>Hama = GraphX (Library of Spark (Pregel model) [1])*
The graph module of Hama is definitely equivalent to Spark's GraphX.

Regards,
Behroz

On Thu, Jun 25, 2015 at 1:46 AM, Edward J. Yoon <[email protected]>
wrote:

> Hi, here's my few thoughts.
>
> Apache Spark is definitely more suited for ML (iterative algorithms) than
> legacy Hadoop due to its preservation of state and optimized execution
> strategy (RDDs). However, their approaches are still in synchronous
> iterative communication pattern.
>
> In Apache Hama case, it's a general-purpose pure BSP framework. While I
> admit that synchronization costs are high, the communication can be more
> efficiently realized with the message-passing BSP model. Moreover, BSP can
> have virtual shared memory and many more benefits. In addition, another one
> convincing point I think can be a utilization ability of modern
> acceleration accessories such as InfiniBand and GPUs. I'm sure that this
> feature will bring a completely new wave of big data. The problem we faced
> is only a lack of interest to BSP programming model. :-)
>
> > 2) Do we have any recent benchmarks between the 2 systems ?
>
> It's in my todo list.
>
> --
> Best Regards, Edward J. Yoon
>
> -----Original Message-----
> From: Behroz Sikander [mailto:[email protected]]
> Sent: Thursday, June 25, 2015 12:57 AM
> To: [email protected]
> Subject: Hama vs Spark
>
> Hi,
> A few days back, I started reading about Apache Spark. It is a pretty good
> Big Data platform. But I am wondering where Hama stands in comparison with
> Spark when we have to implement a compute-intensive iterative algorithm
> (machine learning or optimization).
>
> I found some resources online, but none of them answers my questions.
>
> 1) BSP vs MapReduce paper <http://arxiv.org/pdf/1203.2081v2.pdf>
> 2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
> 3) I actually found the following benchmark but it is quite old.
> http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results
>
> Questions:
> 1) Is there any specific advantage when we choose the BSP model instead of
> the Spark paradigm?
> 2) Do we have any recent benchmarks between the 2 systems?
> 3) What is the main convincing point to use Hama over Spark?
> 4) Is there any scientific paper that compares both systems? (I was not
> able to find any.)
>
> Regards,
> Behroz Sikander