I'm not sure how that would be possible. However, I think the user can find the slowest machine in each superstep and re-balance the loads. This can be handled from the client (user) side.
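As a rough illustration of that client-side idea, here is a toy sketch in plain Python (not the Hama API; `superstep_time` and `rebalance` are hypothetical helpers): with a synchronous barrier, a superstep lasts as long as the slowest worker, so shifting load away from the machine that was slowest in the previous superstep shortens the barrier wait.

```python
# Toy model, not the Hama API: superstep duration under a synchronous
# barrier, plus a greedy client-side rebalance of the slowest worker.

def superstep_time(loads, speeds):
    # With a synchronous barrier, the superstep lasts as long as the
    # slowest worker: the maximum of load/speed over all workers.
    return max(l / s for l, s in zip(loads, speeds))

def rebalance(loads, speeds):
    # Move a quarter of the slowest worker's load to the fastest worker.
    times = [l / s for l, s in zip(loads, speeds)]
    slow = times.index(max(times))
    fast = times.index(min(times))
    delta = loads[slow] // 4
    loads = list(loads)
    loads[slow] -= delta
    loads[fast] += delta
    return loads

speeds = [1.0, 1.0, 0.5]   # third machine runs at half speed
loads = [100, 100, 100]    # uniform partitioning
before = superstep_time(loads, speeds)                    # 200.0
after = superstep_time(rebalance(loads, speeds), speeds)  # 150.0
```

Repeating the measure-and-rebalance step each superstep would keep adapting as machine speeds drift, which is presumably what a user-side implementation would do.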
On Sat, Aug 1, 2015 at 4:17 AM, Behroz Sikander <[email protected]> wrote:
> +1. This is great.
>
> Btw, our current implementation of Hama is synchronous BSP, i.e., we have
> to wait for the slowest machine to sync in order to move to the next
> superstep. Is there anything like asynchronous BSP out yet? If yes, do
> you have plans to add it to this framework?
>
> Regards,
> Behroz
>
> On Wed, Jul 29, 2015 at 3:12 AM, Edward J. Yoon <[email protected]>
> wrote:
>
>> I found a research paper somewhat related to this topic.
>>
>> "Both the disk based method, i.e., MR, and the memory based method,
>> i.e., BSP and Spark, need to load the data into main memory and
>> conduct the expensive computation. However, when processing top-k
>> joins, BSP is clearly the best method as it is the only one that is
>> able to perform top-k joins on large datasets. This is because BSP
>> supports the frequent synchronizations between workers when performing
>> the joining procedure, which quickly lowers the joining threshold for
>> a given k. The winner between the MR and the Spark algorithms change
>> from datasets to datasets: Spark is beaten by MR on A and B while
>> beats MR on C." -
>> http://www.ruizhang.info/publications/TPDS2015-Heads_Join.pdf
>>
>> On Thu, Jun 25, 2015 at 9:02 PM, Behroz Sikander <[email protected]>
>> wrote:
>> > Hi all,
>> >
>> > *>>Apache Spark is definitely more suited for ML (iterative
>> > algorithms) than legacy Hadoop due to its preservation of state and
>> > optimized execution strategy (RDDs). However, their approaches are
>> > still in synchronous iterative communication pattern.*
>> >
>> > So, Hama has a better communication model. That is a good point.
>> >
>> > *>>Moreover, BSP can have virtual shared memory and many more
>> > benefits.*
>> >
>> > I read somewhere that Spark has shared variables. Is BSP virtual
>> > shared memory something else, or is it like the shared variables in
>> > Spark?
>> >
>> > *>>In addition, another one convincing point I think can be a
>> > utilization ability of modern acceleration accessories such as
>> > InfiniBand and GPUs*
>> >
>> > Yes, it is a good point, but I found the following link. Apparently,
>> > Spark is also capable of doing processing on GPUs:
>> > https://spark-summit.org/east-2015/talk/heterospark-a-heterogeneous-cpugpu-spark-platform-for-deep-learning-algorithms-2
>> >
>> > *>>I'm sure that this feature will bring a completely new wave of big
>> > data. The problem we faced is only a lack of interest to BSP
>> > programming model. :-)*
>> >
>> > My knowledge is quite limited, but I think you are right. With the
>> > rise of IoT and stream processing, GPUs will become vital. Yes, I do
>> > not understand why BSP is not the programming model of choice
>> > nowadays. It has a strong theoretical background which was proposed
>> > decades ago, and still the MapReduce/Spark models are used.
>> >
>> > *>>Just FYI, one of my friends said after reading this thread, "if
>> > Amazon EC2 = MR or BSP, Google App Engine = Spark". Maybe usability
>> > side.*
>> >
>> > I have not written a Spark job before, but I have seen the code. BSP
>> > looks more intuitive to me somehow.
>> >
>> > *>>Hama = GraphX (Library of Spark (Pregel model) [1])*
>> >
>> > The graph module of Hama is definitely equivalent to GraphX in Spark.
>> >
>> > Regards,
>> > Behroz
>> >
>> > On Thu, Jun 25, 2015 at 1:46 AM, Edward J. Yoon <[email protected]>
>> > wrote:
>> >
>> >> Hi, here are a few of my thoughts.
>> >>
>> >> Apache Spark is definitely more suited for ML (iterative algorithms)
>> >> than legacy Hadoop due to its preservation of state and optimized
>> >> execution strategy (RDDs). However, their approaches are still in a
>> >> synchronous iterative communication pattern.
>> >>
>> >> In Apache Hama's case, it's a general-purpose pure BSP framework.
>> >> While I admit that synchronization costs are high, the communication
>> >> can be realized more efficiently with the message-passing BSP model.
>> >> Moreover, BSP can have virtual shared memory and many more benefits.
>> >> In addition, another convincing point, I think, is the ability to
>> >> utilize modern acceleration accessories such as InfiniBand and GPUs.
>> >> I'm sure that this feature will bring a completely new wave of big
>> >> data. The problem we face is only a lack of interest in the BSP
>> >> programming model. :-)
>> >>
>> >> > 2) Do we have any recent benchmarks between the 2 systems ?
>> >>
>> >> It's on my todo list.
>> >>
>> >> --
>> >> Best Regards, Edward J. Yoon
>> >>
>> >> -----Original Message-----
>> >> From: Behroz Sikander [mailto:[email protected]]
>> >> Sent: Thursday, June 25, 2015 12:57 AM
>> >> To: [email protected]
>> >> Subject: Hama vs Spark
>> >>
>> >> Hi,
>> >> A few days back, I started reading about Apache Spark. It is a pretty
>> >> good big data platform. But a question arises in my mind: where does
>> >> Hama lie in comparison with Spark if we have to implement an
>> >> iterative algorithm which is compute-intensive (machine learning or
>> >> optimization)?
>> >>
>> >> I found some resources online, but none answers my questions.
>> >>
>> >> 1) BSP vs MapReduce paper <http://arxiv.org/pdf/1203.2081v2.pdf>
>> >> 2) https://people.apache.org/~edwardyoon/documents/Hama_BSP_for_Advanced_Analytics.pdf
>> >> 3) I actually found the following benchmark, but it is quite old:
>> >> http://markmail.org/message/vyjsdpv355kua7rm#query:+page:1+mid:vstgda4fhmz52pdw+state:results
>> >>
>> >> Questions:
>> >> 1) Is there any specific advantage when we choose the BSP model
>> >> instead of the Spark paradigm?
>> >> 2) Do we have any recent benchmarks between the 2 systems?
>> >> 3) What is the main convincing point to use Hama over Spark?
>> >> 4) Any scientific paper that compares both systems? (I was not able
>> >> to find any)
>> >>
>> >> Regards,
>> >> Behroz Sikander
>>
>> --
>> Best Regards, Edward J. Yoon

--
Best Regards, Edward J. Yoon
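The synchronous BSP pattern discussed throughout this thread — local computation, message passing to peers, then a barrier at each superstep boundary — can be sketched with Python's stdlib threading. This is a toy model for illustration only, not the Hama API; in Hama, the same roles are played by `BSPPeer.send()` and `BSPPeer.sync()`.

```python
import threading
import queue

# Toy synchronous BSP: each worker sends its value to every peer, waits at
# the barrier (the superstep boundary), then folds in received messages.
NUM_WORKERS = 3
SUPERSTEPS = 4
inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
barrier = threading.Barrier(NUM_WORKERS)
results = [0] * NUM_WORKERS

def worker(wid, value):
    for _ in range(SUPERSTEPS):
        # Communication phase: send current value to every peer.
        for peer in range(NUM_WORKERS):
            if peer != wid:
                inboxes[peer].put(value)
        barrier.wait()  # superstep boundary: no one proceeds until all sent
        # Compute phase: fold in all messages received this superstep.
        while not inboxes[wid].empty():
            value += inboxes[wid].get()
        barrier.wait()  # ensure inboxes are drained before the next sends
    results[wid] = value

threads = [threading.Thread(target=worker, args=(i, i + 1))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The two `barrier.wait()` calls make the cost model in the thread concrete: every worker, however fast, idles until the slowest one reaches the barrier, which is exactly why the synchronization overhead (and the asynchronous-BSP question above) matters.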
