I think that Giraph has a lot to offer here as well.
On Oct 20, 2011, at 8:30, Josh Patterson <j...@cloudera.com> wrote:

> I've run some tests with Spark; in general, it's a pretty interesting setup.
>
> I think the most interesting aspect (relevant to what you are asking
> about) is that Matei already has Spark running on top of MRv2:
>
> https://github.com/mesos/spark-yarn
>
> (You don't have to run Mesos, but the YARN code needs to be able to see
> the jar in order to do its scheduling.)
>
> I've been playing around with writing a genetic algorithm in
> Scala/Spark to run on MRv2, and in the process got introduced to the
> book:
>
> "Parallel Iterative Algorithms, From Sequential to Grid Computing"
>
> which talks about strategies for parallelizing highly iterative
> algorithms and the inherent issues involved (sync/async iterations,
> sync/async communications, etc.). Since you can use Spark as a
> "BSP-style" framework (ignoring the RDDs if you like) and just shoot
> out slices of an array of items to be processed (relatively fast
> compared to MR), it has some interesting properties/tradeoffs to take a
> look at.
>
> Toward the end of my ATL HUG talk I mentioned the possibility of how
> MRv2 could be used with other frameworks, like Spark, to be better
> suited for other algorithms (in this case, highly iterative ones):
>
> http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
>
> I think it would be interesting to have Mahout sitting on top of MRv2,
> as Ted is suggesting, and then have an algorithm matched to a
> framework on YARN and a workflow that mixed and matched these
> combinations.
>
> Lots of possibilities here.
>
> JP
>
> On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> Spark is very cool but very incompatible with Hadoop code. Many Mahout
>> algorithms would run much faster on Spark, but you will have to do the
>> porting yourself.
>>
>> Let us know how it turns out!
>>
>> 2011/10/19 WangRamon <ramon_w...@hotmail.com>
>>
>>> Hi All, I was told today that Spark is a much better platform for cluster
>>> computing, better than Hadoop, at least for recommendation computation. I'm
>>> still very new to this area, so if anyone has done some investigation on
>>> Spark, can you please share your thoughts here? Thank you very much.
>>>
>>> Thanks,
>>> Ramon
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
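[Editor's note: the "BSP-style" pattern Josh describes — shooting out slices of an array of items to be processed, then gathering the results — can be sketched locally in plain Scala. This is a minimal illustration, not Spark itself: in real Spark the slicing and shipping would be `sc.parallelize(items, numSlices).map(f).collect()`; the object and method names below (`BspSketch`, `superstep`, `run`) are illustrative only.]

```scala
// A local sketch of BSP-style slice processing, without a cluster.
// The driver splits the input array into slices; each slice is an
// independent unit of work (in Spark it would be shipped to a worker),
// and the results are concatenated back in order.
object BspSketch {
  // One "superstep": apply a function to every element of a slice.
  def superstep(slice: Array[Int], f: Int => Int): Array[Int] =
    slice.map(f)

  def run(items: Array[Int], numSlices: Int, f: Int => Int): Array[Int] = {
    val sliceSize = math.max(1, items.length / numSlices)
    val slices = items.grouped(sliceSize).toArray
    // Here the slices are processed sequentially; a real framework
    // would run them in parallel and synchronize between supersteps.
    slices.flatMap(s => superstep(s, f))
  }

  def main(args: Array[String]): Unit = {
    val result = run(Array(1, 2, 3, 4, 5, 6), numSlices = 3, _ * 2)
    println(result.mkString(","))
  }
}
```

The tradeoff Josh points at is visible even in this toy: because each slice is independent within a superstep, the framework only needs a barrier between supersteps, which is what makes highly iterative algorithms cheap relative to chaining full MapReduce jobs.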