I think that Giraph has a lot to offer here as well.
On Oct 20, 2011, at 8:30, Josh Patterson <j...@cloudera.com> wrote:

> I've run some tests with Spark; in general, it's a pretty interesting setup.
>
> I think the most interesting aspect (relevant to what you are asking
> about) is that Matei already has Spark running on top of MRv2:
>
> https://github.com/mesos/spark-yarn
>
> (You don't have to run Mesos, but the YARN code needs to be able to see
> the jar in order to do its scheduling.)
>
> I've been playing around with writing a genetic algorithm in
> Scala/Spark to run on MRv2, and in the process got introduced to the
> book:
>
> "Parallel Iterative Algorithms, From Sequential to Grid Computing"
>
> which talks about strategies for parallelizing highly iterative
> algorithms and the inherent issues involved (sync/async iterations,
> sync/async communications, etc.). Since you can use Spark as a
> "BSP-style" framework (ignoring the RDDs if you like) and just shoot
> out slices of an array of items to be processed (relatively fast
> compared to MR), it has some interesting properties/tradeoffs to take a
> look at.
>
> Toward the end of my ATL HUG talk I mentioned the possibility of how
> MRv2 could be used with other frameworks, like Spark, to be better
> suited for other algorithms (in this case, highly iterative ones):
>
> http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop
>
> I think it would be interesting to have Mahout sitting on top of MRv2,
> as Ted is suggesting, and then have an algorithm matched to a
> framework on YARN and a workflow that mixed and matched these
> combinations.
>
> Lots of possibilities here.
>
> JP
>
> On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> Spark is very cool but very incompatible with Hadoop code. Many Mahout
>> algorithms would run much faster on Spark, but you will have to do the
>> porting yourself.
>>
>> Let us know how it turns out!
>>
>> 2011/10/19 WangRamon <ramon_w...@hotmail.com>
>>
>>> Hi All, I was told today that Spark is a much better platform for cluster
>>> computing, better than Hadoop, at least for recommendation computation. I'm
>>> still very new to this area, so if anyone has done some investigation on
>>> Spark, can you please share your thoughts here? Thank you very much.
>>>
>>> Thanks,
>>> Ramon
>
> --
> Twitter: @jpatanooga
> Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
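[Editor's note: the "BSP-style" pattern Josh describes — shooting out slices of an array of items to be processed, then gathering the results — can be sketched locally in plain Scala. This is a minimal illustration, not Spark itself: in real Spark the slicing and shipping would be `sc.parallelize(items, numSlices).map(f).collect()`; the object and method names below (`BspSketch`, `superstep`, `run`) are illustrative only.]

```scala
// A local sketch of BSP-style slice processing, without a cluster.
// The driver splits the input array into slices; each slice is an
// independent unit of work (in Spark it would be shipped to a worker),
// and the results are concatenated back in order.
object BspSketch {
  // One "superstep": apply a function to every element of a slice.
  def superstep(slice: Array[Int], f: Int => Int): Array[Int] =
    slice.map(f)

  def run(items: Array[Int], numSlices: Int, f: Int => Int): Array[Int] = {
    val sliceSize = math.max(1, items.length / numSlices)
    val slices = items.grouped(sliceSize).toArray
    // Here the slices are processed sequentially; a real framework
    // would run them in parallel and synchronize between supersteps.
    slices.flatMap(s => superstep(s, f))
  }

  def main(args: Array[String]): Unit = {
    val result = run(Array(1, 2, 3, 4, 5, 6), numSlices = 3, _ * 2)
    println(result.mkString(","))
  }
}
```

The tradeoff Josh points at is visible even in this toy: because each slice is independent within a superstep, the framework only needs a barrier between supersteps, which is what makes highly iterative algorithms cheap relative to chaining full MapReduce jobs.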