I've run some general tests with Spark; it's a pretty interesting setup.

I think the most interesting aspect (relevant to what you are asking
about) is that Matei already has Spark running on top of MRv2:

https://github.com/mesos/spark-yarn

(you don't have to run Mesos, but the YARN code needs to be able to see
the jar in order to do its scheduling)

I've been playing around with writing a genetic algorithm in
Scala/Spark to run on MRv2, and in the process got introduced to the
book:

"Parallel Iterative Algorithms: From Sequential to Grid Computing"

which covers strategies for parallelizing highly iterative
algorithms and the issues inherent in doing so (sync/async iterations,
sync/async communications, etc.). Since you can use Spark as a
"BSP-style" framework (ignoring the RDDs if you like) and just send
out slices of an array of items to be processed (relatively fast
compared to MR), it has some interesting properties and tradeoffs
worth a look.
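To make the "BSP-style" idea concrete, here is a minimal sketch in plain
Python, not Spark: threads stand in for cluster workers, and the toy
problem (gradient descent toward the mean of a list) and the function
names make_slices, partial_gradient and bsp_gradient_descent are my own
hypothetical choices for illustration. Each superstep broadcasts the
current value to every slice, waits for all slices to finish (the
barrier), then combines the partial results before the next iteration.

```python
from concurrent.futures import ThreadPoolExecutor


def make_slices(items, num_slices):
    """Split a list into num_slices contiguous slices (hypothetical helper)."""
    size, rem = divmod(len(items), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        # The first `rem` slices get one extra item so sizes differ by at most 1.
        end = start + size + (1 if i < rem else 0)
        slices.append(items[start:end])
        start = end
    return slices


def partial_gradient(slice_, x):
    """Local work for one slice: gradient of sum((x - a)^2) over that slice."""
    return sum(2.0 * (x - a) for a in slice_)


def bsp_gradient_descent(items, num_slices=4, lr=0.1, supersteps=200):
    """Synchronous (BSP-style) iteration: broadcast x, compute, barrier, combine."""
    slices = make_slices(items, num_slices)
    x = 0.0
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        for _ in range(supersteps):
            # Superstep: every slice receives the same broadcast value of x.
            grads = list(pool.map(partial_gradient, slices, [x] * len(slices)))
            # Barrier: list() returns only after all slices have finished,
            # so the update below always sees a complete set of results.
            x -= lr * sum(grads) / len(items)
    return x


if __name__ == "__main__":
    data = [float(i) for i in range(1, 11)]
    print(bsp_gradient_descent(data))  # converges toward the mean of data
```

Collecting every partial result before updating is the synchronous
iteration case; an asynchronous variant would let slices proceed on
stale values of x instead of waiting at the barrier. In Spark terms the
rough analogue would be a per-iteration parallelize/map/reduce over the
sliced data, but treat that mapping as an assumption rather than a
description of spark-yarn.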

Toward the end of my ATL HUG talk I mentioned how MRv2 could be used
with other frameworks, like Spark, that are better suited to other
kinds of algorithms (in this case, highly iterative ones):

http://www.slideshare.net/jpatanooga/machine-learning-and-hadoop

I think it would be interesting to have Mahout sitting on top of MRv2,
like Ted is referring to, with each algorithm matched to the framework
on YARN that suits it best, and a workflow that mixes and matches these
combinations.

Lots of possibilities here.

JP


On Wed, Oct 19, 2011 at 10:42 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Spark is very cool but very incompatible with Hadoop code.  Many Mahout
> algorithms would run much faster on Spark, but you will have to do the
> porting yourself.
>
> Let us know how it turns out!
>
> 2011/10/19 WangRamon <ramon_w...@hotmail.com>
>
>>
>> Hi all, I was told today that Spark is a much better platform for cluster
>> computing than Hadoop, at least for recommendation computing. I'm still
>> very new to this area; if anyone has done some investigation on Spark,
>> could you please share your thoughts here? Thank you very much.
>>
>> Thanks,
>> Ramon
>>
>



-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com
