Re: Mahout on Spark?

Dmitriy Lyubimov Wed, 19 Feb 2014 00:28:30 -0800

Unfortunately, methinks the prospects of something like a Mahout/MLlib merge
seem very unlikely due to vastly divergent approaches to the basics of linear
algebra (and other things). Just as one cannot grow a single tree out of
two trunks -- not easily, anyway.


At this point it is fairly easy to port (and subsequently beat) MLlib from a
collection-of-algorithms point of view. But IMO the goal should be to become
more MLI-like first and to port second, and to be very careful with
concepts. That is something I so far don't see happening with MLlib, which
seems to be an old-style, Mahout-like rush to become a collection of basic
algorithms rather than a coherent foundation. Admittedly, I haven't looked
very closely.


On Tue, Feb 18, 2014 at 11:41 PM, Sebastian Schelter <s...@apache.org> wrote:

> I'm also convinced that Spark is a superior platform for executing
> distributed ML algorithms. We've had a discussion about a change from
> Hadoop to another platform some time ago, but at that point in time it was
> not clear which of the upcoming dataflow processing systems (Spark,
> Hyracks, Stratosphere) would establish itself amongst the users. To me it
> seems pretty obvious that Spark made the race.
>
> I concur with Ted; it would be great to have the communities work
> together. I know that at least 4 Mahout committers (including me) are
> already following Spark's mailing list and actively participating in the
> discussions.
>
> Any ideas on what a fruitful cooperation could look like?
>
> Best,
> Sebastian
>
> PS:
>
> I ported LLR-based cooccurrence analysis (aka item-based recommendation)
> to Spark some time ago, but I haven't had time to test my code on a large
> dataset yet. I'd be happy to see someone help with that.
>
> On 02/19/2014 08:04 AM, Nick Pentreath wrote:
>
>> I know the Spark/MLlib devs can occasionally be quite set in their ways of
>> doing certain things, but we'd welcome as many Mahout devs as possible to
>> work together.
>>
>>
>> It may be too late, but perhaps a GSoC project could look at porting some
>> things like the co-occurrence recommender and streaming k-means?
>>
>> N
>>
>> On Wed, Feb 19, 2014 at 3:02 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>
>>>> My (admittedly heavily biased) view is that Spark is a superior platform
>>>> overall for ML. If the two communities can work together to leverage the
>>>> strengths of Spark and the large amount of good stuff in Mahout (as well
>>>> as the fantastic depth of experience of Mahout devs), I think a lot can
>>>> be achieved!
>>>>
>>> It makes a lot of sense that Spark would be better than Hadoop for ML
>>> purposes, given that Hadoop was intended to do web-crawl kinds of things
>>> and Spark was intentionally built to support machine learning.
>>> Given that a majority of the Hadoop-based distribution vendors have
>>> announced support for Spark, it makes sense that maybe Mahout should jump in.
>>> I really would prefer it if the two communities (MLlib/MLI and Mahout)
>>> could work more closely together. There is a lot of good to be had on
>>> both sides.
>>>
>>
>
