Re: Mahout on Spark?

Dmitriy Lyubimov Wed, 19 Feb 2014 00:47:22 -0800

PS I am moving along on a cost optimizer for Spark-backed DRMs on some
multiplicative pipelines, one that is capable of figuring out different
cost-based rewrites, and on an R-like DSL that mixes in-core and distributed
matrix representations and blocks. But it is painfully slow; I am really only
doing it a couple of nights a month. It does not look like I will be doing it
on company time any time soon (and even if I did, the company doesn't seem
inclined to contribute anything new I do on their time). It is all painfully
slow; there's no direct funding for it anywhere with no strings attached.
That will probably be the primary reason why Mahout would not be able to get
much traction compared to university-based contributions.



On Wed, Feb 19, 2014 at 12:27 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Unfortunately, methinks the prospects of something like a Mahout/MLlib merge
> seem very unlikely due to the vastly divergent approaches to the basics of
> linear algebra (and other things). Just like one cannot grow a single tree
> out of two trunks -- not easily, anyway.
>
> It is fairly easy to port (and subsequently beat) MLlib at this point from
> a collection-of-algorithms point of view. But IMO the goal should be to
> become more MLI-like first, and to port second. And to be very careful with
> concepts, something that I so far don't see happening with MLlib. MLlib
> seems to be an old-style, Mahout-like rush to become a collection of basic
> algorithms rather than a coherent foundation. Admittedly, I haven't looked
> very closely.
>
>
> On Tue, Feb 18, 2014 at 11:41 PM, Sebastian Schelter <s...@apache.org> wrote:
>
>> I'm also convinced that Spark is a superior platform for executing
>> distributed ML algorithms. We've had a discussion about a change from
>> Hadoop to another platform some time ago, but at that point in time it was
>> not clear which of the upcoming dataflow processing systems (Spark,
>> Hyracks, Stratosphere) would establish itself amongst the users. To me it
>> seems pretty obvious that Spark won the race.
>>
>> I concur with Ted: it would be great to have the communities work
>> together. I know that at least 4 Mahout committers (including me) are
>> already following Spark's mailing list and actively participating in the
>> discussions.
>>
>> What are the ideas for what a fruitful cooperation could look like?
>>
>> Best,
>> Sebastian
>>
>> PS:
>>
>> I ported LLR-based cooccurrence analysis (aka item-based recommendation)
>> to Spark some time ago, but I haven't had time to test my code on a large
>> dataset yet. I'd be happy to see someone help with that.
>>
>>
>>
>>
>>
>>
>> On 02/19/2014 08:04 AM, Nick Pentreath wrote:
>>
>>> I know the Spark/MLlib devs can occasionally be quite set in their ways of
>>> doing certain things, but we'd welcome as many Mahout devs as possible to
>>> work together.
>>>
>>>
>>> It may be too late, but perhaps a GSoC project to look at a port of some
>>> stuff like the co-occurrence recommender and streaming k-means?
>>>
>>>
>>>
>>>
>>> N
>>> --
>>> Sent from Mailbox for iPhone
>>>
>>> On Wed, Feb 19, 2014 at 3:02 AM, Ted Dunning <ted.dunn...@gmail.com>
>>> wrote:
>>>
>>>> On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>>
>>>>> My (admittedly heavily biased) view is Spark is a superior platform overall
>>>>> for ML. If the two communities can work together to leverage the strengths
>>>>> of Spark, and the large amount of good stuff in Mahout (as well as the
>>>>> fantastic depth of experience of Mahout devs) I think a lot can be achieved!
>>>>>
>>>> It makes a lot of sense that Spark would be better than Hadoop for ML
>>>> purposes given that Hadoop was intended to do web-crawl kinds of things and
>>>> Spark was intentionally built to support machine learning.
>>>> Given that Spark has been announced by a majority of the Hadoop-based
>>>> distribution vendors, it makes sense that maybe Mahout should jump in.
>>>> I really would prefer it if the two communities (MLlib/MLI and Mahout) could
>>>> work more closely together. There is a lot of good to be had on both sides.
>>>>
>>>
>>
>
