Unfortunately, methinks something like a Mahout/MLlib merge is very unlikely, due to vastly diverged approaches to the basics of linear algebra (and other things). Just like one cannot grow a single tree out of two trunks -- not easily, anyway.
It is fairly easy to port (and subsequently beat) MLlib at this point from a collection-of-algorithms point of view. But IMO the goal should be to be more MLI-like first, and to port second, and to be very careful with concepts, something that I so far don't see happening with MLlib. MLlib seems to be an old-style, Mahout-like rush to become a collection of basic algorithms rather than a coherent foundation. Admittedly, I haven't looked very closely.

On Tue, Feb 18, 2014 at 11:41 PM, Sebastian Schelter <s...@apache.org> wrote:

> I'm also convinced that Spark is a superior platform for executing
> distributed ML algorithms. We've had a discussion about a change from
> Hadoop to another platform some time ago, but at that point in time it was
> not clear which of the upcoming dataflow processing systems (Spark,
> Hyracks, Stratosphere) would establish itself amongst the users. To me it
> seems pretty obvious that Spark made the race.
>
> I concur with Ted, it would be great to have the communities work
> together. I know that at least 4 mahout committers (including me) are
> already following Spark's mailinglist and actively participating in the
> discussions.
>
> What are the ideas how a fruitful cooperation look like?
>
> Best,
> Sebastian
>
> PS:
>
> I ported LLR-based cooccurrence analysis (aka item-based recommendation)
> to Spark some time ago, but I haven't had time to test my code on a large
> dataset yet. I'd be happy to see someone help with that.
>
> On 02/19/2014 08:04 AM, Nick Pentreath wrote:
>
>> I know the Spark/Mllib devs can occasionally be quite set in ways of
>> doing certain things, but we'd welcome as many Mahout devs as possible to
>> work together.
>>
>> It may be too late, but perhaps a GSoC project to look at a port of some
>> stuff like co occurrence recommender and streaming k-means?
>>
>> N
>> --
>> Sent from Mailbox for iPhone
>>
>> On Wed, Feb 19, 2014 at 3:02 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> On Tue, Feb 18, 2014 at 1:58 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>
>>>> My (admittedly heavily biased) view is Spark is a superior platform
>>>> overall for ML. If the two communities can work together to leverage
>>>> the strengths of Spark, and the large amount of good stuff in Mahout
>>>> (as well as the fantastic depth of experience of Mahout devs) I think
>>>> a lot can be achieved!
>>>
>>> It makes a lot of sense that Spark would be better than Hadoop for ML
>>> purposes given that Hadoop was intended to do web-crawl kinds of things
>>> and Spark was intentionally built to support machine learning.
>>>
>>> Given that Spark has been announced by a majority of the Hadoop-based
>>> distribution vendors, it makes sense that maybe Mahout should jump in.
>>>
>>> I really would prefer it if the two communities (MLib/MLI and Mahout)
>>> could work more closely together. There is a lot of good to be had on
>>> both sides.