RE: Mahout on Spark

2014-03-26 Thread Saikat Kanjilal
s for you around these > Date: Wed, 26 Mar 2014 10:31:38 -0700 > Subject: Re: Mahout on Spark > From: dlie...@gmail.com > To: dev@mahout.apache.org > > No, we probably don't want to create them unless we have someone to assign > them to. You are more than welcome to create on

Re: Mahout on Spark

2014-03-26 Thread Dmitriy Lyubimov
es? I'd like to > volunteer to take on the shell and the R bindings, should I create JIRA > items for these? > > > Date: Wed, 26 Mar 2014 10:12:01 -0700 > > Subject: Re: Mahout on Spark > > From: dlie...@gmail.com > > To: sxk1...@hotmail.com > > CC: dev@m

RE: Mahout on Spark

2014-03-26 Thread Saikat Kanjilal
@Dmitry Are there JIRA items created for the wanted pieces? I'd like to volunteer to take on the shell and the R bindings, should I create JIRA items for these? > Date: Wed, 26 Mar 2014 10:12:01 -0700 > Subject: Re: Mahout on Spark > From: dlie...@gmail.com > To: sxk1...@hotm

Re: Mahout on Spark

2014-03-26 Thread Dmitriy Lyubimov
Sure. @Saikat et al: Check out the http://mahout.apache.org/users/sparkbindings/home.html "Wanted" section. Of course, data frames and vectorization (feature prep) standardization are very high priority there. Another high priority is an interactive shell/scripting (just like the Spark shell). Something

RE: Mahout on Spark

2014-03-26 Thread Saikat Kanjilal
+1. In fact, I would be very much indebted if someone (namely Dmitriy :) ) could do a Google Hangout focused on Spark where folks can ask questions and learn more. To this end, I want to bring up something else: it'd be great if Mahout itself, either through the Apache project foundation or through

Re: Mahout on Spark?

2014-02-19 Thread Nick Pentreath
It is true that MLlib may be less production-tested than Mahout, but I would say Spark is heavily production-tested and getting close to a true 1.0 release. Why do you favour Hadoop for "sturdiness"? Spark uses HDFS as an input source (or any Hadoop InputFormat) so benefits from the same fault toleran

Re: Mahout on Spark?

2014-02-19 Thread Suneel Marthi
On Wednesday, February 19, 2014 7:22 PM, Ted Dunning wrote: On Wed, Feb 19, 2014 at 1:55 PM, peng wrote: > But maybe mahout can include contribs that M/R is not fit for, like > downpour SGD or graph-based algorithms? > Yes.  Absolutely. Downpour SGD is #1 on my list of features for 1.

Re: Mahout on Spark?

2014-02-19 Thread Ted Dunning
On Wed, Feb 19, 2014 at 1:55 PM, peng wrote: > But maybe mahout can include contribs that M/R is not fit for, like > downpour SGD or graph-based algorithms? > Yes. Absolutely.

Re: Mahout on Spark?

2014-02-19 Thread peng
It was suggested that I switch to MLlib for its performance, but I doubt that it is production-ready; even if it is, I would still favour Hadoop's sturdiness and self-healing. But maybe mahout can include contribs that M/R is not fit for, like downpour SGD or graph-based algorithms? On Wed 19 Feb 20

Re: Mahout on Spark?

2014-02-19 Thread Sean Owen
To set expectations appropriately, I think it's important to point out this is completely infeasible short of a total rewrite, and I can't imagine that will happen. It may not be obvious if you haven't looked at the code how completely dependent on M/R it is. You can swap out M/R and Spark if you

Re: Mahout on Spark?

2014-02-19 Thread Gokhan Capan
I imagine Mahout offering users an option to select from different execution engines (just like we currently do by giving M/R or sequential options), starting with Spark. I am not sure what changes are needed in the codebase, though. Maybe following MLI (or the like) and implementing some mo