What's the plan for Mahout?

2009-09-04 Thread Sean Owen
Guys, quick and broad question -- what does the roadmap for Mahout look like? Even just for the next two releases? Now, much of the project is mostly a space for tinkering, tossing around bits of code for now, and that's OK for 0.1 or 0.2. I just wonder what the path to a proper finished product is l

Re: What's the plan for Mahout?

2009-09-04 Thread Ted Dunning
These are good questions to ask. I don't know that we are ready to answer them, but I do think that we have pieces of the answers. So far, there are three or four general themes that seem to be of real interest/value a) taste/collaborative filtering/cooccurrence analysis b) facilitation of conv

Re: What's the plan for Mahout?

2009-09-04 Thread Sean Owen
I agree with your high-level breakdown, good. I wouldn't say I'm in a hurry yet, just wishing to ask the questions and start to form a plan. I don't mind if it's just coming into something polished in a year -- would be worried if a year passes and it's not clear where it's going. I appreciate thi

Re: What's the plan for Mahout?

2009-09-04 Thread Grant Ingersoll
First off, thanks for bringing this up! On Sep 4, 2009, at 9:13 AM, Sean Owen wrote: Guys, quick and broad question -- what does the roadmap for Mahout look like? Even just for the next two releases? I asked a little while back about this. I think we can put 0.2 out after Robin and Denech

Re: What's the plan for Mahout?

2009-09-04 Thread Ted Dunning
I think that Mahout is more along the lines of the hadoop gestation. There was initially one need for it (Nutch simplification) and it has grown into quite something. It is also only now verging on beginning to think about what 1.0 should be. That is fine by me. On Fri, Sep 4, 2009 at 2:03 P

Re: What's the plan for Mahout?

2009-09-04 Thread Grant Ingersoll
On Sep 4, 2009, at 1:07 PM, Ted Dunning wrote: These are good questions to ask. I don't know that we are ready to answer them, but I do think that we have pieces of the answers. So far, there are three or four general themes that seem to be of real interest/value a) taste/collaborative fil

Re: What's the plan for Mahout?

2009-09-04 Thread Bertie Shen
Hi, I just subscribed to this mailing list and plan to use the Mahout collaborative filtering part. I feel that Mahout may be better focused on a few algorithms first, doing them very well in a scalable way. Simple algorithms such as naive bayes and {item|user}-based collaborative filtering may be the initial

Re: What's the plan for Mahout?

2009-09-05 Thread Sean Owen
To kind of wrap this up for now -- I hear some consensus that Mahout is about distributed, Hadoop-based solutions for developers. So let's make sure we present a clean, coherent API to developers wanting to run the project's Hadoop jobs. I think we're a little bit stuck now as Hadoop 0.20.0 is a

Re: What's the plan for Mahout?

2009-09-05 Thread Grant Ingersoll
On Sep 5, 2009, at 9:41 AM, Sean Owen wrote: To kind of wrap this up for now -- I hear some consensus that Mahout is about distributed, Hadoop-based solutions for developers. So let's make sure we present a clean, coherent API to developers wanting to run the project's Hadoop jobs. I don't t

Re: What's the plan for Mahout?

2009-09-05 Thread Ted Dunning
I would say that Mahout is about scalable machine learning solutions. Those may use Hadoop. Or not. The emphasis is on scaling. On Sat, Sep 5, 2009 at 6:41 AM, Sean Owen wrote: > I hear some consensus that Mahout is about distributed, Hadoop-based > solutions for developers. So let's make sur

Re: What's the plan for Mahout?

2009-09-05 Thread Tanton Gibbs
+1 The scalable part is extremely important. Perhaps we could add robust as well, to ensure that the project does not become academic in nature. On Sat, Sep 5, 2009 at 1:14 PM, Ted Dunning wrote: > I would say that Mahout is about scalable machine learning solutions.  Those > may use Hadoop.  Or

Re: What's the plan for Mahout?

2009-09-06 Thread Isabel Drost
On Saturday 05 September 2009 17:30:14 Grant Ingersoll wrote: > we are a machine learning project with a commercial > friendly license and a solid community aiming to build fast, production > ready libraries. +1 I think that summarizes pretty well what I see in Mahout as well. > Java, Hadoop an

Re: What's the plan for Mahout?

2009-09-06 Thread Sean Owen
Practically speaking, to guide short-term goals, we do need to start with a narrower, coherent remit and expand later. Starting as a Java-based, Hadoop-based library for developers, focusing on collaborative filtering, clustering, categorization, and a few other things sounds just right. It would

Re: What's the plan for Mahout?

2009-09-06 Thread Ted Dunning
I see this as a critical issue. On Sun, Sep 6, 2009 at 8:31 AM, Isabel Drost wrote: > > > but those systems always involve quite a bit of engineering to connect > the > > data fire-hoses into the right spigots. > > I wonder whether there is any way we can make that easier for users? We > certain

Re: What's the plan for Mahout?

2009-09-07 Thread Sean Owen
I don't know of any other viable alternative at the moment, and I think any alternative would be sufficiently different that it would be hard to meaningfully abstract it away without inventing our own little mapreduce layer. It still doesn't save anyone from thinking about the details of configurin

Re: What's the plan for Mahout?

2009-09-07 Thread Lukáš Vlček
Hi, just a note: Wouldn't it be better to talk about MapReduce as opposed to Hadoop? This means that for each algorithm implemented in Mahout it should be clearly stated whether it is a MapReduce-based implementation or not (or uses other ways to make it scalable). I can imagine it could be useful to

Re: What's the plan for Mahout?

2009-09-07 Thread Lukáš Vlček
Maybe there is no direct equivalent, but there are many ways one can build a MapReduce architecture into an existing system without Hadoop. And there is something all these systems have in common at a high level. I can see many existing systems are adding the MapReduce paradigm into their stack (e.g.: Ast

Re: What's the plan for Mahout?

2009-09-07 Thread Lukáš Vlček
> > > (In comparison, take a look at something as simple as logging. Through > people inventing abstractions, and abstractions on abstractions, it's > actually turned into something difficult to manage. Using SLF4J, > putting in the right bindings .jar so it routes through Log4J -- and > don't forg

Re: What's the plan for Mahout?

2009-09-07 Thread Sean Owen
I am sure the project needs to refactor and unify the Hadoop-related code. There's a lot of copy and paste at this stage. That would go some way towards abstracting away Hadoop -- would tend to centralize the dependency. I think there's a lot more to it -- abstracting away contacting a cluster? ru

Re: What's the plan for Mahout?

2009-09-07 Thread Grant Ingersoll
On Sep 7, 2009, at 4:49 AM, Sean Owen wrote: I am sure the project needs to refactor and unify the Hadoop-related code. There's a lot of copy and paste at this stage. That would go some way towards abstracting away Hadoop -- would tend to centralize the dependency. I think there's a lot more t

Re: What's the plan for Mahout?

2009-09-07 Thread Sean Owen
Well that is true. I think I'm stating the obvious when I say there is also a danger in too little direction and focus. I imagine we will find a happy medium somewhere in between. On Mon, Sep 7, 2009 at 1:56 PM, Grant Ingersoll wrote: > > The hard thing about all of this is, in open source, you ne

Re: What's the plan for Mahout?

2009-09-07 Thread Ted Dunning
I would say that abstracting away from hadoop is a huge issue that we definitely don't need to worry about right now. Even the hadoop guys haven't figured out the right interface yet! On Mon, Sep 7, 2009 at 12:32 AM, Lukáš Vlček wrote: > just a note: Wouldn't it be better to talk about MapReduc

Re: What's the plan for Mahout?

2009-09-07 Thread Isabel Drost
On Monday 07 September 2009 18:41:10 Ted Dunning wrote: > I would say that abstracting away from hadoop is a huge issue that we > definitely don't need to worry about right now. +1 Isabel