Re: Goals for Mahout 0.7

2012-02-24 Thread Grant Ingersoll
One of our top goals, in my mind, has to be speeding up our tests! I only wish I knew how, given that basic attempts at parallelism with Maven have failed miserably. On Feb 14, 2012, at 3:29 PM, Jeff Eastman wrote: > +users@ > > Just to be clear, I'm not advocating replacing the JIRA process with a

Re: Goals for Mahout 0.7

2012-02-23 Thread Ioan Eugen Stan
2012/2/23 Ted Dunning : > Is this a joke? > >    new String[] {"-t", INPUT_TABLE, "-m", MAIL_ACCOUNT_ID} > > seems better than farting around with lists. True, thank you. -- Ioan Eugen Stan http://ieugen.blogspot.com/

Re: Goals for Mahout 0.7

2012-02-23 Thread Ted Dunning
Is this a joke? new String[] {"-t", INPUT_TABLE, "-m", MAIL_ACCOUNT_ID} seems better than farting around with lists. On Thu, Feb 23, 2012 at 2:03 PM, Ioan Eugen Stan wrote: > > String[] args = new String[2]; >> args[0] = "max"; >> args[1] = "7"; >> args[0] = "4"; >> int max = Math.main(arg

Re: Goals for Mahout 0.7

2012-02-23 Thread Ioan Eugen Stan
String[] args = new String[2]; args[0] = "max"; args[1] = "7"; args[0] = "4"; int max = Math.main(args); A more elegant solution is: List argList = new LinkedList(); argList.add("-t"); argList.add(INPUT_TABLE); argList.add("-m"); argList.add(MAIL_ACCOUNT_ID); argList.toArray(new String[ arg
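The two ways of building a driver's argument array discussed in this thread can be compared side by side. This is a minimal sketch: the values of `INPUT_TABLE` and `MAIL_ACCOUNT_ID` are hypothetical placeholders (the thread does not show them), and `ArrayList` stands in for the `LinkedList` in the original snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArgsExample {
    // Hypothetical placeholder values; the real constants are not shown in the thread.
    static final String INPUT_TABLE = "mail_table";
    static final String MAIL_ACCOUNT_ID = "42";

    // List-based approach, with generics: accumulate flags, then copy into a String[].
    static String[] fromList() {
        List<String> argList = new ArrayList<>();
        argList.add("-t");
        argList.add(INPUT_TABLE);
        argList.add("-m");
        argList.add(MAIL_ACCOUNT_ID);
        return argList.toArray(new String[0]);
    }

    // Array-literal approach: simplest when the set of flags is fixed.
    static String[] fromLiteral() {
        return new String[] {"-t", INPUT_TABLE, "-m", MAIL_ACCOUNT_ID};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fromList()));              // [-t, mail_table, -m, 42]
        System.out.println(Arrays.equals(fromList(), fromLiteral())); // true
    }
}
```

Both produce the same array. The list form only pays off when some flags are added conditionally; for a fixed argument set the literal is shorter and clearer.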

Re: Goals for Mahout 0.7

2012-02-14 Thread Lance Norskog
Yes! Connecting R and Mahout within the same JVM is an awesome idea. Approaching Mahout as a non-mathematician user is frustrating because of the difficulty in visualizing and tuning results. I've done some hacky things with KNIME and Excel, but the ability to do math-heavy post-processing and vis

Re: Goals for Mahout 0.7

2012-02-14 Thread Dmitriy Lyubimov
My company and I have allocated some time to create a mixed environment of R and other "stuff", in particular Mahout. I am thinking of a "contributed" project with R in which R can play the following roles: #1 Mahout's front end driver mixing Mahout computations and R vector/matrices

Re: Goals for Mahout 0.7

2012-02-14 Thread Jeff Eastman
+users@ Just to be clear, I'm not advocating replacing the JIRA process with a new set of green-field goals. Rather, IMHO, having a small number of overarching goals for a release *could* help us focus our efforts (triage our feature JIRAs) and *might* suggest some missing JIRAs that would gi

Re: Goals for Mahout 0.7

2012-02-14 Thread Sean Owen
When 0.6 was released, there was an all-time record of open JIRAs -- something like 90-100 (I closed maybe 10 quickly.) It's just math: there is a certain level of interest and rate of new requests and issues. There is some level of committer time and energy available to work on them. The former is

Re: Goals for Mahout 0.7

2012-02-14 Thread Jeff Eastman
+1 I think this is an excellent goal. The current code base does not approach its Java APIs in a uniform manner nor are we where we had hoped to be on the CLI API uniformity. There's a lot to do here in both areas. In the Java API area, we do have some notable successes, with the recommender A

Re: Goals for Mahout 0.7

2012-02-13 Thread Ted Dunning
John, This is well said and is a critical need. There are some beginnings to this. The recommender side of the house already works the way you say. The classifier and hashed encoding APIs are beginning to work that way. The naive Bayes classifiers pretty much do not, and the classifier APIs a

Re: Goals for Mahout 0.7

2012-02-13 Thread John Conwell
From my perspective, I'd really like to see the Mahout API migrate away from the command-line-centric design it currently uses and toward a library-centric API design. I think this would go a long way toward getting Mahout adopted in real-life commercial applications. While there
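One way to read the library-centric proposal is that each job exposes a typed Java method as its real API, with `main()` reduced to a thin adapter that parses flags and delegates. The sketch below is hypothetical: `ThresholdCountJob`, `Params`, and the `--threshold` flag are illustrative names, not existing Mahout classes or options.

```java
import java.util.Arrays;

// Hypothetical job illustrating a library-first design: run() is the API,
// main() only translates command-line strings into typed parameters.
public class ThresholdCountJob {

    // Typed parameters instead of a String[] of flags.
    public static final class Params {
        final double threshold;

        public Params(double threshold) {
            this.threshold = threshold;
        }
    }

    // Library entry point: application code calls this directly, no argument strings.
    public static long run(double[] values, Params params) {
        return Arrays.stream(values).filter(v -> v > params.threshold).count();
    }

    // CLI entry point: accepts an optional "--threshold <x>" followed by numeric values.
    public static void main(String[] args) {
        double threshold = 0.0;
        int start = 0;
        if (args.length >= 2 && args[0].equals("--threshold")) {
            threshold = Double.parseDouble(args[1]);
            start = 2;
        }
        double[] values = Arrays.stream(args, start, args.length)
                .mapToDouble(Double::parseDouble)
                .toArray();
        System.out.println(run(values, new Params(threshold)));
    }
}
```

For example, `ThresholdCountJob.run(new double[] {1.0, 5.0, 7.0}, new ThresholdCountJob.Params(4.0))` returns `2`, and the command line `--threshold 4 1 5 7` prints the same count. Keeping the parsing in `main()` means the CLI and the library can never drift apart in behavior.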

Re: Goals for Mahout 0.7

2012-02-12 Thread Jeff Eastman
We have a couple of JIRAs that relate here: we want to factor all the (-cl) classification steps out of all of the driver classes (MAHOUT-930) and into a separate job to remove duplicated code; MAHOUT-931 is to add a pluggable outlier removal capability to this job; and MAHOUT-933 is aimed at fact
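A pluggable outlier-removal hook in a shared classification step could take the shape of a small strategy interface. This is a hypothetical sketch, not the design of MAHOUT-930 or MAHOUT-931: `ClassifyStep`, `OutlierPolicy`, and `DistanceThreshold` are illustrative names, points are plain `double[]` vectors, and assignment uses Euclidean distance to the nearest cluster center.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassifyStep {

    // Hypothetical plug-in point: decides whether a point is an outlier
    // given its distance to the nearest center.
    public interface OutlierPolicy {
        boolean isOutlier(double distanceToNearestCenter);
    }

    // One possible policy: reject points farther than a fixed distance.
    public static final class DistanceThreshold implements OutlierPolicy {
        private final double maxDistance;

        public DistanceThreshold(double maxDistance) {
            this.maxDistance = maxDistance;
        }

        @Override
        public boolean isOutlier(double distanceToNearestCenter) {
            return distanceToNearestCenter > maxDistance;
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Assigns each point to its nearest center; labels rejected points -1.
    public static List<Integer> classify(List<double[]> points, List<double[]> centers,
                                         OutlierPolicy policy) {
        List<Integer> labels = new ArrayList<>();
        for (double[] p : points) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.size(); c++) {
                double d = euclidean(p, centers.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            labels.add(policy.isOutlier(bestDist) ? -1 : best);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<double[]> centers = List.of(new double[] {0, 0}, new double[] {10, 10});
        List<double[]> points = List.of(new double[] {1, 0}, new double[] {9, 10}, new double[] {50, 50});
        System.out.println(classify(points, centers, new DistanceThreshold(5.0))); // [0, 1, -1]
    }
}
```

Because the policy is an interface, a caller that wants no outlier removal can pass `d -> false`, and every clustering driver could share the same step instead of duplicating it.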

Re: Goals for Mahout 0.7

2012-02-12 Thread Jeff Eastman
+ users@ These are great ideas, and are just the kinds of high-level conversations I was hoping to engender. From my agile background, I'd hope to define 0.7 by a small number of "epic stories", in a subset of our overall capabilities, which could focus our attention on a set of derivative JI

Re: Goals for Mahout 0.7

2012-02-11 Thread Lance Norskog
For incremental improvements, usability and correctness of algorithms. The "new" Naive Bayes and SGD algorithms both seem to have trouble classifying. Also, interpretation of results. It is hard to summarize the quality of results. I often feel like the math-savvy implementors print a bunch of numb
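One common way to summarize classifier quality for non-specialists is a confusion matrix plus overall accuracy. This is a standalone sketch that assumes integer class labels `0..numClasses-1`. It does not use Mahout's own evaluation classes, and `ConfusionSummary` is a hypothetical name.

```java
import java.util.Arrays;

public class ConfusionSummary {

    // Builds counts[actual][predicted] from parallel label arrays.
    public static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    // Fraction of examples on the diagonal (predicted == actual).
    public static double accuracy(int[][] counts) {
        long correct = 0;
        long total = 0;
        for (int a = 0; a < counts.length; a++) {
            for (int p = 0; p < counts[a].length; p++) {
                total += counts[a][p];
                if (a == p) {
                    correct += counts[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        int[] actual = {0, 0, 1, 1, 2};
        int[] predicted = {0, 1, 1, 1, 2};
        int[][] m = confusionMatrix(actual, predicted, 3);
        for (int[] row : m) {
            System.out.println(Arrays.toString(row));
        }
        System.out.printf("accuracy = %.2f%n", accuracy(m)); // accuracy = 0.80
    }
}
```

The off-diagonal cells show which classes get confused with which. That is often more useful for tuning than a single score.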

Re: Goals for Mahout 0.7

2012-02-11 Thread Frank Scholten
I'd like to add solving ClassNotFoundException problems with third-party jars in some jobs. I experimented with having seq2sparse upload a third-party jar containing an analyzer and add it to the DistributedCache. The upload works, but I haven't yet gotten it working inside the Mappers. I have some code lying a

Goals for Mahout 0.7

2012-02-11 Thread Jeff Eastman
Now that 0.6 is in the box, it seems a good time to start thinking about 0.7, from a high level goal perspective at least. Here are a couple that come to mind: * Target code freeze date August 1, 2012 * Get Jenkins working for us again * Complete clustering refactoring and classification con