Re: integration tests

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 1:01 PM, Jake Mannix wrote: > On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll wrote: > >> >> Alternatively, the ASF email data is license free. We could take and use >> a chunk of that. You can pretty much have as much or as little as you >> want. Since it's broken down b

Re: integration tests

2011-11-02 Thread Jake Mannix
On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll wrote: > > Alternatively, the ASF email data is license free. We could take and use > a chunk of that. You can pretty much have as much or as little as you > want. Since it's broken down by project, it has the rough look and feel of > 20newsgroup

Re: integration tests

2011-11-02 Thread Jake Mannix
On Wed, Nov 2, 2011 at 9:40 AM, Ted Dunning wrote: > I have used synthetic data for testing SVD algorithms. It should be > reasonably easy to generate similar data of known shape for LDA. Not quite > as easy as generating matrices of known rank with known singular values, > but not much harder.

Re: integration tests

2011-11-02 Thread Ted Dunning
I have used synthetic data for testing SVD algorithms. It should be reasonably easy to generate similar data of known shape for LDA. Not quite as easy as generating matrices of known rank with known singular values, but not much harder. On Wed, Nov 2, 2011 at 2:13 AM, Jake Mannix wrote: > Any
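
A minimal sketch of the synthetic-data idea in the message above, assuming the goal is a test matrix whose rank and singular values are known in advance so that an SVD implementation can be checked against them. The function names (`random_orthonormal`, `matrix_with_singular_values`, `matvec`) are hypothetical and are not part of Mahout; the sketch uses only the Python standard library and builds A as a sum of s_i * u_i * v_i^T over orthonormal u_i and v_i.

```python
import math
import random


def random_orthonormal(n, k, rng):
    """Return k orthonormal vectors of length n (modified Gram-Schmidt)."""
    if k > n:
        raise ValueError("cannot build more than n orthonormal vectors of length n")
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the vectors already accepted.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def matrix_with_singular_values(rows, cols, singular_values, seed=0):
    """Build A = sum_i s_i * u_i * v_i^T.

    For positive, distinct s_i the result has rank len(singular_values)
    and exactly those singular values (hypothetical helper, not Mahout API).
    """
    rng = random.Random(seed)
    k = len(singular_values)
    us = random_orthonormal(rows, k, rng)
    vs = random_orthonormal(cols, k, rng)
    a = [[0.0] * cols for _ in range(rows)]
    for s, u, v in zip(singular_values, us, vs):
        for i in range(rows):
            for j in range(cols):
                a[i][j] += s * u[i] * v[j]
    return a, us, vs


def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]
```

An integration test could then run the SVD implementation under test on `a` and compare the recovered singular values with the list passed in, within a tolerance. A quick self-check of the construction is that `matvec(a, vs[i])` equals `singular_values[i]` times `us[i]` up to rounding error.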

Re: integration tests

2011-11-02 Thread Grant Ingersoll
thing > like this, and I wanted to get y'all's advice: > > Unit tests don't really test for correctness of a complex problem. > > Integration tests can, but need to be run either against some known data > which is small and you have an alternate means of computing

integration tests

2011-11-02 Thread Jake Mannix
Unit tests don't really test for correctness of a complex problem. Integration tests can, but need to be run either against some known data which is small and you have an alternate means of computing the result (or a reference set of computed results from another program, and the algorithm is

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2011-02-03 Thread Sean Owen (JIRA)
Drew? > Add example scripts / integration tests for various algorithms. > --- > > Key: MAHOUT-520 > URL: https://issues.apache.org/jira/browse/MAHOUT-520 > Project: Mahout >

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2011-01-22 Thread Hudson (JIRA)
ttps://hudson.apache.org/hudson/job/Mahout-Quality/578/]) MAHOUT-520: Add example scripts / integration tests for various algorithms. > Add example scripts / integration tests for various algorithms. > --- > >

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2011-01-18 Thread Drew Farris (JIRA)
all of the examples with a hadoop cluster operational. Posting the patch for feedback, the goal is to commit in a couple days. > Add example scripts / integration tests for various algorithms. > --- > > Key:

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2011-01-16 Thread Drew Farris (JIRA)
commit. I hope to have a chance to close this up by Tuesday evening. > Add example scripts / integration tests for various algorithms. > --- > > Key: MAHOUT-520 > URL: https://issues.apach

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2011-01-15 Thread Sean Owen (JIRA)
substantial and useful bit of work and it's been reviewed. Ready to commit then Drew? > Add example scripts / integration tests for various algorithms. > --- > > Key: MAHOUT-520 >

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-14 Thread Joe Prasanna Kumar (JIRA)
interactive / non-interactive mode. For hudson, invoke examples/bin/build-reuters.sh -ni. Users will invoke examples/bin/build-reuters.sh and then they can choose between kmeans and lda. > Add example scripts / integration tests for various algorit

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-13 Thread Joe Prasanna Kumar (JIRA)
the clustering algos so that hudson could verify all of the clustering algos? Regards, Joe. > Add example scripts / integration tests for various algorithms. > --- > > Key: MAHOUT-520 > URL

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-13 Thread Drew Farris (JIRA)
ears that we should take a closer look at the synthetic control examples to see if we can use the existing arguments to control input/output. > Add example scripts / integration tests for various algorithms. > --- > >

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-13 Thread Jeff Eastman (JIRA)
-ow Note that, for k-means, the number of clusters (-k) is computed by Canopy using the supplied -dm, -t1 and -t2 arguments so -k is not an argument. > Add example scripts / integration tests for various alg

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-13 Thread Drew Farris (JIRA)
c control examples used hardcoded directory names. I agree, it makes sense to modify these to use input and output directories that are parameters. Why don't you open a new issue for that and attach a patch when you're ready? > Add example scripts / integration tests

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-12 Thread Joe Prasanna Kumar (JIRA)
tion to get the input and output directories as parameters instead of the current hardcoded directory names. Should I submit a separate patch for this? What do you suggest? Regards, Joe. > Add example scripts / integration tests for various alg

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-12 Thread Drew Farris (JIRA)
;s to that path instead of blindly cd'ing to examples/bin > Add example scripts / integration tests for various algorithms. > --- > > Key: MAHOUT-520 > URL: https://issues.apache.o

[jira] Commented: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-12 Thread Drew Farris (JIRA)
ork/20news-bydate/bayes-test-input may be pretty safe. Regardless, I ran the canopy, kmeans and fuzzykmeans algorithms from the script in interactive mode and they appeared to execute correctly. > Add example scripts / integration tests for various algorithms. >

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-11 Thread Joe Prasanna Kumar (JIRA)
which clustering algo they'd want to use. 5. User chooses a # and the corresponding algo is executed. I have tested the failure and success scenarios from my end. If someone else wants to verify, that would be wonderful. Regards, Joe. > Add example scripts / integration tests for

Re: integration tests / example scripts?

2010-10-11 Thread Joe Kumar
Drew, Thanks for your suggestions. I have modified the script to enable a non-interactive mode by calling the script with a parameter "-ni". This will just run the Canopy Clustering. I am not sure if it should run all the clustering algos. Any thoughts? By default the script will be in an interacti

Re: integration tests / example scripts?

2010-10-11 Thread Drew Farris
On Sun, Oct 10, 2010 at 11:36 PM, Joe Kumar wrote: > Drew / all, > > I have written a script (80% done) for running the clustering job on > synthetic control data. > Should I upload this in MAHOUT-520 or should I open a new JIRA issue? Great! I've revised MAHOUT-520's description to accommodate t

[jira] Updated: (MAHOUT-520) Add example scripts / integration tests for various algorithms.

2010-10-11 Thread Drew Farris (JIRA)
of Mahout from the command-line but also serve as integration tests. We should add additional scripts that drive the algorithms so new users can quickly run the examples. Perhaps these can also be run from hudson as a part of the nightly builds and can serve as integration tests. As a start

Re: integration tests / example scripts?

2010-10-10 Thread Joe Kumar
pt of the commands they're using so that they can > >> eventually be changed into these sorts of scripts. > >> > >> On Fri, Oct 8, 2010 at 3:25 PM, Robin Anil > wrote: > >> > +1 for integration script > >> > > >> > On Sat, Oct

Re: integration tests / example scripts?

2010-10-09 Thread Gangadhar Nittala
Sat, Oct 9, 2010 at 12:52 AM, Drew Farris wrote: >> > >> >> It sure would be really nice if we had more integration tests / >> >> example scripts for the various algorithms like build-reuters.sh >> >> script. These capture problems with the system in t

Re: integration tests / example scripts?

2010-10-08 Thread Ted Dunning
these sorts of scripts. > > On Fri, Oct 8, 2010 at 3:25 PM, Robin Anil wrote: > > +1 for integration script > > > > On Sat, Oct 9, 2010 at 12:52 AM, Drew Farris wrote: > > > >> It sure would be really nice if we had more integration tests / > >> ex

Re: integration tests / example scripts?

2010-10-08 Thread Drew Farris
t, Oct 9, 2010 at 12:52 AM, Drew Farris wrote: > >> It sure would be really nice if we had more integration tests / >> example scripts for the various algorithms like build-reuters.sh >> script. These capture problems with the system in the way real users >> are likely t

Re: integration tests / example scripts?

2010-10-08 Thread Robin Anil
+1 for integration script On Sat, Oct 9, 2010 at 12:52 AM, Drew Farris wrote: > It sure would be really nice if we had more integration tests / > example scripts for the various algorithms like build-reuters.sh > script. These capture problems with the system in the way real users >

integration tests / example scripts?

2010-10-08 Thread Drew Farris
It sure would be really nice if we had more integration tests / example scripts for the various algorithms like build-reuters.sh script. These capture problems with the system in the way real users are likely to first encounter it, and provide an easy way for new users to understand the steps of