On Nov 2, 2011, at 1:01 PM, Jake Mannix wrote:
> On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll wrote:
>
>>
>> Alternatively, the ASF email data is license free. We could take and use
>> a chunk of that. You can pretty much have as much or as little as you
>> want. Since it's broken down b
On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll wrote:
>
> Alternatively, the ASF email data is license free. We could take and use
> a chunk of that. You can pretty much have as much or as little as you
> want. Since it's broken down by project, it has the rough look and feel of
> 20newsgroup
On Wed, Nov 2, 2011 at 9:40 AM, Ted Dunning wrote:
> I have used synthetic data for testing SVD algorithms. It should be
> reasonably easy to generate similar data of known shape for LDA. Not quite
> as easy as generating matrices of known rank with known singular values,
> but not much harder.
I have used synthetic data for testing SVD algorithms. It should be
reasonably easy to generate similar data of known shape for LDA. Not quite
as easy as generating matrices of known rank with known singular values,
but not much harder.
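
As an illustration of the "matrices of known rank with known singular values" idea, here is a minimal sketch in plain Python. It is not code from this thread or from Mahout; the function names, the Gram-Schmidt construction and the seed parameter are all assumptions made for the example. It builds A = U * diag(s) * V^T from random orthonormal columns, so an SVD implementation under test should recover exactly the singular values passed in.

```python
import random


def orthonormal_columns(rows, cols, rng):
    """Return `cols` orthonormal vectors of length `rows` (hypothetical helper).

    Uses Gram-Schmidt on Gaussian draws; requires cols <= rows.
    """
    if cols > rows:
        raise ValueError("cannot build more orthonormal vectors than dimensions")
    basis = []
    while len(basis) < cols:
        v = [rng.gauss(0.0, 1.0) for _ in range(rows)]
        # Subtract the components along the vectors already accepted.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-8:  # skip a (very unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def matrix_with_singular_values(m, n, singular_values, seed=0):
    """Build an m x n matrix whose nonzero singular values are `singular_values`.

    Returns (a, u, v), where u[k] and v[k] are the k-th left and right
    singular vectors, so the expected answer is known in advance.
    """
    rng = random.Random(seed)
    r = len(singular_values)
    u = orthonormal_columns(m, r, rng)
    v = orthonormal_columns(n, r, rng)
    a = [
        [sum(s * u[k][i] * v[k][j] for k, s in enumerate(singular_values))
         for j in range(n)]
        for i in range(m)
    ]
    return a, u, v
```

A test could call matrix_with_singular_values(6, 4, [5.0, 2.0, 0.5]), run the SVD code on the returned matrix, and compare the result against the three values supplied. The rank of the matrix equals the number of nonzero values in the list.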
On Wed, Nov 2, 2011 at 2:13 AM, Jake Mannix wrote:
> Anything
> like this, and I wanted to get y'all's advice:
>
> Unit tests don't really test for correctness of a complex problem.
>
> Integration tests can, but need to be run either against some known data
> which is small and you have an alternate means of computing
Unit tests don't really test for correctness of a complex problem.
Integration tests can, but need to be run either against some known data
which is small and you have an alternate means of computing the result (or
a reference set of computed results from another program, and the algorithm
is
Drew?
> Add example scripts / integration tests for various algorithms.
> ---
>
> Key: MAHOUT-520
> URL: https://issues.apache.org/jira/browse/MAHOUT-520
> Project: Mahout
>
ttps://hudson.apache.org/hudson/job/Mahout-Quality/578/])
MAHOUT-520: Add example scripts / integration tests for various algorithms.
> Add example scripts / integration tests for various algorithms.
> ---
>
>
all of the examples with a hadoop cluster operational. Posting
the patch for feedback, the goal is to commit in a couple days.
> Add example scripts / integration tests for various algorithms.
> ---
>
> Key:
commit. I hope to have a chance to
close this up by Tuesday evening.
> Add example scripts / integration tests for various algorithms.
> ---
>
> Key: MAHOUT-520
> URL: https://issues.apach
substantial and useful
bit of work and it's been reviewed. Ready to commit then Drew?
> Add example scripts / integration tests for various algorithms.
> ---
>
> Key: MAHOUT-520
>
interactive / non-interactive mode.
For hudson, invoke examples/bin/build-reuters.sh -ni
Users will invoke examples/bin/build-reuters.sh and then they can choose
between kmeans and lda
> Add example scripts / integration tests for various algorit
the
clustering algos so that hudson could verify all of the clustering algos?
regards
Joe.
> Add example scripts / integration tests for various algorithms.
> ---
>
> Key: MAHOUT-520
> URL
ears that we should take a closer look at the synthetic control
examples to see if we can use the existing arguments to control input/output.
> Add example scripts / integration tests for various algorithms.
> ---
>
>
-ow
Note that, for k-means, the number of clusters (-k) is computed by Canopy using
the supplied -dm, -t1 and -t2 arguments so -k is not an argument.
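
To make the point about -k concrete, here is a minimal sketch of the textbook canopy procedure in plain Python. It is not Mahout's implementation; the function names and the Euclidean default are assumptions for the example, with the `distance` parameter standing in for -dm and `t1` / `t2` for -t1 / -t2. The number of canopies it returns is what would then serve as k.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2, distance=euclidean):
    """Return a list of (center, members) canopies.

    t1 is the loose threshold and t2 the tight one, so t1 must exceed t2.
    Points within t1 of a center join its canopy; points within t2 are
    also removed from further consideration as centers.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)
            if d >= t2:
                # Not tightly bound to this center: may seed a later canopy.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


points = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (20, 0)]
k = len(canopy_centers(points, t1=3.0, t2=1.0))  # k is derived, not supplied
```

With the sample points above, the three well-separated groups give three canopies, so k comes out as 3 without ever being passed in. Widening the thresholds merges canopies and lowers k.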
> Add example scripts / integration tests for various alg
c control examples used hardcoded
directory names. I agree, it makes sense to modify these to use input and
output directories that are parameters. Why don't you open a new issue
for that and attach a patch when you're ready.
> Add example scripts / integration tests
tion to get the input and
output directories as parameters instead of the current hardcoded directory
names. Should I submit a separate patch for this? What do you suggest?
regards
Joe.
> Add example scripts / integration tests for various alg
;s to that path instead of blindly cd'ing to examples/bin
> Add example scripts / integration tests for various algorithms.
> ---
>
> Key: MAHOUT-520
> URL: https://issues.apache.o
ork/20news-bydate/bayes-test-input may be pretty safe.
Regardless, I ran the canopy, kmeans and fuzzykmeans algorithms from the script
in interactive mode and they appeared to execute correctly.
> Add example scripts / integration tests for various algorithms.
>
which clustering algo they'd want to use.
5. User chooses a # and the corresponding algo is executed.
I have tested the failure and success scenarios from my end. If
someone also wants to verify, that'll be wonderful.
regards
Joe.
> Add example scripts / integration tests for
Drew,
Thanks for your suggestions.
I have modified the script to enable a non-interactive mode by calling the
script with a parameter "-ni". This will just run the Canopy Clustering. I
am not sure if it should run all the clustering algos. Any thoughts?
By default the script will be in an interacti
On Sun, Oct 10, 2010 at 11:36 PM, Joe Kumar wrote:
> Drew / all,
>
> I have written a script (80% done) for running the clustering job on
> synthetic control data.
> Should I upload this in MAHOUT-520 or should I open a new JIRA issue?
Great! I've revised MAHOUT-520's description to accommodate t
of Mahout from the command-line but also serve as integration tests. We
should add additional scripts that drive the algorithms so new users can
quickly run the examples.
Perhaps these can also be run from hudson as a part of the nightly builds and
can serve as integration tests.
As a start
pt of the commands they're using so that they can
> >> eventually be changed into these sorts of scripts.
> >>
> >> On Fri, Oct 8, 2010 at 3:25 PM, Robin Anil
> wrote:
> >> > +1 for integration script
> >> >
> >> > On Sat, Oct
Sat, Oct 9, 2010 at 12:52 AM, Drew Farris wrote:
>> >
>> >> It sure would be really nice if we had more integration tests /
>> >> example scripts for the various algorithms like build-reuters.sh
>> >> script. These capture problems with the system in t
these sorts of scripts.
>
> On Fri, Oct 8, 2010 at 3:25 PM, Robin Anil wrote:
> > +1 for integration script
> >
> > On Sat, Oct 9, 2010 at 12:52 AM, Drew Farris wrote:
> >
> >> It sure would be really nice if we had more integration tests /
> >> ex
t, Oct 9, 2010 at 12:52 AM, Drew Farris wrote:
>
>> It sure would be really nice if we had more integration tests /
>> example scripts for the various algorithms like build-reuters.sh
>> script. These capture problems with the system in the way real users
>> are likely t
+1 for integration script
On Sat, Oct 9, 2010 at 12:52 AM, Drew Farris wrote:
> It sure would be really nice if we had more integration tests /
> example scripts for the various algorithms like build-reuters.sh
> script. These capture problems with the system in the way real users
>
It sure would be really nice if we had more integration tests /
example scripts for the various algorithms like build-reuters.sh
script. These capture problems with the system in the way real users
are likely to first encounter it, and provide an easy way for new
users to understand the steps of