[jira] [Updated] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-867: --- Attachment: MAHOUT-867.patch Adds --evaluate option to ClusterDumper, which then uses the

[jira] [Assigned] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-867: -- Assignee: Grant Ingersoll > Add ClusterEvaluator capabilities to ClusterDum

[jira] [Created] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Add ClusterEvaluator capabilities to ClusterDumper -- Key: MAHOUT-867 URL: https://issues.apache.org/jira/browse/MAHOUT-867 Project: Mahout Issue Type: Improvement Reporter: Grant

Re: Dirchlet

2011-11-02 Thread Grant Ingersoll
> >>> Line 58: context.write(new Text(Integer.toString(i)), new >>> VectorWritable(new DenseVector(0))); >>> >>> See >>> http://www.cloudera.com/blog/2009/12/7-tips-for-improving-mapreduce-performance/ >>> >>> Frank >>> >

[jira] [Assigned] (MAHOUT-866) Move Precondition checks out of Mahalanobis.distance method and into configuration/setup

2011-11-02 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-866: -- Assignee: Grant Ingersoll > Move Precondition checks out of Mahalanobis.dista

[jira] [Resolved] (MAHOUT-866) Move Precondition checks out of Mahalanobis.distance method and into configuration/setup

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-866. Resolution: Fixed Fix Version/s: 0.6 > Move Precondition checks out

[jira] [Created] (MAHOUT-866) Move Precondition checks out of Mahalanobis.distance method and into configuration/setup

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor The Mahalanobis distance currently checks certain preconditions on member variables for every call to distance(). These should be done as part of setup, not as part of the distance
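A minimal sketch of the idea in this issue, assuming a hypothetical class `CheckedOnceDistance` that is not Mahout's actual Mahalanobis measure: the configuration checks run once in the constructor, so the hot `distance()` path contains only arithmetic.

```java
import java.util.Objects;

/** Hypothetical stand-in for a configured distance measure; not Mahout's API. */
final class CheckedOnceDistance {
    private final double[][] inverseCovariance;

    /** All precondition checks on configuration happen here, exactly once. */
    CheckedOnceDistance(double[][] inverseCovariance) {
        Objects.requireNonNull(inverseCovariance, "inverseCovariance");
        int n = inverseCovariance.length;
        if (n == 0) {
            throw new IllegalArgumentException("matrix is empty");
        }
        for (double[] row : inverseCovariance) {
            if (row == null || row.length != n) {
                throw new IllegalArgumentException("matrix must be square");
            }
        }
        this.inverseCovariance = inverseCovariance;
    }

    /** Hot path: sqrt((a-b)^T * S^-1 * (a-b)), with no configuration checks. */
    double distance(double[] a, double[] b) {
        int n = inverseCovariance.length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double di = a[i] - b[i];
            for (int j = 0; j < n; j++) {
                sum += di * inverseCovariance[i][j] * (a[j] - b[j]);
            }
        }
        return Math.sqrt(sum);
    }
}
```

With an identity matrix this reduces to the Euclidean distance, which gives a quick sanity check for the sketch.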

Re: Dirchlet

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 5:29 PM, Jeff Eastman wrote: > I think the scalability problems you are seeing are a consequence of using > the default GaussianCluster models. These models perform especially poorly > for large text clustering problems such as email. The pdf() calculation over > wide topic

Re: Dirchlet

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 6:05 PM, Ted Dunning wrote: > I have done some testing and have been unable to demonstrate a big > difference in allocating versus re-using. Re-using is, however, *really* > error prone. > I've been bitten by that one at least once. It's a pain to debug.
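One way re-use can bite, sketched with a hypothetical `Holder` class (an illustration only, not code from the thread or from Mahout): a single mutable object is updated in a loop and stored by reference, so every stored entry ends up showing the last value written.

```java
import java.util.ArrayList;
import java.util.List;

final class ReusePitfall {
    /** Hypothetical mutable holder, standing in for any reusable value object. */
    static final class Holder {
        int value;
    }

    /** Re-uses one instance: every list entry aliases the same object. */
    static List<Holder> collectReusing(int n) {
        List<Holder> out = new ArrayList<>();
        Holder shared = new Holder();
        for (int i = 0; i < n; i++) {
            shared.value = i;
            out.add(shared); // stores a reference, not a copy
        }
        return out;
    }

    /** Allocates per item: each entry keeps its own value. */
    static List<Holder> collectAllocating(int n) {
        List<Holder> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Holder fresh = new Holder();
            fresh.value = i;
            out.add(fresh);
        }
        return out;
    }
}
```

In the re-using variant the first entry reports the last value written, which is the kind of silent corruption that is hard to trace back to its cause.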

Use of Preconditions in Mahalanobis and other places

2011-11-02 Thread Grant Ingersoll
I was looking at some of the DistanceMeasure stuff (Mahalanobis at the moment) and it strikes me as a bit odd that our core distance() functions would use Preconditions to check things that are part of construction/configuration of the instance. A call to distance() is likely executed a lot of

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142635#comment-13142635 ] Grant Ingersoll commented on MAHOUT-524: bq. I applied your patch but I'

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142633#comment-13142633 ] Grant Ingersoll commented on MAHOUT-524: bq. I applied your patch but I'

[jira] [Updated] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-524: --- Attachment: screenshot-1.jpg Of course, the results don't really speak well of SKM, b

[jira] [Updated] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-524: --- Attachment: MAHOUT-524.patch This gets past the Lanczos issue by checking the size. __ I

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142588#comment-13142588 ] Grant Ingersoll commented on MAHOUT-524: Seems the numDims == 1100 ther

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142583#comment-13142583 ] Grant Ingersoll commented on MAHOUT-524: in this particular case, the state h

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142586#comment-13142586 ] Grant Ingersoll commented on MAHOUT-524: I guess the 1100 comes from how we

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142557#comment-13142557 ] Grant Ingersoll commented on MAHOUT-524: The NPE is from one of the rowJ va

[jira] [Updated] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-524: --- Attachment: MAHOUT-524.patch patch so far, never mind the DisplayMinHash stuff, as I forgot

Dirchlet

2011-11-02 Thread Grant Ingersoll
Tim Potter and I have tried running Dirichlet in the past on the ASF email set on EC2 and it didn't seem to scale all that well, so I was wondering if people had ideas on improving its speed. One question I had is whether we could inject a Combiner into the process? Ted also mentioned that the

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142530#comment-13142530 ] Grant Ingersoll commented on MAHOUT-524: Making this change does indeed ge

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142510#comment-13142510 ] Grant Ingersoll commented on MAHOUT-524: Realizing now that Jeff already

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142477#comment-13142477 ] Grant Ingersoll commented on MAHOUT-524: Tracing into the Hadoop code, this

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142461#comment-13142461 ] Grant Ingersoll commented on MAHOUT-524: bq. Is there any way we could simp

[jira] [Updated] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-862: --- Fix Version/s: 0.6 > MurmurHash 3.0 > -- > > Key

[jira] [Created] (MAHOUT-865) Refactor Sequential Clustering algorithms

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Refactor Sequential Clustering algorithms - Key: MAHOUT-865 URL: https://issues.apache.org/jira/browse/MAHOUT-865 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

[jira] [Updated] (MAHOUT-863) Add DisplayMinhash clustering example

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-863: --- Attachment: MAHOUT-863.patch Here's a start. It doesn't display the items y

Re: integration tests

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 1:01 PM, Jake Mannix wrote: > On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll wrote: > >> >> Alternatively, the ASF email data is license free. We could take and use >> a chunk of that. You can pretty much have as much or as little as you >>

[jira] [Commented] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142324#comment-13142324 ] Grant Ingersoll commented on MAHOUT-862: I committed the

Re: Goodbye graph algorithms

2011-11-02 Thread Grant Ingersoll
WFM - works for me. On Nov 2, 2011, at 11:30 AM, Sebastian Schelter wrote: > On 02.11.2011 16:04, Jake Mannix wrote: >> On Wed, Nov 2, 2011 at 6:38 AM, Grant Ingersoll wrote: >> >>> Perhaps it would make sense to move them to a branch? I know we never >>> re

Re: Canopy and other clustering approaches

2011-11-02 Thread Grant Ingersoll
is needed. > > Thanks and Regards, > Paritosh > > On 02-11-2011 09:01, Grant Ingersoll wrote: >> In reviewing clustering for upcoming training, I'm wondering about something >> w/ Canopy clustering that we claim, but wanted to check here first. In the >>

[jira] [Resolved] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-864. Resolution: Fixed Fix Version/s: 0.6 Assignee: Grant Ingersoll

[jira] [Commented] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142212#comment-13142212 ] Grant Ingersoll commented on MAHOUT-864: Appears to be due to the fact

[jira] [Created] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
DisplayCanopy doesn't show any clusters --- Key: MAHOUT-864 URL: https://issues.apache.org/jira/browse/MAHOUT-864 Project: Mahout Issue Type: Bug Reporter: Grant Ingersoll Pri

[jira] [Updated] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-864: --- Component/s: Examples Clustering > DisplayCanopy doesn't

[jira] [Created] (MAHOUT-863) Add DisplayMinhash clustering example

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Add DisplayMinhash clustering example - Key: MAHOUT-863 URL: https://issues.apache.org/jira/browse/MAHOUT-863 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

[jira] [Commented] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142200#comment-13142200 ] Grant Ingersoll commented on MAHOUT-862: Committed revision 1196616. I'

[jira] [Commented] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142199#comment-13142199 ] Grant Ingersoll commented on MAHOUT-862: I accidentally committed this

[jira] [Updated] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-862: --- Attachment: MAHOUT-862.patch Here's a patch that adds MurmurHash3. Tests pass, but I'

[jira] [Resolved] (MAHOUT-859) Move Decision Forests to classifier package

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-859. Resolution: Fixed Committed revision 1196578. > Move Decision Forests

Re: Goodbye graph algorithms

2011-11-02 Thread Grant Ingersoll
>> theory. >> >> Finally the spectral clustering piece of Mahout also takes graph input >> (affinities) and there are decades of research papers that account for >> this in terms of eigenvectors/values of laplacian representations of >> the graph af

[jira] [Created] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
MurmurHash 3.0 -- Key: MAHOUT-862 URL: https://issues.apache.org/jira/browse/MAHOUT-862 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority

Re: integration tests

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 5:13 AM, Jake Mannix wrote: > So in the process of getting the LDA improvements I've got brewing over on > GitHub, and I'm doing my good due diligence and making more unit tests and > so forth, and I'm trying to figure out the best way to unit test something > like this, and I

Re: Reviewboard?

2011-11-02 Thread Grant Ingersoll
etc. Plus, in-line comment tracking! > > -jake -- Grant Ingersoll http://www.lucidimagination.com

[jira] [Commented] (MAHOUT-854) Add MinHash to build-reuters.sh example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141900#comment-13141900 ] Grant Ingersoll commented on MAHOUT-854: I've committed this, but will

Re: [jira] [Created] (MAHOUT-860) Create minimalist maven module for *Writable classes for export

2011-11-01 Thread Grant Ingersoll
On Nov 2, 2011, at 12:00 AM, Jake Mannix wrote: > Anyone with mad maven skills know how to churn that out in a short > evenings-worth of work? :) Is there such a thing? :-) As an alternative, we could simply generate a Jar that contains just the necessary files and no re-org is necessary.

[jira] [Commented] (MAHOUT-854) Add MinHash to build-reuters.sh example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141894#comment-13141894 ] Grant Ingersoll commented on MAHOUT-854: bq. 1. Is it just me or when I

[jira] [Commented] (MAHOUT-344) Minhash based clustering

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141892#comment-13141892 ] Grant Ingersoll commented on MAHOUT-344: Ankur, any luck on documenting

[jira] [Assigned] (MAHOUT-854) Add MinHash to build-reuters.sh example

2011-11-01 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-854: -- Assignee: Grant Ingersoll > Add MinHash to build-reuters.sh exam

Canopy and other clustering approaches

2011-11-01 Thread Grant Ingersoll
In reviewing clustering for upcoming training, I'm wondering about something w/ Canopy clustering that we claim, but wanted to check here first. In the lectures, etc. I've seen on it, the idea is to run Canopy first and then some other more expensive algorithm, such as k-means, etc. with the id

[jira] [Created] (MAHOUT-859) Move Decision Forests to classifier package

2011-11-01 Thread Grant Ingersoll (Created) (JIRA)
Move Decision Forests to classifier package --- Key: MAHOUT-859 URL: https://issues.apache.org/jira/browse/MAHOUT-859 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

Random Forests

2011-11-01 Thread Grant Ingersoll
Anyone object to me moving the Decision/Random Forest stuff into the classifiers package? Seems like that is where it rightfully belongs. -Grant

[jira] [Updated] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-857: --- Attachment: MAHOUT-857-ll.patch Add support for log-likelihood capture in the results

Re: Train/TestNewsGroups with SGD

2011-11-01 Thread Grant Ingersoll
On Nov 1, 2011, at 2:45 PM, Sean Owen wrote: > RandomUtils.setTestSeed() (or something like that) makes all the RNGs > deterministic -- well if they are using RandomUtils. I see it in use in at least one place. > > On Tue, Nov 1, 2011 at 6:20 PM, Grant Ingersoll wrote: >

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141464#comment-13141464 ] Grant Ingersoll commented on MAHOUT-857: I committed the last patch, plus

[jira] [Updated] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-857: --- Attachment: MAHOUT-857.patch Much better looking patch. Cleaned up the code, dropped the

Train/TestNewsGroups with SGD

2011-11-01 Thread Grant Ingersoll
I'm working on https://issues.apache.org/jira/browse/MAHOUT-857. Each time I run it, I get different answers for SGD for the confusion matrix, which is presumably due to the randomness built in. However, is there a way to set the seed so one can reproduce results for actually testing the cod
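A general illustration of the seeding question, using only `java.util.Random` rather than any Mahout class (the class and method names here are hypothetical): when every source of randomness is constructed from a fixed seed, repeated runs see the same sequence, so results such as a shuffled training order become reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeededShuffle {
    /** Returns the indices 0..n-1 in an order determined entirely by the seed. */
    static List<Integer> shuffledOrder(int n, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        // Same seed, same permutation; an unseeded Random would differ per run.
        Collections.shuffle(order, new Random(seed));
        return order;
    }
}
```

This only helps if all randomness in the code path flows through seeded generators; a single unseeded `Random` elsewhere is enough to make runs diverge again.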

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141371#comment-13141371 ] Grant Ingersoll commented on MAHOUT-857: Working through some more of this,

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141353#comment-13141353 ] Grant Ingersoll commented on MAHOUT-857: Here's the new confusion matri

[jira] [Updated] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-857: --- Attachment: MAHOUT-857.patch Looks like it was an off by one error due to the use of

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141321#comment-13141321 ] Grant Ingersoll commented on MAHOUT-857: Here's the conf. matrix I'

[jira] [Updated] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-857: --- Attachment: MAHOUT-857.patch Here's a patch. It isn't correct yet for running th

Re: Towards 1.0 - Defining backwards compatibility guarantees

2011-11-01 Thread Grant Ingersoll
On Nov 1, 2011, at 12:15 PM, Ted Dunning wrote: > I think the trend is away from an explicit version in serialized data and > toward systems like protobufs or avro which allow much more flexibility. +1 > > Sent from my iPhone > > On Nov 1, 2011, at 5:09, Grant Ingersoll

[jira] [Created] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Created) (JIRA)
Reporter: Grant Ingersoll We have build-20news-bayes.sh that runs our NB stuff on 20 news groups. We also have an SGD example that works on 20 news groups, but no script to run it. I'm going to rename build-20news-bayes.sh to classify-20news.sh and incorporate the two. --

[jira] [Resolved] (MAHOUT-856) build-20news-bayes.sh doesn't work when downloading content

2011-11-01 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-856. Resolution: Fixed Fix Version/s: 0.6 > build-20news-bayes.sh doesn't w

[jira] [Created] (MAHOUT-856) build-20news-bayes.sh doesn't work when downloading content

2011-11-01 Thread Grant Ingersoll (Created) (JIRA)
: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor The build-20news-bayes.sh script doesn't work when downloading the content for the first time. The issue is that it changes the directory to the temp directory and then later tries to do "cd ../

Re: Towards 1.0 - Defining backwards compatibility guarantees

2011-11-01 Thread Grant Ingersoll
On Nov 1, 2011, at 8:09 AM, Grant Ingersoll wrote: > FWIW, in Lucene, we do the following: > > 1. All minor versions within a major release can read prior versions index > within the same major release. That is, 3.4 can read a 3.3 index. However, > 3.3 cannot read a 3.4 inde
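The first rule quoted above can be sketched as a small check. This covers only the within-major case that is visible in the excerpt; the class and method names are hypothetical and it is not Lucene code.

```java
final class IndexCompat {
    /**
     * Within one major release: a reader at minor version readerMinor can read
     * an index written by the same or an earlier minor version, not a later one.
     */
    static boolean canReadWithinMajor(int readerMinor, int indexMinor) {
        if (readerMinor < 0 || indexMinor < 0) {
            throw new IllegalArgumentException("minor versions must be non-negative");
        }
        return indexMinor <= readerMinor;
    }
}
```

For the example in the message, a 3.4 reader against a 3.3 index is `canReadWithinMajor(4, 3)`, and the reverse case is `canReadWithinMajor(3, 4)`.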

[jira] [Commented] (MAHOUT-155) ARFF VectorIterable

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141099#comment-13141099 ] Grant Ingersoll commented on MAHOUT-155: Hey Joe, Since these are categor

Re: Towards 1.0 - Defining backwards compatibility guarantees

2011-11-01 Thread Grant Ingersoll
> Isabel > > > * though not the only one - others include but are not limited to the time > frame > for which we offer support for any given release. Grant Ingersoll http://www.lucidimagination.com

Re: Patch : Formatting

2011-10-31 Thread Grant Ingersoll
verybody else to use. > https://issues.apache.org/jira/browse/MAHOUT > > > Further here is a patch checklist: > https://cwiki.apache.org/confluence/display/MAHOUT/Patch+Check+List > >> >> Thanks and Regards, >> Paritosh > > /Manuel Grant Ingersoll http://www.lucidimagination.com

[jira] [Resolved] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-855. Resolution: Fixed Committed revision 1195549. > LuceneTextValueEnco

[jira] [Updated] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-855: --- Attachment: MAHOUT-855.patch Here's a fix, going to commit sh

[jira] [Commented] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140234#comment-13140234 ] Grant Ingersoll commented on MAHOUT-855: At least two issues here: 1.

[jira] [Created] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Created) (JIRA)
T-855 Project: Mahout Issue Type: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 0.6 The LuceneTextValueEncoder throws a BufferUnderflowException when used. See the code below. The problem ap

[jira] [Commented] (MAHOUT-627) Baum-Welch Algorithm on Map-Reduce for Parallel Hidden Markov Model Training.

2011-10-31 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140149#comment-13140149 ] Grant Ingersoll commented on MAHOUT-627: I'm going to look to commit

Incorporating JIRA Suggestions, was Re: Demoralized over JIRA state

2011-10-31 Thread Grant Ingersoll
sible, even > arrange for the front-page view of the project to highlight the open > defects and open issues chosen for the upcoming release rather than > the total open JIRAs. > > As for the practical issues, I've already elaborated them in the > discussion of how to have maturing patches be in source control > instead of (or in addition to), so I won't repeat (much). Grant Ingersoll http://www.lucidimagination.com

Re: Improving Our JIRA State

2011-10-26 Thread Grant Ingersoll
ehorn things into an existing driver. > > Dan -------- Grant Ingersoll http://www.lucidimagination.com

[jira] [Resolved] (MAHOUT-852) Upgrade Lucene dependency to 3.4

2011-10-26 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-852. Resolution: Fixed > Upgrade Lucene dependency to

[jira] [Created] (MAHOUT-852) Upgrade Lucene dependency to 3.4

2011-10-26 Thread Grant Ingersoll (Created) (JIRA)
Upgrade Lucene dependency to 3.4 Key: MAHOUT-852 URL: https://issues.apache.org/jira/browse/MAHOUT-852 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Assignee

Re: Recruiting new contributors and committers [was: Demoralized over JIRA state]

2011-10-26 Thread Grant Ingersoll
. >> >> On Oct 24, 2011, at 6:39 PM, Isabel Drost wrote: >> >>> On 24.10.2011 Grant Ingersoll wrote: >>>> Docs is one obvious one. Also, just keep supplying patches. >>> >>> Speaking of supplying patches: Doing so seems non-trivial for quite s

Re: Overtraining effects in NB

2011-10-25 Thread Grant Ingersoll
Robin, Any luck with this? On Oct 11, 2011, at 7:22 AM, Robin Anil wrote: > I am guessing this is on the new naivebayes package. I would like to check > the data and compare against the old implementation if its a bug. > > On Tue, Oct 11, 2011 at 4:18 PM, Grant Ingersoll wrote:

Re: autoexported sites 'to be phased out by Nov 2011'

2011-10-25 Thread Grant Ingersoll
On Oct 25, 2011, at 11:04 AM, Isabel Drost wrote: > On 25.10.2011 Dan Brickley wrote: >> These make clear the urgency; the auto exporter is unmaintained, and breaks >> with Confluence updates. > > Ok - so the bottom line is: Auto export will go away. Confluence will remain. > As > linking to d

Re: Demoralized over JIRA state

2011-10-25 Thread Grant Ingersoll
et for everything > else. That's pretty good. I won't molest it; I might suggest we push > some things there. > > > Obviously the more important thing is to action some of the important > changes *that really should happen in a next release*, 0.6. Then file > some JIR

[jira] [Created] (MAHOUT-851) Add SGD to build-asf-email.sh example

2011-10-25 Thread Grant Ingersoll (Created) (JIRA)
Add SGD to build-asf-email.sh example - Key: MAHOUT-851 URL: https://issues.apache.org/jira/browse/MAHOUT-851 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

Re: Demoralized over JIRA state

2011-10-24 Thread Grant Ingersoll
tuff to be peer reviewed. Why not have a 'backlog' target and > let it live there? > > On Mon, Oct 24, 2011 at 9:59 AM, Jake Mannix wrote: >> On Mon, Oct 24, 2011 at 5:25 AM, Grant Ingersoll wrote: >>> >>> > - Anything that isn't fixed by Dec

Re: Demoralized over JIRA state

2011-10-24 Thread Grant Ingersoll
On Oct 23, 2011, at 6:29 AM, Dan Brickley wrote: > [snip] > > Interesting discussion, and maybe a good time for those of us making > use of all this code to remember to say 'thanks'. So, er yeah, thanks. > > One thing I would like to bring up, as you talk this stuff through, is > that there are

Re: Demoralized over JIRA state

2011-10-24 Thread Grant Ingersoll
've >> been wondering whether to make a patch. >> * https://issues.apache.org/jira/browse/MAHOUT-804 "Each page in >> Mahout's Confluence Wiki has 2 URLs, with differing page styles and >> search behaviours" ...is me talking to myself. Hard to know how to >> help here. >> >> So I don't want to de-rail discussion into the detail of these >> specific JIRAs; but rather to take them as example of what it's like >> to be a Mahout user and run into some issue, reported via mail or >> JIRA. The way things are framed now sort of sets things up for us to >> report a problem and then just wait for "you guys" to do the hard work >> of fixing it. Maybe there are some tricks for widening the workforce >> without creating a huge coordination and management burden for the >> core committers? >> >> cheers, >> >> Dan >> -- Grant Ingersoll http://www.lucidimagination.com

Re: Demoralized over JIRA state

2011-10-23 Thread Grant Ingersoll
The only issue I am really concerned about w/ provenance is pull requests from non-ASF people that are brought in. Sometimes hard to track. On Oct 23, 2011, at 7:56 AM, Benson Margulies wrote: > I just want to focus on the provenance question, but, really, you can > ignore me. I'm not trying to w

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 7:34 PM, Benson Margulies wrote: > When the board looks at the health of a community, one of the > questions it asks (or so I am told) is, 'Is the community responsive > to requests for assistance?' I think we are, but of course we could be better. > > Now, the board's bar

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:41 PM, Sean Owen wrote: > Thanks! good thread. > > On Sat, Oct 22, 2011 at 3:30 PM, Grant Ingersoll wrote: >> 1. We aim for releases every 6 months or so >> 2. We make a best guess up front about what bug fixes will be in that >> release,

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 2:19 PM, Sean Owen wrote: > Bringing this to dev@, mid-thread, per Grant's suggestion. There was a > brief and fruitful thread on private@ to discuss project governance, > but the topic has shifted such that it's useful to just talk on dev@. > > If I may paraphrase: I express

[jira] [Reopened] (MAHOUT-698) Hook up Automated Patch Checking for Mahout

2011-10-15 Thread Grant Ingersoll (Reopened) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened MAHOUT-698: I'd say we leave this one open. When done right, it can help people get feedback right

Re: [REPORT] Apache Mahout

2011-10-14 Thread Grant Ingersoll
On Oct 14, 2011, at 1:38 PM, Ted Dunning wrote: > > > Which others are there? Maybe we should mention them all in this report. 2 is a "number of bundles" to me :-)

Re: [REPORT] Apache Mahout

2011-10-14 Thread Grant Ingersoll
+1 to that, or, alternatively, we should simply say Mahout is in a number of bundles at this point and we believe all players are properly following ASF branding guidelines. We will continue to monitor. On Oct 14, 2011, at 12:51 PM, Ted Dunning wrote: > It might, for equity, be reasonable to m

[jira] [Commented] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-10-13 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126482#comment-13126482 ] Grant Ingersoll commented on MAHOUT-588: I've turned off access to m

Re: where is the key in sequenceFile use seq2sparse

2011-10-12 Thread Grant Ingersoll
e directory pathname for output > points new CosineDistanceMeasure(), // cos 0.1d, // the convergence delta > value 10, // the maximum number of iterations true, // run clustering false > // execute map reduce ); > > > > > no exception thrown and thx in advance

Re: where is the key in sequenceFile use seq2sparse

2011-10-12 Thread Grant Ingersoll
e output i use tfidf-vectors/ > >step #3 #4 >canopy -> kmeans > >step #4 >clusterDump > >i found the vector is org.apache.mahout.math.RandomAccessSparseVector, > and where i can found the sequenceFile key?? > >

[jira] [Commented] (MAHOUT-839) rowid job failing (when parsing options)

2011-10-11 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125132#comment-13125132 ] Grant Ingersoll commented on MAHOUT-839: {quote} Map parsed

[jira] [Commented] (MAHOUT-839) rowid job failing (when parsing options)

2011-10-11 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125128#comment-13125128 ] Grant Ingersoll commented on MAHOUT-839: I didn't run the code, but look

[jira] [Commented] (MAHOUT-839) rowid job failing (when parsing options)

2011-10-11 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125108#comment-13125108 ] Grant Ingersoll commented on MAHOUT-839: Also, for future reference, no nee
