[jira] [Commented] (MAHOUT-155) ARFF VectorIterable

2011-11-03 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143095#comment-13143095 ] Grant Ingersoll commented on MAHOUT-155: Joe, bq. 1. TODO: create a map so we

Re: Use of Preconditions in Mahalanobis and other places

2011-11-03 Thread Grant Ingersoll
at 4:11 PM, Grant Ingersoll gsing...@apache.org wrote: I was looking at some of the DistanceMeasure stuff (Mahalanobis at the moment) and it strikes me as a bit odd that our core distance() functions would use Preconditions to check things that are part of construction/configuration

Re: Dirchlet

2011-11-03 Thread Grant Ingersoll
On Nov 2, 2011, at 5:31 PM, Ted Dunning wrote: For some kinds of models, notably all of the ones from the exponential class, there exist sufficient statistics and the combination of models really is a lot like addition. Most of the uses of DP clustering involve exponential models like

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-03 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143512#comment-13143512 ] Grant Ingersoll commented on MAHOUT-524: bq. If at all possible, my suggestion

[jira] [Created] (MAHOUT-868) Rename build*.sh examples to be more indicative of what they actually do, i.e. classify-20newsgroups.sh

2011-11-03 Thread Grant Ingersoll (Created) (JIRA)
/browse/MAHOUT-868 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 0.6 The build*.sh scripts in examples/bin are a bit weird naming wise. We should deprecate

[jira] [Created] (MAHOUT-869) driver.classes.props is getting unwieldy

2011-11-03 Thread Grant Ingersoll (Created) (JIRA)
driver.classes.props is getting unwieldy Key: MAHOUT-869 URL: https://issues.apache.org/jira/browse/MAHOUT-869 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

Re: Dirchlet

2011-11-03 Thread Grant Ingersoll
, Grant Ingersoll gsing...@apache.org wrote: On Nov 2, 2011, at 5:31 PM, Ted Dunning wrote: For some kinds of models, notably all of the ones from the exponential class, there exist sufficient statistics and the combination of models really is a lot like addition. Most of the uses of DP

Bayes is dead. Long live Naive Bayes

2011-11-03 Thread Grant Ingersoll
Well, maybe not dead... What's our goal for the two implementations of Naive Bayes (and Complementary)? It seems to me like the old one, o.a.m.classifier.bayes, is intended to be deprecated due to the fact that it is tied to a word based representation. However, it seems to still have a few

[jira] [Created] (MAHOUT-870) Driver or Job? Let's pick one and be consistent.

2011-11-03 Thread Grant Ingersoll (Created) (JIRA)
Driver or Job? Let's pick one and be consistent. - Key: MAHOUT-870 URL: https://issues.apache.org/jira/browse/MAHOUT-870 Project: Mahout Issue Type: Improvement Reporter: Grant

Re: Reviewboard?

2011-11-02 Thread Grant Ingersoll
-- Grant Ingersoll http://www.lucidimagination.com

Re: integration tests

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 5:13 AM, Jake Mannix wrote: So in the process of getting the LDA improvements I've got brewing over on GitHub, and I'm doing my good due diligence and making more unit tests and so forth, and I'm trying to figure out the best way to unit test something like this, and I

[jira] [Created] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
MurmurHash 3.0 -- Key: MAHOUT-862 URL: https://issues.apache.org/jira/browse/MAHOUT-862 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority

Re: Goodbye graph algorithms

2011-11-02 Thread Grant Ingersoll
with various forms of graph-shaped data, but isn't a general-purpose graph processing environment? Dan Grant Ingersoll http://www.lucidimagination.com

[jira] [Resolved] (MAHOUT-859) Move Decision Forests to classifier package

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-859. Resolution: Fixed Committed revision 1196578. Move Decision Forests

[jira] [Updated] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-862: --- Attachment: MAHOUT-862.patch Here's a patch that adds MurmurHash3. Tests pass, but I'm

[jira] [Commented] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142199#comment-13142199 ] Grant Ingersoll commented on MAHOUT-862: I accidentally committed this when making

[jira] [Commented] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142200#comment-13142200 ] Grant Ingersoll commented on MAHOUT-862: Committed revision 1196616. I'll leave

[jira] [Created] (MAHOUT-863) Add DisplayMinhash clustering example

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Add DisplayMinhash clustering example - Key: MAHOUT-863 URL: https://issues.apache.org/jira/browse/MAHOUT-863 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

[jira] [Updated] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-864: --- Component/s: Examples Clustering DisplayCanopy doesn't show any

[jira] [Created] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
DisplayCanopy doesn't show any clusters --- Key: MAHOUT-864 URL: https://issues.apache.org/jira/browse/MAHOUT-864 Project: Mahout Issue Type: Bug Reporter: Grant Ingersoll Priority

[jira] [Commented] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142212#comment-13142212 ] Grant Ingersoll commented on MAHOUT-864: Appears to be due to the fact

[jira] [Resolved] (MAHOUT-864) DisplayCanopy doesn't show any clusters

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-864. Resolution: Fixed Fix Version/s: 0.6 Assignee: Grant Ingersoll

Re: Canopy and other clustering approaches

2011-11-02 Thread Grant Ingersoll
, Grant Ingersoll wrote: In reviewing clustering for upcoming training, I'm wondering about something w/ Canopy clustering that we claim, but wanted to check here first. In the lectures, etc. I've seen on it, the idea is to run Canopy first and then some other more expensive algorithm, such as k

Re: Goodbye graph algorithms

2011-11-02 Thread Grant Ingersoll
WFM - works for me. On Nov 2, 2011, at 11:30 AM, Sebastian Schelter wrote: On 02.11.2011 16:04, Jake Mannix wrote: On Wed, Nov 2, 2011 at 6:38 AM, Grant Ingersoll gsing...@apache.org wrote: Perhaps it would make sense to move them to a branch? I know we never released them, but it seems

[jira] [Commented] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142324#comment-13142324 ] Grant Ingersoll commented on MAHOUT-862: I committed the test

Re: integration tests

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 1:01 PM, Jake Mannix wrote: On Wed, Nov 2, 2011 at 5:36 AM, Grant Ingersoll gsing...@apache.org wrote: Alternatively, the ASF email data is license free. We could take and use a chunk of that. You can pretty much have as much or as little as you want. Since it's

[jira] [Updated] (MAHOUT-863) Add DisplayMinhash clustering example

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-863: --- Attachment: MAHOUT-863.patch Here's a start. It doesn't display the items yet, namely

[jira] [Created] (MAHOUT-865) Refactor Sequential Clustering algorithms

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Refactor Sequential Clustering algorithms - Key: MAHOUT-865 URL: https://issues.apache.org/jira/browse/MAHOUT-865 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

[jira] [Updated] (MAHOUT-862) MurmurHash 3.0

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-862: --- Fix Version/s: 0.6 MurmurHash 3.0 -- Key: MAHOUT-862

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142461#comment-13142461 ] Grant Ingersoll commented on MAHOUT-524: bq. Is there any way we could simplify

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142477#comment-13142477 ] Grant Ingersoll commented on MAHOUT-524: Tracing into the Hadoop code, this data

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142510#comment-13142510 ] Grant Ingersoll commented on MAHOUT-524: Realizing now that Jeff already said

Dirchlet

2011-11-02 Thread Grant Ingersoll
Tim Potter and I have tried running Dirichlet in the past on the ASF email set on EC2 and it didn't seem to scale all that well, so I was wondering if people had ideas on improving its speed. One question I had is whether we could inject a Combiner into the process? Ted also mentioned that

[jira] [Updated] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-524: --- Attachment: MAHOUT-524.patch patch so far, never mind the DisplayMinHash stuff, as I forgot

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142557#comment-13142557 ] Grant Ingersoll commented on MAHOUT-524: The NPE is from one of the rowJ values

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142586#comment-13142586 ] Grant Ingersoll commented on MAHOUT-524: I guess the 1100 comes from how we

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142583#comment-13142583 ] Grant Ingersoll commented on MAHOUT-524: in this particular case, the state has 4

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142588#comment-13142588 ] Grant Ingersoll commented on MAHOUT-524: Seems the numDims == 1100

[jira] [Updated] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-524: --- Attachment: MAHOUT-524.patch This gets past the Lanczos issue by checking the size. __ I

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142633#comment-13142633 ] Grant Ingersoll commented on MAHOUT-524: bq. I applied your patch but I'm having

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-11-02 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142635#comment-13142635 ] Grant Ingersoll commented on MAHOUT-524: bq. I applied your patch but I'm having

Use of Preconditions in Mahalanobis and other places

2011-11-02 Thread Grant Ingersoll
I was looking at some of the DistanceMeasure stuff (Mahalanobis at the moment) and it strikes me as a bit odd that our core distance() functions would use Preconditions to check things that are part of construction/configuration of the instance. A call to distance() is likely executed a lot of

Re: Dirchlet

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 6:05 PM, Ted Dunning wrote: I have done some testing and have been unable to demonstrate a big difference in allocating versus re-using. Re-using is, however, *really* error prone. I've been bitten by that one at least once. It's a pain to debug.

Re: Dirchlet

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 5:29 PM, Jeff Eastman wrote: I think the scalability problems you are seeing are a consequence of using the default GaussianCluster models. These models perform especially poorly for large text clustering problems such as email. The pdf() calculation over wide topic

[jira] [Assigned] (MAHOUT-866) Move Precondition checks out of Mahalanobis.distance method and into configuration/setup

2011-11-02 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-866: -- Assignee: Grant Ingersoll Move Precondition checks out of Mahalanobis.distance

[jira] [Resolved] (MAHOUT-866) Move Precondition checks out of Mahalanobis.distance method and into configuration/setup

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-866. Resolution: Fixed Fix Version/s: 0.6 Move Precondition checks out

Re: Dirchlet

2011-11-02 Thread Grant Ingersoll
2, 2011 at 10:13 PM, Grant Ingersoll gsing...@apache.org wrote: Tim Potter and I have tried running Dirichlet in the past on the ASF email set on EC2 and it didn't seem to scale all that well, so I was wondering if people had ideas on improving its speed. One question I had is whether we

[jira] [Created] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Created) (JIRA)
Add ClusterEvaluator capabilities to ClusterDumper -- Key: MAHOUT-867 URL: https://issues.apache.org/jira/browse/MAHOUT-867 Project: Mahout Issue Type: Improvement Reporter: Grant

[jira] [Assigned] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-867: -- Assignee: Grant Ingersoll Add ClusterEvaluator capabilities to ClusterDumper

[jira] [Updated] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-867: --- Attachment: MAHOUT-867.patch Adds --evaluate option to ClusterDumper, which then uses

[jira] [Resolved] (MAHOUT-867) Add ClusterEvaluator capabilities to ClusterDumper

2011-11-02 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-867. Resolution: Fixed Hooked it into ClusterDumper, also hooked it into build-reuters.sh for k

Re: Towards 1.0 - Defining backwards compatibility guarantees

2011-11-01 Thread Grant Ingersoll
? Most likely I've forgotten about other vital pieces - just wanted to kick off that discussion. Isabel * though not the only one - others include but are not limited to the time frame for which we offer support for any given release. Grant

[jira] [Commented] (MAHOUT-155) ARFF VectorIterable

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141099#comment-13141099 ] Grant Ingersoll commented on MAHOUT-155: Hey Joe, Since these are categorical

Re: Towards 1.0 - Defining backwards compatibility guarantees

2011-11-01 Thread Grant Ingersoll
On Nov 1, 2011, at 8:09 AM, Grant Ingersoll wrote: FWIW, in Lucene, we do the following: 1. All minor versions within a major release can read prior versions' indexes within the same major release. That is, 3.4 can read a 3.3 index. However, 3.3 cannot read a 3.4 index. When a user reads

[jira] [Created] (MAHOUT-856) build-20news-bayes.sh doesn't work when downloading content

2011-11-01 Thread Grant Ingersoll (Created) (JIRA)
Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor The build-20news-bayes.sh script doesn't work when downloading the content for the first time. The issue is that it changes the directory to the temp directory and then later tries to do cd ../.. to get back

[jira] [Resolved] (MAHOUT-856) build-20news-bayes.sh doesn't work when downloading content

2011-11-01 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-856. Resolution: Fixed Fix Version/s: 0.6 build-20news-bayes.sh doesn't work when

[jira] [Created] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Created) (JIRA)
Reporter: Grant Ingersoll We have build-20news-bayes.sh that runs our NB stuff on 20 news groups. We also have an SGD example that works on 20 news groups, but no script to run it. I'm going to rename build-20news-bayes.sh to classify-20news.sh and incorporate the two

Re: Towards 1.0 - Defining backwards compatibility guarantees

2011-11-01 Thread Grant Ingersoll
On Nov 1, 2011, at 12:15 PM, Ted Dunning wrote: I think the trend is away from an explicit version in serialized data and toward systems like protobufs or avro which allow much more flexibility. +1 Sent from my iPhone On Nov 1, 2011, at 5:09, Grant Ingersoll gsing...@apache.org wrote

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141321#comment-13141321 ] Grant Ingersoll commented on MAHOUT-857: Here's the conf. matrix I'm getting

Train/TestNewsGroups with SGD

2011-11-01 Thread Grant Ingersoll
I'm working on https://issues.apache.org/jira/browse/MAHOUT-857. Each time I run it, I get different answers for SGD for the confusion matrix, which is presumably due to the randomness built in. However, is there a way to set the seed so one can reproduce results for actually testing the

[jira] [Updated] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-857: --- Attachment: MAHOUT-857.patch Much better looking patch. Cleaned up the code, dropped

[jira] [Commented] (MAHOUT-857) Rework 20 NewsGroup shell script example to include SGD Example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141464#comment-13141464 ] Grant Ingersoll commented on MAHOUT-857: I committed the last patch, plus some

Re: Train/TestNewsGroups with SGD

2011-11-01 Thread Grant Ingersoll
On Nov 1, 2011, at 2:45 PM, Sean Owen wrote: RandomUtils.setTestSeed() (or something like that) makes all the RNGs deterministic -- well if they are using RandomUtils. I see it in use in at least one place. On Tue, Nov 1, 2011 at 6:20 PM, Grant Ingersoll gsing...@apache.org wrote: I'm

Random Forests

2011-11-01 Thread Grant Ingersoll
Anyone object to me moving the Decision/Random Forest stuff into the classifiers package? Seems like that is where it rightfully belongs. -Grant

[jira] [Created] (MAHOUT-859) Move Decision Forests to classifier package

2011-11-01 Thread Grant Ingersoll (Created) (JIRA)
Move Decision Forests to classifier package --- Key: MAHOUT-859 URL: https://issues.apache.org/jira/browse/MAHOUT-859 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

Canopy and other clustering approaches

2011-11-01 Thread Grant Ingersoll
In reviewing clustering for upcoming training, I'm wondering about something w/ Canopy clustering that we claim, but wanted to check here first. In the lectures, etc. I've seen on it, the idea is to run Canopy first and then some other more expensive algorithm, such as k-means, etc. with the

[jira] [Assigned] (MAHOUT-854) Add MinHash to build-reuters.sh example

2011-11-01 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-854: -- Assignee: Grant Ingersoll Add MinHash to build-reuters.sh example

[jira] [Commented] (MAHOUT-344) Minhash based clustering

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141892#comment-13141892 ] Grant Ingersoll commented on MAHOUT-344: Ankur, any luck on documenting this stuff

[jira] [Commented] (MAHOUT-854) Add MinHash to build-reuters.sh example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141894#comment-13141894 ] Grant Ingersoll commented on MAHOUT-854: bq. 1. Is it just me or when I try

Re: [jira] [Created] (MAHOUT-860) Create minimalist maven module for *Writable classes for export

2011-11-01 Thread Grant Ingersoll
On Nov 2, 2011, at 12:00 AM, Jake Mannix wrote: Anyone with mad maven skills know how to churn that out in a short evenings-worth of work? :) Is there such a thing? :-) As an alternative, we could simply generate a Jar that contains just the necessary files and no re-org is necessary.

[jira] [Commented] (MAHOUT-854) Add MinHash to build-reuters.sh example

2011-11-01 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141900#comment-13141900 ] Grant Ingersoll commented on MAHOUT-854: I've committed this, but will leave

[jira] [Commented] (MAHOUT-627) Baum-Welch Algorithm on Map-Reduce for Parallel Hidden Markov Model Training.

2011-10-31 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140149#comment-13140149 ] Grant Ingersoll commented on MAHOUT-627: I'm going to look to commit this soon

[jira] [Created] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Created) (JIRA)
Project: Mahout Issue Type: Bug Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 0.6 The LuceneTextValueEncoder throws a BufferUnderflowException when used. See the code below. The problem appears

[jira] [Commented] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140234#comment-13140234 ] Grant Ingersoll commented on MAHOUT-855: At least two issues here: 1

[jira] [Updated] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-855: --- Attachment: MAHOUT-855.patch Here's a fix, going to commit shortly

[jira] [Resolved] (MAHOUT-855) LuceneTextValueEncoder doesn't properly set internal buffers, causing BufferUnderflowException

2011-10-31 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-855. Resolution: Fixed Committed revision 1195549. LuceneTextValueEncoder

Re: Patch : Formatting

2011-10-31 Thread Grant Ingersoll
, Paritosh /Manuel Grant Ingersoll http://www.lucidimagination.com

[jira] [Created] (MAHOUT-852) Upgrade Lucene dependency to 3.4

2011-10-26 Thread Grant Ingersoll (Created) (JIRA)
Upgrade Lucene dependency to 3.4 Key: MAHOUT-852 URL: https://issues.apache.org/jira/browse/MAHOUT-852 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Assignee

[jira] [Resolved] (MAHOUT-852) Upgrade Lucene dependency to 3.4

2011-10-26 Thread Grant Ingersoll (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-852. Resolution: Fixed Upgrade Lucene dependency to 3.4

[jira] [Created] (MAHOUT-851) Add SGD to build-asf-email.sh example

2011-10-25 Thread Grant Ingersoll (Created) (JIRA)
Add SGD to build-asf-email.sh example - Key: MAHOUT-851 URL: https://issues.apache.org/jira/browse/MAHOUT-851 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll

Re: Demoralized over JIRA state

2011-10-25 Thread Grant Ingersoll
changes *that really should happen in a next release*, 0.6. Then file some JIRAs for additional things that can and should be done in the next month or so. +1 On Mon, Oct 24, 2011 at 7:40 PM, Grant Ingersoll gsing...@apache.org wrote: My first thought was what's the difference between open

Re: autoexported sites 'to be phased out by Nov 2011'

2011-10-25 Thread Grant Ingersoll
On Oct 25, 2011, at 11:04 AM, Isabel Drost wrote: On 25.10.2011 Dan Brickley wrote: These make clear the urgency; the auto exporter is unmaintained, and breaks with Confluence updates. Ok - so the bottom line is: Auto export will go away. Confluence will remain. As linking to dynamic

Re: Demoralized over JIRA state

2011-10-24 Thread Grant Ingersoll
-- Grant Ingersoll http://www.lucidimagination.com

Re: Demoralized over JIRA state

2011-10-24 Thread Grant Ingersoll
On Oct 23, 2011, at 6:29 AM, Dan Brickley wrote: [snip] Interesting discussion, and maybe a good time for those of us making use of all this code to remember to say 'thanks'. So, er yeah, thanks. One thing I would like to bring up, as you talk this stuff through, is that there are a few

Re: Demoralized over JIRA state

2011-10-24 Thread Grant Ingersoll
' target and let it live there? On Mon, Oct 24, 2011 at 9:59 AM, Jake Mannix jake.man...@gmail.com wrote: On Mon, Oct 24, 2011 at 5:25 AM, Grant Ingersoll gsing...@apache.org wrote: - Anything that isn't fixed by December is WontFix and we release 0.6. I realize it's drastic, but it's

Re: Demoralized over JIRA state

2011-10-23 Thread Grant Ingersoll
The only issue I am really concerned about w provenance is pull requests from non ASF people that are brought in. Sometimes hard to track On Oct 23, 2011, at 7:56 AM, Benson Margulies bimargul...@gmail.com wrote: I just want to focus on the provenance question, but, really, you can ignore me.

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 2:19 PM, Sean Owen wrote: Bringing this to dev@, mid-thread, per Grant's suggestion. There was a brief and fruitful thread on private@ to discuss project governance, but the topic has shifted such that it's useful to just talk on dev@. If I may paraphrase: I expressed

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:41 PM, Sean Owen wrote: Thanks! good thread. On Sat, Oct 22, 2011 at 3:30 PM, Grant Ingersoll gsing...@apache.org wrote: 1. We aim for releases every 6 months or so 2. We make a best guess up front about what bug fixes will be in that release, but we also

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 7:34 PM, Benson Margulies wrote: When the board looks at the health of a community, one of the questions it asks (or so I am told) is, 'Is the community responsive to requests for assistance?' I think we are, but of course we could be better. Now, the board's bar here

[jira] [Reopened] (MAHOUT-698) Hook up Automated Patch Checking for Mahout

2011-10-15 Thread Grant Ingersoll (Reopened) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened MAHOUT-698: I'd say we leave this one open. When done right, it can help people get feedback right away

Re: [REPORT] Apache Mahout

2011-10-14 Thread Grant Ingersoll
+1 to that, or, alternatively, we should simply say Mahout is in a number of bundles at this point and we believe all players are properly following ASF branding guidelines. We will continue to monitor. On Oct 14, 2011, at 12:51 PM, Ted Dunning wrote: It might, for equity, be reasonable to

Re: [REPORT] Apache Mahout

2011-10-14 Thread Grant Ingersoll
On Oct 14, 2011, at 1:38 PM, Ted Dunning wrote: Which others are there? Maybe we should mention them all in this report. 2 is a number of bundles to me :-)

[jira] [Commented] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-10-13 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126482#comment-13126482 ] Grant Ingersoll commented on MAHOUT-588: I've turned off access to mine. You

Re: where is the key in sequenceFile use seq2sparse

2011-10-12 Thread Grant Ingersoll
- kmeans step #4 clusterDump I found the vector is org.apache.mahout.math.RandomAccessSparseVector, and where can I find the sequenceFile key? Thanks in advance Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011

Re: where is the key in sequenceFile use seq2sparse

2011-10-12 Thread Grant Ingersoll
, // the convergence delta value 10, // the maximum number of iterations true, // run clustering false // execute map reduce ); no exception thrown and thx in advance At 2011-10-12 20:27:19,Grant Ingersoll gsing...@apache.org wrote: Can you share your actual commands? On Oct 12

Re: Overtraining effects in NB

2011-10-11 Thread Grant Ingersoll
. On Mon, Oct 10, 2011 at 11:20 PM, Grant Ingersoll gsing...@apache.org wrote: I was trying the Naive Bayes classifier via the build-asf-email.sh file I committed the other day on a data set that had a fairly significant variation in the number of messages per training label and am noticing

Re: MAHOUT-232 status?

2011-10-11 Thread Grant Ingersoll
On Oct 10, 2011, at 10:26 PM, Dmitriy Lyubimov wrote: As well as lda improvements. Gosh, Nudge, nudge, Jake!

Re: MAHOUT-232 status?

2011-10-11 Thread Grant Ingersoll
, or during Apache Con (Mon and Tue are Hackathon days there.) Grant Ingersoll http://www.lucidimagination.com Lucene Eurocon 2011: http://www.lucene-eurocon.com

[jira] [Commented] (MAHOUT-839) rowid job failing (when parsing options)

2011-10-11 Thread Grant Ingersoll (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125107#comment-13125107 ] Grant Ingersoll commented on MAHOUT-839: Hey Dan, I think the addInputOption

[jira] [Assigned] (MAHOUT-839) rowid job failing (when parsing options)

2011-10-11 Thread Grant Ingersoll (Assigned) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-839: -- Assignee: Grant Ingersoll rowid job failing (when parsing options
