[jira] [Updated] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-22 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-838: - Attachment: MatrixWritable.java ConfusionMatrix.java Replace these two files,

[jira] [Updated] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-22 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-838: - Attachment: MAHOUT-838_mini.patch Make the confusion matrix writable to a file when testing

[jira] [Issue Comment Edited] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-22 Thread Lance Norskog (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133255#comment-13133255 ] Lance Norskog edited comment on MAHOUT-838 at 10/22/11 6:04 AM:

[jira] [Commented] (MAHOUT-847) Improve Euclidean distance similarity calculation

2011-10-22 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133279#comment-13133279 ] Sean Owen commented on MAHOUT-847: -- No, EuclideanDistanceMeasure is a distance measure

Re: Average distance between two points in unit hypercube?

2011-10-22 Thread Ted Dunning
I don't understand the question. What are the two vectors? Suppose you have x, an N-dimensional vector, and \Omega, a random 1xN projection. \Omega x is a 1-dimensional vector. Where is the second 1-d vector? On Fri, Oct 21, 2011 at 8:23 PM, Lance Norskog goks...@gmail.com wrote: More

Re: Average distance between two points in unit hypercube?

2011-10-22 Thread Federico Castanedo
IMHO, the confusion here comes from the point that there are some random projections, following this scheme, that preserve all pairwise geodesic and Euclidean distances between points in the original space in the new one with high probability. In fact, non-linear manifold learning algorithms aim

[jira] [Resolved] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-22 Thread Sean Owen (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-849. -- Resolution: Fixed Fix Version/s: 0.6 Assignee: Sean Owen Good eye. In fact there were

[jira] [Commented] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-22 Thread Frank Scholten (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133347#comment-13133347 ] Frank Scholten commented on MAHOUT-845: --- Or we could add a getTerm(String[]

Re: Demoralized over JIRA state

2011-10-22 Thread Sean Owen
Bringing this to dev@, mid-thread, per Grant's suggestion. There was a brief and fruitful thread on private@ to discuss project governance, but the topic has shifted such that it's useful to just talk on dev@. If I may paraphrase: I expressed concern about the sprawl of code and algorithms, aging

[jira] [Commented] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-22 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133355#comment-13133355 ] Sean Owen commented on MAHOUT-838: -- Lance I tried just this, but I still get compile

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 2:19 PM, Sean Owen wrote: Bringing this to dev@, mid-thread, per Grant's suggestion. There was a brief and fruitful thread on private@ to discuss project governance, but the topic has shifted such that it's useful to just talk on dev@. If I may paraphrase: I expressed

[jira] [Commented] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-22 Thread Hudson (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133409#comment-13133409 ] Hudson commented on MAHOUT-849: --- Integrated in Mahout-Quality # (See

Re: Demoralized over JIRA state

2011-10-22 Thread Sean Owen
Thanks! good thread. On Sat, Oct 22, 2011 at 3:30 PM, Grant Ingersoll gsing...@apache.org wrote: 1. We aim for releases every 6 months or so 2. We make a best guess up front about what bug fixes will be in that release, but we also will, obviously, bring in other fixes as they are reported

Re: Demoralized over JIRA state

2011-10-22 Thread Benson Margulies
When the board looks at the health of a community, one of the questions it asks (or so I am told) is, 'Is the community responsive to requests for assistance?' Now, the board's bar here is quite low. I'm not trying to suggest for a moment that Mahout is in any danger of attracting unfavorable

Re: Demoralized over JIRA state

2011-10-22 Thread Benson Margulies
Drat: I wrote 'is necessarily a badge of shame' when I meant to write 'is not necessarily a badge of shame'.

[jira] [Commented] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-22 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133493#comment-13133493 ] Lance Norskog commented on MAHOUT-849: -- Error: wanted 43, received 43. This usually

Re: Average distance between two points in unit hypercube?

2011-10-22 Thread Ted Dunning
Non-linear learning does, however, provide much more power than linear schemes can. The great virtue of linear schemes is when you have massive dimensionality. On Sat, Oct 22, 2011 at 2:14 AM, Federico Castanedo castanedof...@gmail.com wrote: In fact, non-linear manifold learning algorithms

[jira] [Commented] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-22 Thread Ted Dunning (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133499#comment-13133499 ] Ted Dunning commented on MAHOUT-849: It usually indicates a bug in the test, though.

Re: Demoralized over JIRA state

2011-10-22 Thread Dmitriy Lyubimov
I feel like I am most closely aligned with Grant. Very little to add. Like it or not, Mahout is a library, not a coherent product such as HBase. It's a collection of algorithms connected together with some fairly thin structure and persistence glue, but the glue rarely can go much beyond that.

Re: Demoralized over JIRA state

2011-10-22 Thread Lance Norskog
The debate above seems pretty complete. What are positive actions that will make Mahout healthier? Suggestions from debate: * Automated patch testing. This would cure the 'rotting patch' problem. * Chivvying contributors for detailed notes. * ? Personal concepts: * Regression suite with real data.

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-22 Thread Shannon Quinn (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133524#comment-13133524 ] Shannon Quinn commented on MAHOUT-524: -- If there are two DLS.runJob() methods and the

Re: Average distance between two points in unit hypercube?

2011-10-22 Thread Lance Norskog
No, I have two n-d vectors and want to measure their distance. Suppose I project both via the same 1xN random projection matrix, and then find the delta of the two values. Is this a valid distance? An approximate Manhattan distance? On Sat, Oct 22, 2011 at 1:46 AM, Ted Dunning
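
The quantity asked about above can be written down in a few lines. This is only a sketch of the idea, not Mahout code: `project_1d` and `projected_delta` are hypothetical helper names, and the Gaussian entries for the 1xN projection are an assumption, since the message does not say how the projection is drawn.

```python
import random


def project_1d(omega, x):
    """Apply a 1xN projection (a plain list of N weights) to an N-d vector."""
    return sum(w * xi for w, xi in zip(omega, x))


def projected_delta(omega, x, y):
    """Absolute difference of the two projected values.

    By linearity this equals |omega . (x - y)|, so it depends only on
    the difference vector x - y.
    """
    return abs(project_1d(omega, x) - project_1d(omega, y))


if __name__ == "__main__":
    rng = random.Random(7)
    n = 10
    # Assumption: Gaussian entries; the thread does not specify the distribution.
    omega = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [rng.random() for _ in range(n)]
    y = [rng.random() for _ in range(n)]
    print(projected_delta(omega, x, y))
```

The delta is symmetric and obeys the triangle inequality, though two distinct vectors can project to the same value whenever their difference is orthogonal to the projection.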

[jira] [Commented] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-22 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133566#comment-13133566 ] Lance Norskog commented on MAHOUT-849: -- My code attempted to multiply two MxN

Re: Average distance between two points in unit hypercube?

2011-10-22 Thread Ted Dunning
It is a valid distance, but it will not necessarily accurately reflect the original distance. You need to preserve roughly log N dimensions to get something reasonably accurate relative to the original distance (with high probability). To the extent that your original vectors are colinear with
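
The effect of keeping more projected dimensions can be explored with a small experiment. The sketch below is an illustration under stated assumptions, not anything from Mahout: it assumes a k x N projection with independent Gaussian entries scaled by 1/sqrt(k), and the helper names (`random_projection`, `project`, `euclidean`, `distance_ratio`) are hypothetical. It reports the ratio of projected distance to original distance, which should sit closer to 1 as k grows.

```python
import math
import random


def random_projection(k, n, rng):
    """Build a k x N matrix of Gaussian entries scaled by 1/sqrt(k) (an assumption)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(k)]


def project(omega, x):
    """Multiply the k x N matrix omega by the N-d vector x, giving a k-d vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in omega]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def distance_ratio(k, x, y, rng):
    """Ratio of the distance after projection to the original distance."""
    omega = random_projection(k, len(x), rng)
    return euclidean(project(omega, x), project(omega, y)) / euclidean(x, y)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 200
    x = [rng.random() for _ in range(n)]
    y = [rng.random() for _ in range(n)]
    # Compare a single projected dimension against progressively more.
    for k in (1, 8, 64):
        print(k, round(distance_ratio(k, x, y, rng), 3))
```

With k = 1 the ratio varies widely from one random draw to the next; repeating the run with different seeds shows the spread narrowing as k increases.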

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:41 PM, Sean Owen wrote: Thanks! good thread. On Sat, Oct 22, 2011 at 3:30 PM, Grant Ingersoll gsing...@apache.org wrote: 1. We aim for releases every 6 months or so 2. We make a best guess up front about what bug fixes will be in that release, but we also will,

Re: Demoralized over JIRA state

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 7:34 PM, Benson Margulies wrote: When the board looks at the health of a community, one of the questions it asks (or so I am told) is, 'Is the community responsive to requests for assistance?' I think we are, but of course we could be better. Now, the board's bar here