[jira] [Commented] (MAHOUT-847) Improve Euclidean distance similarity calculation

2011-10-21 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133279#comment-13133279 ] Sean Owen commented on MAHOUT-847: -- No, EuclideanDistanceMeasure is a distance measure ra

[jira] [Issue Comment Edited] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-21 Thread Lance Norskog (Issue Comment Edited) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133255#comment-13133255 ] Lance Norskog edited comment on MAHOUT-838 at 10/22/11 6:04 AM:

[jira] [Updated] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-21 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-838: - Attachment: MAHOUT-838_mini.patch > Make the confusion matrix writable to a file when testing

[jira] [Updated] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-21 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-838: - Attachment: MatrixWritable.java ConfusionMatrix.java Replace these two files, the

[jira] [Commented] (MAHOUT-838) Make the confusion matrix writable to a file when testing classifiers

2011-10-21 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133241#comment-13133241 ] Lance Norskog commented on MAHOUT-838: -- MAHOUT-812 was committed on Oct. 3. If your c

[jira] [Updated] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-21 Thread Lance Norskog (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-849: - Attachment: MAHOUT-849.patch > Wrong error messages in AbstractMatrix > -

[jira] [Created] (MAHOUT-849) Wrong error messages in AbstractMatrix

2011-10-21 Thread Lance Norskog (Created) (JIRA)
Wrong error messages in AbstractMatrix -- Key: MAHOUT-849 URL: https://issues.apache.org/jira/browse/MAHOUT-849 Project: Mahout Issue Type: Bug Reporter: Lance Norskog Priority: Tri

Re: Average distance between two points in unit hypercube?

2011-10-21 Thread Lance Norskog
More completely: given an Nx1 random projection matrix, project any N-dimensional vector to a 1-dimensional vector. The delta of two of these 1-d vectors gives a consistent difference, no? On Fri, Oct 21, 2011 at 9:35 AM, Ted Dunning wrote: > Well, it is missing a matrix to define the projection
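
A minimal sketch of the idea in the message above, assuming the Nx1 projection matrix is a random unit vector drawn from a Gaussian and normalized. The names `random_direction`, `project` and `euclidean` are illustrative only and come from neither the thread nor Mahout. With a unit direction, the 1-d delta never exceeds the true Euclidean distance, and its square is on average 1/N of the squared distance. A single projection is therefore a noisy estimate; averaging over several independent projections is what makes the difference consistent.

```python
import math
import random


def random_direction(n, rng):
    """Return a random unit vector in R^n, used here as an N x 1 projection matrix."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(vec, direction):
    """Project an N-dimensional vector onto the direction, giving one number."""
    return sum(a * b for a, b in zip(vec, direction))


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(42)          # fixed seed so the run is repeatable
n = 50                           # dimension N (arbitrary choice for the sketch)
direction = random_direction(n, rng)

# Two points in the unit hypercube [0, 1)^N.
a = [rng.random() for _ in range(n)]
b = [rng.random() for _ in range(n)]

delta_1d = abs(project(a, direction) - project(b, direction))
true_dist = euclidean(a, b)

# Rescaling by sqrt(N) gives a rough (noisy) estimate of the true distance.
estimate = delta_1d * math.sqrt(n)
```

Because the projection is linear, the delta of the two projected values equals the projection of the difference vector, which is why the same direction must be reused for every point being compared.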

[jira] [Commented] (MAHOUT-847) Improve Euclidean distance similarity calculation

2011-10-21 Thread Lance Norskog (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133234#comment-13133234 ] Lance Norskog commented on MAHOUT-847: -- How does this compare with EuclideanDistanceM

[jira] [Commented] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Jake Mannix (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133178#comment-13133178 ] Jake Mannix commented on MAHOUT-845: Well, that's a good question. I've used it the s

[jira] [Commented] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Frank Scholten (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133136#comment-13133136 ] Frank Scholten commented on MAHOUT-845: --- Yes and this method would return an array o

[jira] [Commented] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Frank Scholten (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133132#comment-13133132 ] Frank Scholten commented on MAHOUT-845: --- Yes you make some good points about the vir

[jira] [Commented] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Jake Mannix (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133130#comment-13133130 ] Jake Mannix commented on MAHOUT-845: So this is good, I like that this puts the queue

[jira] [Updated] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Frank Scholten (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-845: -- Attachment: MAHOUT-845.patch Newer patch that also updates ClusterDumperWriter

[jira] [Commented] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Jake Mannix (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133104#comment-13133104 ] Jake Mannix commented on MAHOUT-845: Ooh, actually, I have code which does this on my

[jira] [Updated] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Frank Scholten (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-845: -- Attachment: MAHOUT-845.patch Here is a patch for retrieving the top k elements on Vector, imple
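
The patch itself is not shown in this listing, so the following is only a generic sketch of retrieving the top k elements of a vector with a bounded priority queue. It is not the MAHOUT-845 code. The sparse vector is assumed to be a plain index-to-value mapping, and `top_k` is a hypothetical helper name.

```python
import heapq


def top_k(vector, k):
    """Return the k largest (index, value) pairs of a sparse vector, largest first.

    `vector` is assumed to be a dict mapping index -> value.
    A min-heap of at most k entries is kept, so memory stays O(k).
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (value, index); heap[0] is the smallest entry kept
    for index, value in vector.items():
        if len(heap) < k:
            heapq.heappush(heap, (value, index))
        elif (value, index) > heap[0]:
            # New element beats the smallest kept one: swap it in.
            heapq.heapreplace(heap, (value, index))
    return [(index, value) for value, index in sorted(heap, reverse=True)]


weights = {0: 0.1, 3: 0.9, 7: 0.5, 9: 0.7}
best_two = top_k(weights, 2)
```

Keeping the queue bounded at k entries means the cost is O(n log k) for n non-zero elements, rather than the O(n log n) of sorting the whole vector.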

[jira] [Updated] (MAHOUT-845) Make cluster top terms code more reusable

2011-10-21 Thread Frank Scholten (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-845: -- Fix Version/s: 0.6 Affects Version/s: 0.5 Status: Patch Available (was:

Re: Are we moderating the user list?

2011-10-21 Thread Benson Margulies
I think he was confused. He sent two copies of a message just now that came through. On Fri, Oct 21, 2011 at 3:02 PM, Sean Owen wrote: > The Apache mail servers definitely moderate suspected spam, and, it > flags a whole lot of stuff. For example, mails from committers are > regularly flagged. >

Re: Are we moderating the user list?

2011-10-21 Thread Sean Owen
The Apache mail servers definitely moderate suspected spam, and, it flags a whole lot of stuff. For example, mails from committers are regularly flagged. I (and a few others) get moderate messages. Obviously I approve anything that's not spam, but, it's possible I missed something. I doubt I misse

Are we moderating the user list?

2011-10-21 Thread Benson Margulies
Someone on Stack Overflow claims to have sent mail to the user list that hasn't shown up.

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-21 Thread Jeff Eastman (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132870#comment-13132870 ] Jeff Eastman commented on MAHOUT-524: - I'm running in the Eclipse debugger, debugging

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-21 Thread Dan Brickley (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132867#comment-13132867 ] Dan Brickley commented on MAHOUT-524: - re Sean's "I'd restart your cluster."; should i

[jira] [Commented] (MAHOUT-672) Implementation of Conjugate Gradient for solving large linear systems

2011-10-21 Thread Dmitriy Lyubimov (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132817#comment-13132817 ] Dmitriy Lyubimov commented on MAHOUT-672: - Jonathan, Thank you for this work. It

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-10-21 Thread Jeff Eastman (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132810#comment-13132810 ] Jeff Eastman commented on MAHOUT-524: - All of this is buried inside of DistributedLanc

Re: Average distance between two points in unit hypercube?

2011-10-21 Thread Ted Dunning
Well, it is missing a matrix to define the projections, but it is the heart of the issue. It also lacks a proof which is kind of important among some circles. The clever proof is actually not that hard to grok. On Fri, Oct 21, 2011 at 5:07 AM, Federico Castanedo wrote: > I think, that's a good

Re: Average distance between two points in unit hypercube?

2011-10-21 Thread Federico Castanedo
I think, that's a good explanation of the Johnson-Lindenstrauss Lemma, which is the basis of the manifold learning theory using random projections. 2011/10/21 Ted Dunning > Sort of. > > I may be misunderstanding the question. > > If you take a random orthogonal projection, then distances will be
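
For reference, the Johnson-Lindenstrauss Lemma mentioned above is usually stated along these lines. The symbols $m$, $N$, $k$, $\varepsilon$ and $f$ are notation chosen here, not taken from the thread.

```latex
% Johnson-Lindenstrauss Lemma (standard form).
% For any 0 < \varepsilon < 1 and any set X of m points in \mathbb{R}^N,
% there is a linear map f : \mathbb{R}^N \to \mathbb{R}^k with
% k = O(\varepsilon^{-2} \log m) such that, for all u, v \in X,
\[
  (1 - \varepsilon)\,\lVert u - v \rVert^2
  \;\le\; \lVert f(u) - f(v) \rVert^2
  \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert^2 .
\]
```

The target dimension $k$ depends on the number of points and the tolerance, not on the original dimension $N$. A suitably scaled random projection satisfies the bound with high probability.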

[jira] [Updated] (MAHOUT-847) Improve Euclidean distance similarity calculation

2011-10-21 Thread Sean Owen (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-847: - Resolution: Fixed Status: Resolved (was: Patch Available) > Improve Euclidean distance simil

[jira] [Commented] (MAHOUT-847) Improve Euclidean distance similarity calculation

2011-10-21 Thread Sean Owen (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132507#comment-13132507 ] Sean Owen commented on MAHOUT-847: -- The problem with caching sqrt(n) is that every pair o