Re: [jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Ted Dunning
Shashikant, How does k-means treat your data?
On Tue, Jun 16, 2009 at 10:12 PM, Shashikant Kore (JIRA) wrote:
> This is voodoo for me. For the dataset I am working with has a window of
> 0.05 in which the result changes from 0 canopies to 3,000 canopies.

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720498#action_12720498 ] Shashikant Kore commented on MAHOUT-121: Grant, I am trying out the wikipedia vec

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720473#action_12720473 ] Grant Ingersoll commented on MAHOUT-121: Patch, of course, also needs a unit test f

[jira] Updated: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-121: --- Attachment: MAHOUT-121.patch Updated to trunk. Did some profiling and the bottleneck is gone

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720470#action_12720470 ] Grant Ingersoll commented on MAHOUT-121: Sparse Vector available at http://people.a

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720469#action_12720469 ] Grant Ingersoll commented on MAHOUT-121: I've created some vectors from Wikipedia a

[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-126: --- Attachment: MAHOUT-126.patch Updated patch since MAHOUT-65-name.patch was committed. > Prepa

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720345#action_12720345 ] Grant Ingersoll commented on MAHOUT-65: --- Committed the name stuff: Committed revision

[jira] Updated: (MAHOUT-123) Implement Latent Dirichlet Allocation

2009-06-16 Thread David Hall (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Hall updated MAHOUT-123: -- Fix Version/s: 0.2 Affects Version/s: 0.2 Status: Patch Available (was: Open) T

[jira] Updated: (MAHOUT-123) Implement Latent Dirichlet Allocation

2009-06-16 Thread David Hall (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Hall updated MAHOUT-123: -- Attachment: lda.patch > Implement Latent Dirichlet Allocation > - >

[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-65: -- Attachment: MAHOUT-65-name.patch How about a version where the tests actually pass? Will commit

Re: MAHOUT-65

2009-06-16 Thread Jeff Eastman
+1, you added name constructors that I didn't have and the equals/equivalent stuff. Ya, Gson makes it all pretty trivial once you grok it.
Grant Ingersoll wrote:
Shall I take that as approval of the approach? BTW, the Gson stuff seems like a winner for serialization.
On Jun 16, 2009, at 3:5

Re: MAHOUT-65

2009-06-16 Thread Grant Ingersoll
Shall I take that as approval of the approach? BTW, the Gson stuff seems like a winner for serialization.
On Jun 16, 2009, at 3:56 PM, Jeff Eastman wrote:
You gonna commit your patch? I agree with shortening the class name in the JsonVectorAdapter and will do it once you commit ur stuff. Jef

[jira] Assigned: (MAHOUT-123) Implement Latent Dirichlet Allocation

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned MAHOUT-123: -- Assignee: Grant Ingersoll > Implement Latent Dirichlet Allocation > ---

MAHOUT-65

2009-06-16 Thread Jeff Eastman
You gonna commit your patch? I agree with shortening the class name in the JsonVectorAdapter and will do it once you commit ur stuff. Jeff

Re: Questions about Vector

2009-06-16 Thread Sean Owen
OK, I have the changes described in this thread ready, but I didn't go ahead with removing the "quick" methods. It proved to be a very large change, and there were some situations where it looks like we need to think a little harder about what to do with the resulting code; it's not really just a matte

[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-65: -- Attachment: MAHOUT-65-name.patch implement hashCode better, require equals and hashcode as part

[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-126: --- Attachment: MAHOUT-126.patch Here's a version that is brought up to trunk and adds in MAHOUT-

[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-126: --- Fix Version/s: 0.2 Affects Version/s: 0.2 Status: Patch Available (was

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720302#action_12720302 ] Grant Ingersoll commented on MAHOUT-65: --- Jeff, One comment on the GSON serialization

[jira] Updated: (MAHOUT-131) Vector improvements

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-131: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Vector improvements > --

[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-65: -- Attachment: MAHOUT-65-name.patch Add name attribute. Also added some docs on equals and added a

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720246#action_12720246 ] Shashikant Kore commented on MAHOUT-121: I have document vectors created from some

[jira] Issue Comment Edited: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720199#action_12720199 ] Grant Ingersoll edited comment on MAHOUT-65 at 6/16/09 9:24 AM: --

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720218#action_12720218 ] Grant Ingersoll commented on MAHOUT-65: --- Yep, I have a patch for that, will upload sho

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720216#action_12720216 ] Jeff Eastman commented on MAHOUT-65: How about we just add a name attribute to AbstractV

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720199#action_12720199 ] Grant Ingersoll commented on MAHOUT-65: --- That works for Matrix. For Vector, I was thi

[jira] Updated: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Eastman updated MAHOUT-65: --- Attachment: MAHOUT-65d.patch Naming a Vector and having that be stateful - as opposed to bindings whic

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720161#action_12720161 ] Grant Ingersoll commented on MAHOUT-65: --- OK, I will work up a patch for the name thing

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720159#action_12720159 ] Sean Owen commented on MAHOUT-121: -- While you are at it, what are you running to load test

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Jeff Eastman (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720128#action_12720128 ] Jeff Eastman commented on MAHOUT-65: I committed MAHOUT-65c and another patch to make bi

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-16 Thread Shashikant Kore (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720123#action_12720123 ] Shashikant Kore commented on MAHOUT-121: Apologies for posting incorrect results in

[jira] Resolved: (MAHOUT-134) [PATCH] Cluster decode error handling

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-134. Resolution: Fixed Fix Version/s: 0.2 Committed revision 785197. > [PATCH] Cluster d

Re: Pregel - large.scale graph computation

2009-06-16 Thread Grant Ingersoll
Lukas, This is very cool. I've long had an interest in graph stuff and not just for PageRank, either. Turns out you can do some fun NLP things with graphs (http://www.textgraphs.org/ws09/index.html). Sounds like a nice thing to add to Hadoop or Mahout once more details are published.

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720045#action_12720045 ] Grant Ingersoll commented on MAHOUT-65: --- Is the only way to add bindings by setting th

[jira] Commented: (MAHOUT-65) Add Element Labels to Vectors and Matrices

2009-06-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720037#action_12720037 ] Grant Ingersoll commented on MAHOUT-65: --- Hey Jeff, Minor request, it seems like you h