Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Ted Dunning
I would say that a real user gets a bigger vote relative to theoretical complainers. Simple at first is fine by me. On Tue, Jun 9, 2009 at 4:09 PM, Jeff Eastman wrote: > IIRC, the patch in M-65 works but was judged to be inadequate so I never > committed it. After several subsequent postings on

Re: Mahout's Dirichlet Process Mixture Model implementation

2009-06-09 Thread Jeff Eastman
I had a nice Skype chat with Sebastien last week and he has my wholehearted support on these improvements. The sampleFromPosterior implementation was a totally naive hack which was actually an improvement over always sampling from the prior. The fact that the current implementation converges so n

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Jeff Eastman
IIRC, the patch in M-65 works but was judged to be inadequate so I never committed it. After several subsequent postings on the requirements I started another version but it ran into impossible serialization/deserialization issues. Those could probably be addressed now with JSON but the patch i

Re: A Bunch of Vector questions

2009-06-09 Thread Grant Ingersoll
On Jun 9, 2009, at 5:49 PM, Grant Ingersoll wrote: I'm looking into the whole labels thing as well as Vector stuff and I'm confused by a couple of things. 1. DirichletMapper assumes DenseVector implementation, no? Line 45? 2. Shouldn't DenseVector implement equals like SparseVector does?

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Grant Ingersoll
FWIW, I'm happy w/ a simple solution right now, which may very well be Jeff's initial patch. Still, I'd like to hear more from Ted, Jeff and Karl. On Jun 9, 2009, at 6:20 PM, Benson Margulies wrote: OK, I came into the middle and misunderstood. This doesn't precisely seem to leave much o

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Benson Margulies
OK, I came into the middle and misunderstood. This doesn't precisely seem to leave much of a space for another person to join in, but I'd be happy to be corrected by some combination of the people cited below. On Tue, Jun 9, 2009 at 6:18 PM, Grant Ingersoll wrote: > > On Jun 9, 2009, at 6:11 PM

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Grant Ingersoll
On Jun 9, 2009, at 6:11 PM, Benson Margulies wrote: Grant, AFAIK, there's a perfectly adequate version sitting out there in MAHOUT-65, waiting for a committer to commit it. If that's wrong and there's a concrete coding task I could undertake that would render it committable, I'd be game. T

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Benson Margulies
Grant, AFAIK, there's a perfectly adequate version sitting out there in MAHOUT-65, waiting for a committer to commit it. If that's wrong and there's a concrete coding task I could undertake that would render it committable, I'd be game. --benson On Tue, Jun 9, 2009 at 6:02 PM, Grant Ingersoll

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Grant Ingersoll
On Jun 9, 2009, at 5:44 PM, Benson Margulies wrote: I've basically bailed on doing much with Mahout until something like this commits. The only way it ever gets better is by people pitching in, but, heh, you know that already! ;-) On Tue, Jun 9, 2009 at 2:57 PM, Grant Ingersoll wrot

A Bunch of Vector questions

2009-06-09 Thread Grant Ingersoll
I'm looking into the whole labels thing as well as Vector stuff and I'm confused by a couple of things. 1. DirichletMapper assumes DenseVector implementation, no? Line 45? 2. Shouldn't DenseVector implement equals like SparseVector does? 3. VectorView doesn't appear to implement asFormatStrin
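
Grant's second question asks whether DenseVector should define equals the way SparseVector does. As a minimal sketch, assuming the intended contract is value equality (same cardinality and the same value in every cell), a dense vector could delegate to java.util.Arrays. The class SimpleDenseVector is hypothetical and is not Mahout's DenseVector.

```java
import java.util.Arrays;

/** Hypothetical dense vector, used only to illustrate value-based equality. */
final class SimpleDenseVector {
  private final double[] values;

  SimpleDenseVector(double... values) {
    this.values = values.clone();
  }

  int size() {
    return values.length;
  }

  double get(int index) {
    return values[index];
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof SimpleDenseVector)) {
      return false;
    }
    // Same cardinality and same value in every cell. Arrays.equals compares
    // doubles like Double.equals, so NaN equals NaN and 0.0 differs from -0.0.
    return Arrays.equals(values, ((SimpleDenseVector) other).values);
  }

  @Override
  public int hashCode() {
    // Consistent with equals because it hashes the same per-cell bit patterns.
    return Arrays.hashCode(values);
  }
}
```

Whatever contract is chosen, hashCode has to change together with equals, otherwise vectors used as HashMap keys misbehave. Comparing a dense vector with a sparse one would additionally need a cell-by-cell comparison through a shared interface, which this sketch leaves out.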

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-09 Thread Benson Margulies (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717832#action_12717832 ] Benson Margulies commented on MAHOUT-121: I could make you a fast sparse vector, b

Re: Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Benson Margulies
I've basically bailed on doing much with Mahout until something like this commits. On Tue, Jun 9, 2009 at 2:57 PM, Grant Ingersoll wrote: > What are people doing for keeping track of their vectors and matrices? > MAHOUT-65 attempted to address that, but it seems to have gotten stuck. > > For MAHO

Re: Mahout's Dirichlet Process Mixture Model implementation

2009-06-09 Thread Ted Dunning
Fabulous! Some details in-line. On Mon, Jun 8, 2009 at 7:06 AM, Sebastien Bratieres wrote: > - most importantly, the parameter re-estimation step currently is a maximum > likelihood re-estimation. So the algorithm is not guaranteed to do actual > training/converge/work at all. In order to do Gi
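
Sebastien's point is that re-estimating cluster parameters by maximum likelihood is not a Gibbs sampling step: a Gibbs step draws the parameters from their posterior given the currently assigned points. A minimal sketch of the difference, assuming a one-dimensional Gaussian cluster with known variance sigma2 and a Normal(priorMean, priorVar) prior on its mean. This conjugate setup is chosen for illustration only and is not the model set Mahout actually uses; GaussianMeanPosterior and its methods are hypothetical names.

```java
import java.util.Random;

/**
 * Hypothetical 1-D illustration: a Gaussian cluster with known variance sigma2
 * and a Normal(priorMean, priorVar) prior on its mean.
 */
final class GaussianMeanPosterior {
  private GaussianMeanPosterior() {}

  /** Maximum-likelihood re-estimate: the sample mean of the assigned points. */
  static double maximumLikelihoodMean(double[] points) {
    double sum = 0.0;
    for (double p : points) {
      sum += p;
    }
    return sum / points.length; // NaN (0/0) when no points are assigned
  }

  /** Returns {posteriorMean, posteriorVariance} of the cluster mean. */
  static double[] posterior(double[] points, double sigma2, double priorMean, double priorVar) {
    double sum = 0.0;
    for (double p : points) {
      sum += p;
    }
    // Conjugate update: precisions add, means combine weighted by precision.
    double precision = 1.0 / priorVar + points.length / sigma2;
    double mean = (priorMean / priorVar + sum / sigma2) / precision;
    return new double[] {mean, 1.0 / precision};
  }

  /** Gibbs-style update: draw the mean from its posterior instead of fixing it at the ML value. */
  static double sampleMean(
      double[] points, double sigma2, double priorMean, double priorVar, Random rng) {
    double[] post = posterior(points, sigma2, priorMean, priorVar);
    return post[0] + Math.sqrt(post[1]) * rng.nextGaussian();
  }
}
```

With no assigned points the posterior falls back to the prior, whereas the maximum-likelihood mean is undefined, which is one practical reason posterior sampling behaves better for empty or very small clusters.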

[jira] Commented: (MAHOUT-121) Speed up distance calculations for sparse vectors

2009-06-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717812#action_12717812 ] Grant Ingersoll commented on MAHOUT-121: I've been doing some profiling and we do i
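
MAHOUT-121 is about making distance calculations cheaper for sparse vectors. One common approach, sketched here under the assumption that a sparse vector stores strictly increasing indices alongside parallel values, is to merge the two index lists so the cost is proportional to the number of non-zero entries rather than to the cardinality. SortedSparseVector and SparseDistance are hypothetical names, not Mahout classes.

```java
/** Hypothetical sparse vector: strictly increasing indices with parallel values. */
final class SortedSparseVector {
  final int[] indices;
  final double[] values;

  SortedSparseVector(int[] indices, double[] values) {
    if (indices.length != values.length) {
      throw new IllegalArgumentException("indices and values must have equal length");
    }
    this.indices = indices.clone();
    this.values = values.clone();
  }
}

final class SparseDistance {
  private SparseDistance() {}

  /** Squared Euclidean distance, visiting only the non-zero cells of either vector. */
  static double squaredDistance(SortedSparseVector a, SortedSparseVector b) {
    int i = 0;
    int j = 0;
    double sum = 0.0;
    // Merge the two sorted index lists; a cell missing from one vector is zero there.
    while (i < a.indices.length && j < b.indices.length) {
      double diff;
      if (a.indices[i] == b.indices[j]) {
        diff = a.values[i++] - b.values[j++];
      } else if (a.indices[i] < b.indices[j]) {
        diff = a.values[i++];
      } else {
        diff = -b.values[j++];
      }
      sum += diff * diff;
    }
    // Entries left over in either vector are paired with implicit zeros.
    for (; i < a.indices.length; i++) {
      sum += a.values[i] * a.values[i];
    }
    for (; j < b.indices.length; j++) {
      sum += b.values[j] * b.values[j];
    }
    return sum;
  }
}
```

An alternative is ||a||^2 + ||b||^2 - 2 a.b with cached squared norms, so that only the dot product is computed per pair; it can be cheaper when lookups into one vector are fast, but it can lose precision through cancellation when the two vectors are close.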

Mahout's Dirichlet Process Mixture Model implementation

2009-06-09 Thread Sebastien Bratieres
Dear Mahout developers, I am planning to contribute to the Dirichlet Process Clustering algorithm implemented by Jeff (Eastman). I have read through the code in some detail, and discussed a couple of points with Jeff already in order not to create a mess. That way I could understand how the code o

Labels for Vectors and Matrices (MAHOUT-65)

2009-06-09 Thread Grant Ingersoll
What are people doing for keeping track of their vectors and matrices? MAHOUT-65 attempted to address that, but it seems to have gotten stuck. For MAHOUT-126 (document clustering prep work), I ended up outputting Vector cell information to a separate file, which works but is cumbersome.
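
The separate file Grant describes is essentially a dictionary between cell indices and labels (terms, in the MAHOUT-126 case). A minimal in-memory sketch, assuming string labels and indices handed out densely in first-seen order; LabelBindings is a hypothetical name and this is not the MAHOUT-65 patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-way mapping between cell labels (e.g. terms) and vector indices. */
final class LabelBindings {
  private final Map<String, Integer> indexByLabel = new HashMap<>();
  private final List<String> labelByIndex = new ArrayList<>();

  /** Returns the existing index for a label, or assigns the next free index. */
  int indexOf(String label) {
    Integer existing = indexByLabel.get(label);
    if (existing != null) {
      return existing;
    }
    int index = labelByIndex.size();
    indexByLabel.put(label, index);
    labelByIndex.add(label);
    return index;
  }

  /** Reverse lookup, e.g. for reading terms back out of cluster centroids. */
  String labelOf(int index) {
    return labelByIndex.get(index);
  }

  int size() {
    return labelByIndex.size();
  }
}
```

Writing labelOf(0) through labelOf(size() - 1) next to the vectors reproduces the side-file approach; attaching such an object to a vector or matrix is the kind of labeling this thread discusses for MAHOUT-65.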

[jira] Commented: (MAHOUT-126) Prepare document vectors from the text

2009-06-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717742#action_12717742 ] Grant Ingersoll commented on MAHOUT-126: Note, I haven't actually tried clustering

[jira] Updated: (MAHOUT-126) Prepare document vectors from the text

2009-06-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-126: Attachment: MAHOUT-126.patch. Here's a first attempt at my thoughts based on the two previous

[jira] Resolved: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-130. Resolution: Fixed. Committed Ted's patch. > Vector should allow for other normalize powers t
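
MAHOUT-130 asks for normalization by norms other than L-2. For p >= 1 the L-p norm is (sum over i of |x_i|^p)^(1/p), and p = infinity gives max |x_i|. The sketch below is a hypothetical helper over a plain double[] that restricts itself to those true norms; it is not Ted's committed patch, whose method names and handling of special powers are not shown in this thread.

```java
/** Hypothetical helper illustrating normalization by an arbitrary L-p norm (p >= 1). */
final class PNorm {
  private PNorm() {}

  /** ||x||_p = (sum |x_i|^p)^(1/p); p = +Infinity gives max |x_i|. */
  static double norm(double[] x, double power) {
    if (!(power >= 1.0)) { // also rejects NaN
      throw new IllegalArgumentException("power must be >= 1");
    }
    if (Double.isInfinite(power)) {
      double max = 0.0;
      for (double v : x) {
        max = Math.max(max, Math.abs(v));
      }
      return max;
    }
    double sum = 0.0;
    for (double v : x) {
      sum += Math.pow(Math.abs(v), power);
    }
    return Math.pow(sum, 1.0 / power);
  }

  /** Returns a copy of x scaled to unit L-p norm; an all-zero vector is returned unchanged. */
  static double[] normalize(double[] x, double power) {
    double n = norm(x, power);
    double[] result = x.clone();
    if (n == 0.0) {
      return result;
    }
    for (int i = 0; i < result.length; i++) {
      result[i] /= n;
    }
    return result;
  }
}
```

Normalizing with p = 1 makes the absolute values sum to 1, which can be convenient for term-frequency vectors, while p = 2 gives unit Euclidean length.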

[jira] Commented: (MAHOUT-130) Vector should allow for other normalize powers than the L-2 norm

2009-06-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717594#action_12717594 ] Sean Owen commented on MAHOUT-130: Regarding IntelliJ warnings -- double-check that your