Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Ted Dunning
Dang. This community stuff is awesome. Kudos to all you guys for jumping on this. My only nit is whether this should move to the dev list.

On Fri, Jan 24, 2014 at 2:30 PM, Andrew Musselman < andrew.mussel...@gmail.com> wrote:
> Thanks guys, I will look at it this weekend too.
>
> On Fri,

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Andrew Musselman
Thanks guys, I will look at it this weekend too.

On Fri, Jan 24, 2014 at 2:24 PM, Pat Ferrel wrote:
> I have a setup using hadoop M/R kmeans for testing. If I can help in any
> way let me know and if you don’t get to it I’ll have a look this weekend.
>
> Thanks
>
> On Jan 24, 2014, at 1:56 PM,

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Pat Ferrel
I have a setup using hadoop M/R kmeans for testing. If I can help in any way let me know and if you don’t get to it I’ll have a look this weekend.

Thanks

On Jan 24, 2014, at 1:56 PM, Suneel Marthi wrote:
Pat, Andrew's not filed a JIRA for this, so thanks for filing M-1410 to track this. The

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Suneel Marthi
Pat, Andrew hasn't filed a JIRA for this, so thanks for filing M-1410 to track it. For sequential mode, the fix would be to modify ClusterIterator.iterateSeq() to read the vector key along with the vector. For MR mode, CIMapper.java needs to be modified to read the vector key along
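
The idea behind this fix, keeping each input vector's key attached to its cluster assignment, can be sketched in plain Java. This is only an illustration under stated assumptions, not Mahout's code: it does not use ClusterIterator, CIMapper, or any Mahout or Hadoop class, and the names KeyedAssignment, Assignment, and assign are hypothetical. Keys are modeled as strings and vectors as double arrays purely for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Mahout.
class KeyedAssignment {

    /** One output record: the input key travels with the cluster assignment. */
    static final class Assignment {
        final String key;             // key read alongside the vector
        final int clusterId;          // index of the nearest centroid
        final double distanceSquared; // squared Euclidean distance to that centroid

        Assignment(String key, int clusterId, double distanceSquared) {
            this.key = key;
            this.clusterId = clusterId;
            this.distanceSquared = distanceSquared;
        }
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double distanceSquared(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Assigns every (key, vector) pair to its nearest centroid.
     * The key is read together with the vector and written to the output,
     * so each result can be traced back to the input it came from.
     */
    static List<Assignment> assign(Map<String, double[]> points, double[][] centroids) {
        List<Assignment> out = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : points.entrySet()) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double dist = distanceSquared(entry.getValue(), centroids[c]);
                if (dist < bestDist) {
                    bestDist = dist;
                    best = c;
                }
            }
            out.add(new Assignment(entry.getKey(), best, bestDist));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data for the sketch.
        Map<String, double[]> points = new LinkedHashMap<>();
        points.put("doc-1", new double[] {0.0, 0.1});
        points.put("doc-2", new double[] {5.0, 5.2});
        double[][] centroids = {{0.0, 0.0}, {5.0, 5.0}};

        for (Assignment a : assign(points, centroids)) {
            System.out.println(a.key + " -> cluster " + a.clusterId);
        }
    }
}
```

Without the key field, the output records would carry only a cluster id, a distance, and the vector contents, with nothing tying them back to a specific input, which is the situation the thread describes for unnamed vectors.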

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Pat Ferrel
Yeah, it’s not really the issue with M-1030 but makes the fix unusable. I apologize for not noticing this sooner, my own fault I guess. Did you file a JIRA against the larger issue? Any ETA on a fix (0.9?). Should I go ahead and write my own cluster categorizer? You and Suneel pointed to the pr

Re: Clustering in Mahout 0.9 candidate

2014-01-24 Thread Andrew Musselman
That's correct; I reported that last summer and didn't fix it in M-1030 since it didn't seem like that's what the group wanted in that bug. I see you're filing another bug, thanks.

On Fri, Jan 24, 2014 at 10:29 AM, Pat Ferrel wrote:
> I can’t believe I haven’t noticed this before and so am hop

Clustering in Mahout 0.9 candidate

2014-01-24 Thread Pat Ferrel
I can’t believe I haven’t noticed this before and so am hoping I’m mistaken… When you are using kmeans to cluster data where there is no “named” vector, clusteredPoints do not contain the vector ids so the cluster id, pdf, “distance-squared”, and vector dimensions are not tied to any known vecto