Clustering with id

2011-07-10 Thread Gabor Makrai
Hi, I'm a little bit confused about Mahout's clustering algorithms. I'd like to cluster data that has an id column. How can I do that? For example, I'd like to run K-Means clustering on the Iris data set (http://archive.ics.uci.edu/ml/datasets/Iris), where I've got four numerical columns. I generated an i

Re: Clustering with id

2011-07-10 Thread Lance Norskog
The NamedVector class adds a string to any vector, forwarding all methods to the wrapped vector. You can cluster these, and then pull the strings. The clustering algorithm operates on the wrapped vector. Lance
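To illustrate the idea in the reply above, here is a minimal sketch in plain Java. It does not use Mahout: the `Named` class is a hypothetical stand-in for a named vector, and `assign` is a single nearest-centroid assignment step, not a full K-Means run. The point values and centroids are illustrative only. It shows the id travelling with each row while only the numeric values enter the distance computation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamedVectorSketch {

    // Hypothetical stand-in for a named vector: an id plus the wrapped values.
    static final class Named {
        final String name;
        final double[] values;

        Named(String name, double... values) {
            this.name = name;
            this.values = values;
        }
    }

    // Squared Euclidean distance; the name plays no part in it.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // One assignment step: group the names by nearest centroid.
    static Map<Integer, List<String>> assign(List<Named> points, double[][] centroids) {
        Map<Integer, List<String>> clusters = new LinkedHashMap<>();
        for (Named p : points) {
            int c = nearest(p.values, centroids);
            clusters.computeIfAbsent(c, k -> new ArrayList<>()).add(p.name);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Illustrative four-column rows, each tagged with an id.
        List<Named> points = List.of(
                new Named("row-1", 5.1, 3.5, 1.4, 0.2),
                new Named("row-2", 4.9, 3.0, 1.4, 0.2),
                new Named("row-3", 6.3, 3.3, 6.0, 2.5));
        double[][] centroids = {{5.0, 3.2, 1.4, 0.2}, {6.5, 3.0, 5.5, 2.0}};
        System.out.println(assign(points, centroids));
    }
}
```

In Mahout itself the wrapper is `org.apache.mahout.math.NamedVector`; how the names surface in the clustering output depends on the Mahout version and job used, so treat the above only as a picture of the mechanism.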

Re: Clustering with id

2011-07-11 Thread Gabor Makrai
Thank you very much! NamedVector should solve my problem! Also, I'm always amazed at how quickly answers arrive on the Hadoop lists! Thank you, Gabor

Re: Clustering with id

2011-07-11 Thread Lance Norskog
I'm finding it hard to maintain these labels across vector and matrix factorizations & direct operations.

Re: Clustering with id

2011-07-11 Thread Lance Norskog
I mean, walking through the algorithms and tracking which vector name becomes which matrix row/column label.

Re: Clustering with id

2011-07-11 Thread Ted Dunning
Can you give specific examples? The process should be relatively straightforward and the implication that rows have row labels that are defined by the left operand of a product and columns have column labels that are defined by the right operand should be sufficient. Sums should have the same row
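The rule for products stated above (row labels come from the left operand, column labels from the right operand) can be sketched in plain Java. This is a hypothetical `Labeled` holder written for illustration, not a Mahout class, and it covers only the product case.

```java
import java.util.Arrays;

public class LabeledMatrixSketch {

    // Hypothetical holder: a dense matrix plus one label per row and per column.
    static final class Labeled {
        final String[] rowLabels;
        final String[] colLabels;
        final double[][] data;

        Labeled(String[] rowLabels, String[] colLabels, double[][] data) {
            this.rowLabels = rowLabels;
            this.colLabels = colLabels;
            this.data = data;
        }
    }

    // Product a * b: rows are labeled by the left operand, columns by the right.
    static Labeled times(Labeled a, Labeled b) {
        int n = a.data.length;
        int k = b.data.length;
        int m = b.data[0].length;
        if (a.data[0].length != k) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] out = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int t = 0; t < k; t++) {
                    sum += a.data[i][t] * b.data[t][j];
                }
                out[i][j] = sum;
            }
        }
        return new Labeled(a.rowLabels, b.colLabels, out);
    }

    public static void main(String[] args) {
        Labeled a = new Labeled(
                new String[]{"doc-1", "doc-2"},
                new String[]{"f1", "f2"},
                new double[][]{{1, 2}, {3, 4}});
        Labeled b = new Labeled(
                new String[]{"f1", "f2"},
                new String[]{"topic-A", "topic-B"},
                new double[][]{{1, 0}, {0, 1}});
        Labeled c = times(a, b);
        System.out.println(Arrays.toString(c.rowLabels));
        System.out.println(Arrays.toString(c.colLabels));
    }
}
```

The inner labels (the columns of the left operand and the rows of the right operand) are consumed by the product and do not appear in the result.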

Re: Clustering with id

2011-07-11 Thread Lance Norskog
My algorithm was wrong anyway, and I was making things harder for myself than I needed to.