I am trying to run k-means clustering on the following data set:

Name,Gender,Age,Drinks,Country
John,M,30,Pepsi,US
Jack,M,25,Coke,US
David,M,34,Pepsi,UK
Ted,M,37,Limca,CAN
Robert,M,23,Limca,US
Adrian,M,31,Pepsi,US
Craig,M,37,Coke,UK
Katie,F,23,Limca,UK
Nancy,F,32,Pepsi,UK

I want to cluster the data based on Drinks (Pepsi, Coke, Limca), and I am
able to do that. But I also want to retrieve the name alongside the
clustered data.

The output I am getting is:

0
1
2
Limca belongs to cluster:0
Coke belongs to cluster:0
etc.

Here I want to get the names as well.

While converting to a sequence file, I take the drink as the key and the
rest of the text as the value, convert it to a sparse vector, and then run
k-means clustering. The names are not printed. Can anybody point out how I
can extract the names, which are in the values, from the clusters?
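
To make the goal concrete, here is a minimal sketch in plain Python of the
result I am after. It is not my actual job and uses no clustering library.
It only shows the bookkeeping: each feature vector stays paired with the
name, so a cluster id for the vector can be mapped back to the name. The
helper names (load_rows, one_hot, stand_in_assign, names_by_cluster) are
hypothetical, and stand_in_assign is a placeholder that gives each drink its
own id; it is an assumption for illustration, not k-means output.

```python
import csv
import io
from collections import defaultdict

# The sample data from this post.
DATA = """Name,Gender,Age,Drinks,Country
John,M,30,Pepsi,US
Jack,M,25,Coke,US
David,M,34,Pepsi,UK
Ted,M,37,Limca,CAN
Robert,M,23,Limca,US
Adrian,M,31,Pepsi,US
Craig,M,37,Coke,UK
Katie,F,23,Limca,UK
Nancy,F,32,Pepsi,UK
"""


def load_rows(text):
    """Parse the CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def one_hot(drink, vocabulary):
    """Encode a drink as a one-hot tuple over the given drink vocabulary."""
    return tuple(1.0 if drink == d else 0.0 for d in vocabulary)


def stand_in_assign(vector):
    """Hypothetical placeholder for the clusterer. This is NOT k-means.

    It returns the index of the 1.0 entry, so each drink gets its own id.
    """
    return vector.index(1.0)


def names_by_cluster(rows, assign):
    """Group names by cluster id, where assign maps a vector to an id."""
    vocabulary = sorted({row["Drinks"] for row in rows})
    # Pair each vector with the name so the name survives the clustering step.
    named_vectors = [
        (row["Name"], one_hot(row["Drinks"], vocabulary)) for row in rows
    ]
    groups = defaultdict(list)
    for name, vector in named_vectors:
        groups[assign(vector)].append(name)
    return dict(groups)


if __name__ == "__main__":
    result = names_by_cluster(load_rows(DATA), stand_in_assign)
    for cluster_id, names in sorted(result.items()):
        print(cluster_id, names)
```

The point of the sketch is that the name travels with the vector as its
identifier, instead of being buried in the value text that gets vectorized.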
