I'm not sure a centroid id is even a defined thing. As I understand it, the k-means centroid is the mean of the vectors assigned to the cluster, so it is just a point in space and not necessarily one of the points in your data.
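To illustrate the distinction, here is a minimal Python sketch (plain Python, not Mahout code). The document ids and vectors are made up, and Euclidean distance is an assumption; you would substitute whatever distance measure you passed to kmeans. It computes the centroid as the component-wise mean and then picks the cluster member closest to it, which is the nearest thing to a "centroid id" you can get.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_to_centroid(cluster):
    """cluster maps a document id to its vector.

    Returns the id of the member closest to the centroid and its distance.
    """
    center = centroid(list(cluster.values()))
    best_id = min(cluster, key=lambda doc_id: euclidean(cluster[doc_id], center))
    return best_id, euclidean(cluster[best_id], center)

# Hypothetical toy cluster: ids and 2-D vectors are invented for illustration.
cluster = {
    "article-a": [0.0, 0.0],
    "article-b": [4.0, 0.0],
    "article-c": [1.0, 3.0],
}

center = centroid(list(cluster.values()))
print("centroid:", center)
print("is the centroid one of the data points?", center in cluster.values())
print("closest member:", nearest_to_centroid(cluster))
```

In this toy data the centroid does not coincide with any of the three vectors, so there is no id to report for it; only the closest member has one.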
Are you trying to find the most-central point in a given cluster?

On Mon, Jul 20, 2015 at 5:18 PM, Ankit Goel <ankitgoel2...@gmail.com> wrote:

> Hi,
> I've been messing with mahout 0.10 and kmeans clustering with a solr 4.6.1
> index. The data is news articles. The --field option for kmeans is set to
> "content". The idField is set to "title" (just so I can analyse it faster).
> The clusterdump of the kmeans result gives me a proper output, but I can't
> figure out the id of the vector chosen as the center. There are only 14-15
> articles so I am not hung up about the cluster performance at this time.
>
> I used random seeds for the kmeans commandline.
> For reference, this is the commandline cluster dump I am executing
>
> bin/mahout clusterdump -i $MAHOUT_HOME/testCluster/clusters-3-final
> -p $MAHOUT_HOME/testCluster/clusteredPoints -d $MAHOUT_HOME/dict.txt -b 5
>
> The output I get is of the form
>
> :{"r":
>
> top terms
>
> xxxxx==>xxxxx
>
> Weight : [props - optional]: Point:
>
> 1.0 : [distance=0.0]: [{"account":0.026}.......other features]
>
> 1.0 : [distance=0.3963903651622338]: [....]
>
>
> So how exactly do I get the centroid id? I have even tried accessing it
> with java
>
> ClusterWritable value.getValue().getCenter() but this just gives me the
> features and values of the centroid.
>
> Also, please do explain the meaning of "account":0.026 (just making sure I
> know it right). I used tfidf.
>
> --
> Regards,
> Ankit Goel
> http://about.me/ankitgoel