Hello, I have a set of 10-dimensional vectors that I want to group into clusters. I ran the Mahout k-means clustering program as follows:

$ mahout kmeans --input input/ --output output/ --clusters clusters/ -k 20 -xm sequential --maxIter 10000 -ow -cd 0.0000000000005

It produces clusters as follows:

gourav@mustang2:~$ mahout clusterdump -i output/clusters-*-final/ -o dump; cat dump
VL-422383{n=29 c=[93.241, 0.241, 187383906066.860, 0.070, 0.057, 0.042, 0.000] r=[237.392, 0.625, 29412153437.220, 0.236, 0.036, 0.049, 0.001]}
VL-344819{n=133921 c=[50.032, 775.298, -0.000, 300288032.310, -0.043, 0.031, 0.016, 0.000] r=[233.523, 142338.059, 0.007, 92781073.166, 0.267, 0.026, 0.018, 0.000]}
VL-344939{n=3 c=[2.667, 520677772968.333, 0.017, 0.007, 0.000] r=[0.471, 184177690037.170, 0.008, 0.002, 0.000]}
VL-68598{n=21089 c=[91.973, 1.022, 1489688386.753, -0.045, 0.032, 0.024, 0.000] r=[546.717, 62.027, 246594193.663, 0.278, 0.029, 0.026, 0.000]}

As you can see, the number of dimensions in the centroid (c) and radius (r) vectors differs between clusters. I think all dimensions whose value was zero (0) are being ignored. How can I get output with the original number of dimensions?

Thank you,
Gourav
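
To make the mismatch concrete, here is a minimal Python sketch that counts the entries in each cluster's c=[...] and r=[...] lists. It is not part of Mahout: the function name `count_dims` and the regular expression are assumptions based only on the dump format shown above, and it expects the dump text as a plain string.

```python
import re

# Assumed line shape, taken from the dump above:
#   VL-<id>{n=<count> c=[v1, v2, ...] r=[v1, v2, ...]}
CLUSTER_RE = re.compile(
    r"(?P<id>[\w-]+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}"
)


def count_dims(dump_text):
    """Return {cluster_id: (centroid_entries, radius_entries)}."""
    dims = {}
    for match in CLUSTER_RE.finditer(dump_text):
        # Split each bracketed list on commas and ignore empty fragments.
        centroid = [v for v in match.group("c").split(",") if v.strip()]
        radius = [v for v in match.group("r").split(",") if v.strip()]
        dims[match.group("id")] = (len(centroid), len(radius))
    return dims


if __name__ == "__main__":
    sample = (
        "VL-344939{n=3 c=[2.667, 520677772968.333, 0.017, 0.007, 0.000] "
        "r=[0.471, 184177690037.170, 0.008, 0.002, 0.000]}"
    )
    for cluster_id, (c_len, r_len) in count_dims(sample).items():
        print(cluster_id, "centroid entries:", c_len, "radius entries:", r_len)
```

Run on the four clusters in the dump above, this reports 7, 8, 5 and 7 entries respectively, so none of them shows all 10 dimensions.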