On the n=0 point and its associated r= problem: where did you get the clusters file? If it came from a clusters-0 directory, that would explain it, because those clusters are the inputs to the algorithm and have not yet observed any of the points. Also, the "CL" prefix indicates the clusters have not converged (it would be "VL" otherwise). This also suggests you may be dumping the wrong clusters directory. It's not possible to tell from the command line you posted.

On 8/30/10 4:45 AM, Jeff Eastman wrote:
 On 8/29/10 4:12 PM, Valerio Ceraudo wrote:
Hi,

I finally ran my first k-means, but now I have a small problem: how do I get the
result?

I ran this command:

bin/mahout clusterdump -s /home/vuvvo/clusters/ -o /home/vuvvo/finalOutput

That created a new file, finalOutput, containing this:

CL-21551{n=0 c=[575:9.188, 1808:3.181, 2547:4.371, 4163:1.124, 5124:4.596, 8861:4.434, 9283:3.338, 9890:8.271, 10037:8.539, 12965:4.542, 13168:5.738, 13565:4.949, 14237:8.155, 15400:4.632, 16624:7.935, 17506:8.146, 17539:5.145, 17618:9.048, 17911:3.863, 18054:8.035, 19236:6.017, 21095:4.222, 21693:7.932, 21748:3.895, 22512:4.292, 23003:5.238, 24173:7.173, 24613:3.429, 26094:1.133, 26587:1.335, 26989:4.451, 28079:7.765, 28086:5.722, 29221:7.316, 29518:3.197,
30388:2.589, 30554:6.252, 30555:7.173] r=/reut2-021.sgm-75.txt =]}
CL-21560{n=0 c=[1038:2.701, 1808:3.181, 1997:2.870, 2289:4.733, 2547:4.371, 4163:1.124, 4750:4.726, 5951:9.593, 13610:3.942, 13918:5.486, 14641:4.362, 17528:3.302, 18699:2.280, 20900:9.188, 22742:6.573, 23678:3.758, 23692:5.211, 24645:4.078, 25024:4.584, 25451:3.575, 26094:1.133, 27130:4.263, 30790:2.870]
r=/reut2-021.sgm-83.txt =]}
CL-21576{n=0 c=[1736:8.677, 1808:3.181, 2425:3.548, 2547:4.371, 3147:4.257,
4163:1.124, 8873:5.482, 8902:3.481, 9324:6.045, 9702:5.867, 9712:7.424,
10605:2.932, 11857:5.217, 12762:8.359, 12763:5.171, 12880:8.900, 12909:7.088, 13419:4.930, 14143:4.105, 14257:4.622, 16915:4.902, 17376:3.317, 17525:6.689, 17673:4.020, 17911:2.732, 17934:7.048, 18094:4.005, 18822:3.928, 19081:3.448, 19200:4.315, 19319:1.992, 19377:8.567, 20746:3.430, 20881:3.581, 22751:5.024, 23605:6.447, 24326:8.582, 24863:6.209, 24913:4.002, 25701:5.781, 26094:1.133, 26587:1.888, 26787:7.545, 26978:5.011, 26980:5.533, 27631:4.850, 27986:3.793, 29213:16.616, 29649:3.400, 29660:3.857, 30023:6.344, 30322:4.009, 31232:12.271,
31402:3.719] r=/reut2-021.sgm-98.txt =]}

Now, how should I read it? Are the CL entries the clusters, and are the c=[.......] values the data
of each cluster?

Thanks all


Precisely. CL-n is the cluster ID, n= is the number of points observed during the last iteration (not sure why yours are all 0), and c= is the center of the cluster in sparse notation. Within c=, each [i:d] pair denotes an index and its associated value. The r= value is the radius of the cluster (the standard deviation of the observed points). The code in AbstractCluster that formats r= (formatVector and asFormatString) could be improved to better handle the empty radius vectors produced when n=0.
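
To make the format concrete, here is a minimal Python sketch that reads one record from the dump into its prefix, ID, n= count, and sparse c= center. It is only an illustration based on the output quoted in this thread: parse_cluster and CLUSTER_RE are hypothetical names, not part of Mahout, and the sketch assumes the center is always printed in sparse index:value form. It deliberately ignores r=, since that field is not formatted reliably when n=0.

```python
import re

# Assumed record layout, taken from the dump quoted in this thread:
#   <PREFIX>-<id>{n=<count> c=[i:d, i:d, ...] r=...}
CLUSTER_RE = re.compile(
    r"(?P<prefix>[A-Z]+)-(?P<id>\d+)\{n=(?P<n>\d+)\s+c=\[(?P<center>[^\]]*)\]"
)


def parse_cluster(text):
    """Parse one cluster record (hypothetical helper, not a Mahout API)."""
    # Records may be wrapped across several lines, so collapse whitespace first.
    flat = " ".join(text.split())
    match = CLUSTER_RE.search(flat)
    if match is None:
        raise ValueError("not a cluster record: %r" % flat[:40])

    # Turn "i:d, i:d, ..." into {index: value}.
    center = {}
    for pair in match.group("center").split(","):
        pair = pair.strip()
        if not pair:
            continue
        index, value = pair.split(":")
        center[int(index)] = float(value)

    return {
        # Per this thread, "VL" marks a converged cluster and "CL" one that is not.
        "converged": match.group("prefix") == "VL",
        "id": int(match.group("id")),
        "n": int(match.group("n")),
        "center": center,
    }


# Shortened copy of one record from the dump above, wrapped as in the email.
record = parse_cluster(
    "CL-21560{n=0 c=[1038:2.701, 1808:3.181,\n"
    " 1997:2.870] r=/reut2-021.sgm-83.txt =]}"
)
print(record["id"], record["n"], record["converged"], record["center"][1808])
```

A record with n=0 and converged=False, as in this example, is what you would expect from clusters that have not yet observed any points.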

And congratulations on your first k-Means!
Jeff
