On the n=0 point and its associated r= problem: where did you get the
clusters file? If it came from a clusters-0 directory, that would explain
it: those clusters are the inputs to the algorithm and have not yet
observed any points. Also, the "CL" prefix indicates the clusters have
not converged (it would be "VL" otherwise), which also suggests you may
be dumping the wrong clusters directory. It's not possible to tell from
the command line you posted.
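One way to check which directory to pass to clusterdump is to pick the last iteration's output rather than clusters-0. The Java sketch below assumes the k-means run leaves one clusters-<iteration> subdirectory per iteration under a local output directory, with clusters-0 holding the initial, unobserved clusters. The class and method names (LatestClusters, latestClustersDir) are hypothetical, not part of Mahout, and if your output lives on HDFS you would list it with Hadoop's FileSystem API instead of java.nio.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the highest-numbered clusters-N directory. */
public class LatestClusters {

    // Assumed naming scheme: one "clusters-<iteration>" directory per iteration.
    private static final Pattern CLUSTERS_DIR = Pattern.compile("clusters-(\\d+)");

    public static Optional<Path> latestClustersDir(Path outputDir) throws IOException {
        Path best = null;
        int bestIteration = -1;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            for (Path entry : entries) {
                if (!Files.isDirectory(entry)) {
                    continue; // skip plain files such as logs
                }
                Matcher m = CLUSTERS_DIR.matcher(entry.getFileName().toString());
                if (m.matches()) {
                    // Compare iteration numbers numerically, not as strings.
                    int iteration = Integer.parseInt(m.group(1));
                    if (iteration > bestIteration) {
                        bestIteration = iteration;
                        best = entry;
                    }
                }
            }
        }
        // If only clusters-0 exists, no iteration has produced output yet.
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of(args.length > 0 ? args[0] : ".");
        latestClustersDir(out).ifPresentOrElse(
                dir -> System.out.println("dump this one: " + dir),
                () -> System.out.println("no clusters-N directory found in " + out));
    }
}
```

Comparing the parsed numbers means clusters-10 is chosen over clusters-9, which a plain alphabetical sort of directory names would get wrong.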
On 8/30/10 4:45 AM, Jeff Eastman wrote:
On 8/29/10 4:12 PM, Valerio Ceraudo wrote:
Hi,
I finally ran my first k-means, but now I have a little issue: how do I
get the result?
I ran this command:
bin/mahout clusterdump -s /home/vuvvo/clusters/ -o /home/vuvvo/finalOutput
That created a new file, finalOutput, containing this:
CL-21551{n=0 c=[575:9.188, 1808:3.181, 2547:4.371, 4163:1.124,
5124:4.596, 8861:4.434, 9283:3.338, 9890:8.271, 10037:8.539,
12965:4.542, 13168:5.738, 13565:4.949, 14237:8.155, 15400:4.632,
16624:7.935, 17506:8.146, 17539:5.145, 17618:9.048, 17911:3.863,
18054:8.035, 19236:6.017, 21095:4.222, 21693:7.932, 21748:3.895,
22512:4.292, 23003:5.238, 24173:7.173, 24613:3.429, 26094:1.133,
26587:1.335, 26989:4.451, 28079:7.765, 28086:5.722, 29221:7.316,
29518:3.197, 30388:2.589, 30554:6.252, 30555:7.173]
r=/reut2-021.sgm-75.txt =]}
CL-21560{n=0 c=[1038:2.701, 1808:3.181, 1997:2.870, 2289:4.733,
2547:4.371, 4163:1.124, 4750:4.726, 5951:9.593, 13610:3.942,
13918:5.486, 14641:4.362, 17528:3.302, 18699:2.280, 20900:9.188,
22742:6.573, 23678:3.758, 23692:5.211, 24645:4.078, 25024:4.584,
25451:3.575, 26094:1.133, 27130:4.263, 30790:2.870]
r=/reut2-021.sgm-83.txt =]}
CL-21576{n=0 c=[1736:8.677, 1808:3.181, 2425:3.548, 2547:4.371,
3147:4.257, 4163:1.124, 8873:5.482, 8902:3.481, 9324:6.045,
9702:5.867, 9712:7.424, 10605:2.932, 11857:5.217, 12762:8.359,
12763:5.171, 12880:8.900, 12909:7.088, 13419:4.930, 14143:4.105,
14257:4.622, 16915:4.902, 17376:3.317, 17525:6.689, 17673:4.020,
17911:2.732, 17934:7.048, 18094:4.005, 18822:3.928, 19081:3.448,
19200:4.315, 19319:1.992, 19377:8.567, 20746:3.430, 20881:3.581,
22751:5.024, 23605:6.447, 24326:8.582, 24863:6.209, 24913:4.002,
25701:5.781, 26094:1.133, 26587:1.888, 26787:7.545, 26978:5.011,
26980:5.533, 27631:4.850, 27986:3.793, 29213:16.616, 29649:3.400,
29660:3.857, 30023:6.344, 30322:4.009, 31232:12.271, 31402:3.719]
r=/reut2-021.sgm-98.txt =]}
Now, how should I read it? Are the CL entries the clusters, and is
c=[.......] the data of each cluster?
Thanks, all
Precisely. CL-n is the cluster ID, n= is the number of points observed
during the last iteration (I'm not sure why yours are all 0), and c= is
the center of the cluster in sparse notation. Within c=, each i:d pair
denotes an index and its associated value. The r= value is the radius of
the cluster (the standard deviation of the observed points). The code in
AbstractCluster that formats r= (formatVector and asFormatString) could
be improved to better handle the empty radius vectors produced when n=0.
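To make those field meanings concrete, here is a minimal Java sketch that parses one record of the dump above into its ID, convergence flag, point count and sparse center. The class name ClusterRecord and its fields are hypothetical, not a Mahout API. It assumes the exact CL-/VL- layout shown in this thread and deliberately skips r=, since that field's text is malformed when n=0.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for one k-means clusterdump record (not a Mahout API). */
public class ClusterRecord {
    // Assumed layout: CL-<id>{n=<count> c=[<index>:<value>, ...] r=...}
    private static final Pattern HEADER =
            Pattern.compile("^(CL|VL)-(\\d+)\\{n=(\\d+) c=\\[([^\\]]*)\\]");

    public final boolean converged;           // "VL-" prefix: converged; "CL-": not yet
    public final int id;                      // cluster ID, e.g. 21551
    public final int numPoints;               // n=: points observed in the last iteration
    public final Map<Integer, Double> center; // c=: sparse center, index -> value

    private ClusterRecord(boolean converged, int id, int numPoints, Map<Integer, Double> center) {
        this.converged = converged;
        this.id = id;
        this.numPoints = numPoints;
        this.center = center;
    }

    public static ClusterRecord parse(String record) {
        // Collapse line wrapping so a record split across lines parses the same way.
        String text = record.replaceAll("\\s+", " ").trim();
        Matcher m = HEADER.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("not a CL-/VL- cluster record: " + text);
        }
        Map<Integer, Double> center = new LinkedHashMap<>();
        for (String pair : m.group(4).split(",")) {
            String p = pair.trim();
            if (p.isEmpty()) {
                continue; // tolerate an empty list such as c=[]
            }
            int colon = p.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected index:value, got " + p);
            }
            center.put(Integer.parseInt(p.substring(0, colon)),
                       Double.parseDouble(p.substring(colon + 1)));
        }
        // r= is skipped on purpose: in the dump above it appears as "r=<name> =]".
        return new ClusterRecord(m.group(1).equals("VL"),
                Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)), center);
    }

    public static void main(String[] args) {
        // Abbreviated copy of the CL-21560 record from the dump above, wrapped as in the mail.
        String dump = "CL-21560{n=0 c=[1038:2.701, 1808:3.181, 1997:2.870,\n"
                + "2289:4.733] r=/reut2-021.sgm-83.txt =]}";
        ClusterRecord r = parse(dump);
        System.out.println(r.id + " converged=" + r.converged
                + " n=" + r.numPoints + " nonzeros=" + r.center.size());
        // prints: 21560 converged=false n=0 nonzeros=4
    }
}
```

Because the parser collapses whitespace first, a record that a mail client wrapped across several lines parses the same as an unwrapped one.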
And congratulations on your first k-Means!
Jeff