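The seed clusters should be in a file with a name like 'part-xxxx', not in a
file named kmeans-init-clusters the way you have it. For example, you could make
/scratch/kmeans-init-clusters a directory and write the clusters to a part file
inside it.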
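
As a rough illustration of why the name matters, here is a minimal plain-Java sketch. It assumes the driver only picks up input files whose names start with "part-", which is what the advice above implies. It is a simplified model using java.nio, not Mahout or Hadoop code; the class name PartFileCheck, the method names, and the seed-dir/part-00000 layout are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java model (not Mahout or Hadoop code) of a "part-" file-name filter.
class PartFileCheck {

    // Assumption: only files whose names start with "part-" count as cluster input.
    static boolean isPartFile(Path p) {
        return p.getFileName().toString().startsWith("part-");
    }

    // Returns the files under 'input' that the filter would accept.
    // If 'input' is itself a regular file, its own name is tested.
    static List<Path> acceptedFiles(Path input) throws IOException {
        List<Path> accepted = new ArrayList<>();
        if (Files.isRegularFile(input)) {
            if (isPartFile(input)) {
                accepted.add(input);
            }
        } else if (Files.isDirectory(input)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(input)) {
                for (Path entry : entries) {
                    if (Files.isRegularFile(entry) && isPartFile(entry)) {
                        accepted.add(entry);
                    }
                }
            }
        }
        Collections.sort(accepted);
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("kmeans-demo");
        // Layout from the original post: a single file named kmeans-init-clusters.
        Path flat = Files.createFile(tmp.resolve("kmeans-init-clusters"));
        // Suggested layout: a directory holding a part file (hypothetical names).
        Path dir = Files.createDirectory(tmp.resolve("seed-dir"));
        Files.createFile(dir.resolve("part-00000"));
        System.out.println("flat file accepted: " + acceptedFiles(flat).size());
        System.out.println("directory accepted: " + acceptedFiles(dir).size());
    }
}
```

Under that assumption, the flat file yields no accepted inputs while the directory yields one, which would be consistent with the "No input clusters found" error quoted below.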





On Tuesday, December 24, 2013 2:15 PM, Sameer Tilak <ssti...@live.com> wrote:
 
Hi all,

I get the following problem when I run k-means clustering on my real data. Any 
help with this would be great!


Here is the data that I read out of the SequenceFile:


022960 value: 
022960:{269830:1.0,2042:1.0,145659:1.0,143547:1.0,219265:1.0,321251:1.0,202350:1.0,258610:1.0,239068:1.0,259181:1.0,259177:1.0,33391:1.0,414092:1.0,139519:1.0,428431:1.0,277140:1.0,279116:1.0,426540:1.0,225715:1.0,331909:1.0,347374:1.0,257840:1.0}
022963 value: 
022963:{256857:1.0,269830:1.0,2042:1.0,145659:1.0,143547:1.0,219265:1.0,321251:1.0,202350:1.0,258610:1.0,239068:1.0,259181:1.0,259177:1.0,33391:1.0,414092:1.0,139519:1.0,428431:1.0,277140:1.0,279116:1.0,426540:1.0,225715:1.0,438788:1.0,347374:1.0,257840:1.0}
022966 value: 
022966:{122295:1.0,143547:1.0,359770:1.0,349739:1.0,279116:1.0,347374:1.0,225715:1.0,295315:1.0,239068:1.0,426540:1.0,25381:1.0,258670:1.0,139519:1.0,140726:1.0,202350:1.0,33391:1.0,80747:1.0,317618:1.0,315249:1.0,219265:1.0,258610:1.0,269830:1.0,446719:1.0,414092:1.0,259177:1.0,15069:1.0,259181:1.0,145659:1.0,257840:1.0,2042:1.0,8916:1.0,349953:1.0}
022968 value: 
022968:{382600:1.0,204616:1.0,120442:1.0,213430:1.0,274369:1.0,267345:1.0,350041:1.0,259356:1.0,83126:1.0,270754:1.0,139519:1.0,362853:1.0,279116:1.0}
022969 value: 
022969:{270754:1.0,120442:1.0,259356:1.0,139519:1.0,274369:1.0,279116:1.0,236587:1.0,287087:1.0,445965:1.0}
022972 value: 
022972:{270695:1.0,382600:1.0,426510:1.0,213430:1.0,274369:1.0,267345:1.0,350041:1.0,259356:1.0,83126:1.0,270754:1.0,63705:1.0,139519:1.0,279116:1.0}

Here is where I write seed clusters to the file. It shows that it wrote 10 
clusters.

String KmeansInitClusterFile = "/scratch/kmeans-init-clusters";

SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file (/scratch/kmeans-init-clusters) :
CL-0{n=0 c=021105 = [25381:1.000, 139519:1.000, 140726:1.000, 145659:1.000, 
239068:1.000, 279116:1.000, 349739:1.000] r=021105 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-1{n=0 c=021111 = [25381:1.000, 139519:1.000, 140726:1.000, 145659:1.000, 
239068:1.000, 279116:1.000, 349739:1.000] r=021111 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-2{n=0 c=021117 = [49100:1.000, 120442:1.000, 258280:1.000, 259339:1.000, 
259356:1.000, 268294:1.000, 269084:1.000, 270702:1.000, 270754:1.000, 
274369:1.000, 274626:1.000] r=021117 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-3{n=0 c=021118 = [120442:1.000, 258280:1.000, 259339:1.000, 259356:1.000, 
269084:1.000, 270702:1.000, 270754:1.000, 274369:1.000, 274626:1.000] r=021118 
=]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-4{n=0 c=021119 = [426510:1.000] r=021119 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-5{n=0 c=021120 = [9071:1.000, 49100:1.000, 63705:1.000, 120442:1.000, 
139519:1.000, 140663:1.000, 145659:1.000, 213430:1.000, 239068:1.000, 
251173:1.000, 258280:1.000, 259356:1.000, 267345:1.000, 268294:1.000, 
270695:1.000, 276249:1.000, 279116:1.000, 309165:1.000, 350040:1.000, 
445676:1.000] r=021120 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-6{n=0 c=021122 = [6240:1.000, 259356:1.000, 259830:1.000, 270754:1.000, 
274369:1.000, 388477:1.000, 426510:1.000] r=021122 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-7{n=0 c=021123 = [49100:1.000, 138703:1.000, 139070:1.000, 139519:1.000, 
259356:1.000, 268294:1.000, 270695:1.000, 277065:1.000, 279116:1.000, 
309165:1.000, 445834:1.000] r=021123 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-8{n=0 c=021124 = [1667:1.000, 9071:1.000, 15397:1.000, 29237:1.000, 
49100:1.000, 63705:1.000, 138703:1.000, 139070:1.000, 139519:1.000, 
140663:1.000, 213430:1.000, 238903:1.000, 259356:1.000, 260088:1.000, 
267345:1.000, 268294:1.000, 270695:1.000, 270754:1.000, 274347:1.000, 
276249:1.000, 279116:1.000, 291707:1.000, 295315:1.000, 309165:1.000, 
313307:1.000, 317618:1.000, 320741:1.000, 349953:1.000, 350040:1.000, 
387714:1.000, 445676:1.000] r=021124 =]}
SimpleKMeansClustering::generateClusters wrote the following cluster to the 
file:
CL-9{n=0 c=021125 = [49100:1.000, 139519:1.000, 268294:1.000, 279116:1.000, 
384009:1.000] r=021125 =]}

I use the following method in my class to perform k-means:

KMeansDriver.run(this.conf,
                 new Path(SparceVectorizedCidFile),
                 new Path(KmeansInitClusterFile),
                 new Path(KmeansClustersResultsFile),
                 new EuclideanDistanceMeasure(), 0.001, 5,
                 true, 1.0, false);

13/12/24 11:09:27 INFO kmeans.KMeansDriver: Input: /scratch/SparceVectorizedConceptIds Clusters In: /scratch/kmeans-init-clusters Out: /scratch/KmeansClustersResultsFile Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/12/24 11:09:27 INFO kmeans.KMeansDriver: convergence: 0.001 max Iterations: 5
java.lang.IllegalStateException: No input clusters found in /scratch/kmeans-init-clusters. Check your -c argument.
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:212)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at myanalytics.SimpleKMeansClustering.runKmeansDriver(SimpleKMeansClustering.java:209)
    at myanalytics.SimpleKMeansClustering.main(SimpleKMeansClustering.java:269)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
-bash-4.1$ hadoop dfs -ls /scratch/kmeans-init-clusters
Warning: $HADOOP_HOME is deprecated.

Found 1 items
-rw-r--r--   1 userid supergroup       2850 2013-12-24 11:09 /scratch/kmeans-init-clusters
-bash-4.1$
