K-means is trying to load your initial clusters and is not finding any. Have you checked your -c path? You can also add -xm sequential to run the sequential version of the algorithm, which lets you step through it in a debugger and verify your paths.
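If it helps narrow things down, here is a minimal sketch that checks what is actually under the -c directory. It uses plain java.nio, so it assumes a local run like the LocalJobRunner one in your log, not HDFS. The class name CheckClustersPath is hypothetical, and so is the assumption that the initial clusters are read from "part-*" files in that directory; adjust the prefix if your Mahout version names them differently. If nothing is found, it also lists the directories next to the -c path, so you can see whether the canopy step wrote its output under a different name than the one k-means is reading.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CheckClustersPath {

    /**
     * Returns the sorted names of "part-*" files directly under clustersDir.
     * ASSUMPTION: initial clusters live in "part-*" files; change the glob if yours differ.
     */
    static List<String> partFiles(Path clustersDir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(clustersDir)) {
            return names; // missing directory: nothing for k-means to load
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(clustersDir, "part-*")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Returns the sorted names of directories that sit next to clustersDir (same parent). */
    static List<String> siblingDirs(Path clustersDir) throws IOException {
        List<String> names = new ArrayList<>();
        Path parent = clustersDir.toAbsolutePath().getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            return names;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(parent)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default is the path from the error message; pass your own -c path as the first argument.
        Path clusters = Paths.get(args.length > 0 ? args[0] : "reutersClusters/canopy-centroids/clusters-0");
        List<String> parts = partFiles(clusters);
        if (parts.isEmpty()) {
            System.out.println("No part-* files under " + clusters.toAbsolutePath());
            System.out.println("Directories next to it: " + siblingDirs(clusters));
        } else {
            System.out.println("Found initial cluster files: " + parts);
        }
    }
}
```

Run it from the same working directory as NewsKMeansClustering, e.g. java CheckClustersPath reutersClusters/canopy-centroids/clusters-0.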
-----Original Message-----
From: Ahmad Ammari [mailto:ammari...@gmail.com]
Sent: Wednesday, November 16, 2011 7:19 AM
To: user@mahout.apache.org
Subject: NewsKMeansClustering does not find any clusters!

Hello,

I am practicing the Mahout examples in the clustering part of the book "Mahout in Action", particularly chapter 9. In Section 9.1.4, I am trying to run the class NewsKMeansClustering, whose source code I got from the companion source code files.

As I understand it, the input directory "inputDir" should contain the input documents in SequenceFile format. Therefore, I tried using as input the "reuters-seqfiles" directory that we generated with the seqdirectory program run through the mahout launcher in chapter 8 (page 139).

I then ran NewsKMeansClustering. It started fine, but then failed with a java.lang.IllegalStateException saying "No clusters found", as follows:

java.lang.IllegalStateException: No clusters found. Check your -c path.
	at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
16-Nov-2011 00:49:14 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: map 0% reduce 0%
16-Nov-2011 00:49:14 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Job complete: job_local_0010
16-Nov-2011 00:49:14 org.apache.hadoop.mapred.Counters log
INFO: Counters: 0
Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing reutersClusters/canopy-centroids/clusters-0
	at org.apache.mahout.clustering.kmeans.KMeansDriver.runIteration(KMeansDriver.java:363)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClustersMR(KMeansDriver.java:310)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:237)
	at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:152)
	at clusterer.NewsKMeansClustering.main(NewsKMeansClustering.java:81)
------------------------------------------------------------------------
BUILD FAILURE
------------------------------------------------------------------------
Total time: 15.391s
Finished at: Wed Nov 16 00:49:14 GMT 2011
Final Memory: 10M/150M
------------------------------------------------------------------------
Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (default-cli) on project mahout-examples: Command execution failed. Process exited with an error: 1(Exit value: 1) -> [Help 1]
To see the full stack trace of the errors, re-run Maven with the -e switch.
Re-run Maven using the -X switch to enable full debug logging.
For more information about the errors and possible solutions, please read the following articles:
[Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

What does "No clusters found" mean?! Is the input directory wrong? If so, what input should I give the class?

I tried changing the canopy thresholds (250, 120) to other values, and also tried replacing EuclideanDistanceMeasure with CosineDistanceMeasure for the canopy clustering, to no avail.

Many thanks in advance,
Ahmad
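On the threshold question: below is the textbook T1/T2 canopy procedure on plain double[] points with Euclidean distance, written from scratch for illustration. It is not Mahout's canopy implementation, and the class and method names (CanopySketch, canopies) are hypothetical. What it shows is that the first remaining point always seeds a canopy, so any non-empty input yields at least one canopy; changing T1 and T2 changes how many initial clusters you get, not whether there are any. That fits the reply pointing at the -c path rather than at the thresholds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CanopySketch {

    /** A canopy: the point that seeded it plus every remaining point within T1 of that seed. */
    static final class Canopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
            members.add(center);
        }
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Classic T1/T2 canopy procedure (hypothetical helper); requires t1 > t2. */
    static List<Canopy> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("canopy clustering expects T1 > T2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<Canopy> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // The first remaining point always seeds a canopy, so non-empty input gives >= 1 canopy.
            Canopy canopy = new Canopy(remaining.remove(0));
            result.add(canopy);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = euclidean(canopy.center, p);
                if (d < t1) {
                    canopy.members.add(p); // loose member of this canopy
                }
                if (d < t2) {
                    it.remove(); // tightly bound: p will not seed a canopy of its own
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        for (double x : new double[] {0, 1, 2, 1000}) {
            points.add(new double[] {x});
        }
        // The thresholds from the original message, then much smaller ones.
        System.out.println("T1=250, T2=120: " + canopies(points, 250, 120).size() + " canopies");
        System.out.println("T1=3, T2=1.5: " + canopies(points, 3, 1.5).size() + " canopies");
    }
}
```

A side note on swapping in CosineDistanceMeasure, assuming it computes 1 minus the cosine similarity: that distance never exceeds 2, so with T1 = 250 and T2 = 120 every point falls within T2 of the first seed and you get a single canopy. Thresholds have to be chosen on the scale of the distance measure in use.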