Hi, I am trying out the canopy-clustering driver from Java using Mahout 0.8, and I am getting a very odd error.
    java.io.IOException: Mkdirs failed to create /test_clustering_output/clusters-0-final
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:896)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:884)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:876)
        at org.apache.mahout.clustering.classify.ClusterClassifier.writePolicy(ClusterClassifier.java:234)
        at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:373)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:157)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:168)
        at service.clustering.algorithms.CanopyClusterer.cluster(Unknown Source)
        at service.clustering.ClusterRunner.doClustering(Unknown Source)
        at test.service.NonJunitClustererTest.testClustering(Unknown Source)
        at test.service.NonJunitClustererTest.main(Unknown Source)
    Clustering failed: Mkdirs failed to create /test_clustering_output/clusters-0-final

Contrary to the message, the output folder /test_clustering_output/clusters-0-final HAS been created. If I run

    hadoop fs -ls /test_clustering_output/clusters-0-final

I get:

    Warning: $HADOOP_HOME is deprecated.
    Found 3 items
    -rw-r--r--   1 rob supergroup    0 2014-01-29 21:33 /test_clustering_output/clusters-0-final/_SUCCESS
    drwxr-xr-x   - rob supergroup    0 2014-01-29 21:32 /test_clustering_output/clusters-0-final/_logs
    -rw-r--r--   1 rob supergroup  106 2014-01-29 21:33 /test_clustering_output/clusters-0-final/part-r-00000

I am running on a single-node Hadoop cluster on AWS/Ubuntu, and I am calling the driver from Java:

    Configuration hfsConf = new Configuration();
    hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/core-site.xml"));
    hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/hdfs-site.xml"));
    hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/mapred-site.xml"));

    try {
        CanopyDriver.run(
            hfsConf,                            // Hadoop file system configuration
            new Path(hadoopInputSequenceFile),  // input sequence file of geo-vectors
            new Path(hadoopOutputFile),         // output directory
            dm,                                 // distance measure
            t1,                                 // canopy T1 radius
            t2,                                 // canopy T2 radius
            true,                               // true to cluster the input vectors
            0.0,                                // vectors with a pdf below this value are not clustered (value between 0 and 1)
            false);                             // execute sequentially if true
        return true;
    } catch (Exception e) {
        e.printStackTrace();
    }

Any help would be most appreciated. I have tried almost everything I can think of, including switching off permissions in the Hadoop config and making sure my hadoop.tmp directory has open permissions. My only remaining hunches are:

(a) the Configuration object may not carry enough information;
(b) I am adding one or two separate jars to HADOOP_CLASSPATH instead of packing everything into the mahout-job jar.

Rob
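P.S. My current plan for testing hunch (a) is something along these lines (just a sketch; the conf-file paths and the hdfs://localhost:9000 URI are placeholders from my own setup, not taken from the failing run). If the default filesystem resolves to file:/// rather than HDFS, that would at least explain why the write in the stack trace goes through ChecksumFileSystem and tries to create /test_clustering_output on the local disk:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Same resources I add before calling CanopyDriver.run
            // (paths hard-coded here purely for the sake of the example).
            conf.addResource(new Path("/usr/local/hadoop/conf/core-site.xml"));
            conf.addResource(new Path("/usr/local/hadoop/conf/hdfs-site.xml"));

            // If this prints "file:///" instead of an hdfs:// URI, the
            // Configuration is not picking up fs.default.name, and client-side
            // writes such as /test_clustering_output/... go to the local filesystem.
            System.out.println("fs.default.name = " + conf.get("fs.default.name"));
            System.out.println("default FS URI  = " + FileSystem.get(conf).getUri());

            // Candidate workaround I would then try: set the namenode URI explicitly
            // (hdfs://localhost:9000 is a placeholder for my single-node setup).
            // conf.set("fs.default.name", "hdfs://localhost:9000");
        }
    }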