Hi

I am trying out the canopy-clustering driver from Java with Mahout 0.8 and am
getting a very odd error.

java.io.IOException: Mkdirs failed to create /test_clustering_output/clusters-0-final
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:896)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:884)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:876)
        at org.apache.mahout.clustering.classify.ClusterClassifier.writePolicy(ClusterClassifier.java:234)
        at org.apache.mahout.clustering.canopy.CanopyDriver.clusterData(CanopyDriver.java:373)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:157)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:168)
        at service.clustering.algorithms.CanopyClusterer.cluster(Unknown Source)
        at service.clustering.ClusterRunner.doClustering(Unknown Source)
        at test.service.NonJunitClustererTest.testClustering(Unknown Source)
        at test.service.NonJunitClustererTest.main(Unknown Source)
Clustering failed: Mkdirs failed to create /test_clustering_output/clusters-0-final

Contrary to the message, the output folder /test_clustering_output/clusters-0-final HAS BEEN CREATED. If I do

"hadoop fs -ls /test_clustering_output/clusters-0-final" I get...

Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r--   1 rob supergroup          0 2014-01-29 21:33 /test_clustering_output/clusters-0-final/_SUCCESS
drwxr-xr-x   - rob supergroup          0 2014-01-29 21:32 /test_clustering_output/clusters-0-final/_logs
-rw-r--r--   1 rob supergroup        106 2014-01-29 21:33 /test_clustering_output/clusters-0-final/part-r-00000

---
I am running on a single-node Hadoop cluster on AWS/Ubuntu, and I'm trying to
run the driver from Java...

Configuration hfsConf = new Configuration();
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/core-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/hdfs-site.xml"));
hfsConf.addResource(new Path(PROPS.getHadoopHome() + "/conf/mapred-site.xml"));
try {
    CanopyDriver.run(
        hfsConf,                             // HDFS/Hadoop configuration
        new Path(hadoopInputSequenceFile),   // input sequence file of geovectors
        new Path(hadoopOutputFile),          // output directory
        dm,                                  // distance measure
        t1,                                  // canopy T1 radius
        t2,                                  // canopy T2 radius
        true,                                // true to cluster the input vectors
        0.0,                                 // vectors with pdf below this value (0..1) will not be clustered
        false);                              // execute sequentially if true
    return true;
} catch (Exception e) {
    e.printStackTrace();
}

Any help would be most appreciated. I have tried almost everything I can think
of, including switching off permissions in the Hadoop config and ensuring that
my hadoop.tmp.dir folder has open permissions. My only remaining hunches are
that (a) perhaps the Configuration object does not have enough information, or
(b) I am adding one or two separate jars to the HADOOP_CLASSPATH instead of
adding everything to the mahout-job jar.
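
On hunch (a): the ChecksumFileSystem frames in the stack trace look like the
local-filesystem wrapper, so the failing Mkdirs may be going against the local
disk while the "hadoop fs -ls" above is hitting HDFS. That would happen if the
addResource() calls don't actually find core-site.xml and the Configuration
falls back to the default fs.default.name of file:///. I believe the relevant
setting for this Hadoop 1.x setup is something along these lines in
core-site.xml (host and port below are placeholders, not my actual values):

```xml
<!-- core-site.xml: tells Hadoop clients which default filesystem to use. -->
<!-- hdfs://localhost:9000 is a placeholder; substitute the real namenode. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

As a sanity check, I could print FileSystem.get(hfsConf).getUri() just before
calling CanopyDriver.run; if it reports file:/// rather than hdfs://..., the
resource files aren't being picked up.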

Rob
