Hi, I tried to run the spectral clustering example from the Mahout website on EMR.
I uploaded the following files to the bucket:

- affinity.txt (affinity matrix)
- mahout-core-0.9-job.jar
- mahout-core-0.9.jar
- update-lucene.sh
- lucene-4.3.0.tgz

update-lucene.sh contains the following:

    #!/bin/bash
    cd /home/hadoop
    wget https://s3.amazonaws.com/hellomahout/lucene-4.3.0.tgz
    tar -xzf lucene-4.3.0.tgz
    cd lib
    rm lucene-*.jar
    cd ..
    cd lucene-4.3.0
    find . | grep lucene- | grep jar$ | xargs -I {} cp {} ../lib

The cluster configuration is the following:

Hadoop distribution: Amazon, AMI version: 3.2.1
EC2 instance types:
    Master: m1.large, 1
    Core: m1.large, 1
    Task: None (m1.medium, 1)
Bootstrap actions: Custom action, S3 location: s3://hellomahout/update-lucene.sh
Steps: Custom JAR, JAR location: s3://hellomahout/mahout-core-0.9-job.jar
Arguments: org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver --input s3://hellomahout/testdata/affinity.txt --output s3://hellomahout/testdata/results -d 3 -k 2 -x 10

When I try to run it, I get the following exception:

    Exception in thread "main" java.io.FileNotFoundException: No such file or directory 'hdfs://172.31.1.27:9000/user/hadoop/temp/calculations/unitvectors'
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:759)
        at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:507)
        at org.apache.mahout.clustering.kmeans.EigenSeedGenerator.buildFromEigens(EigenSeedGenerator.java:67)
        at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:243)
        at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:127)
        at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:118)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.main(SpectralKMeansDriver.java:70)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Does anyone know what causes this exception? Could anyone suggest how to run spectral clustering on EMR?

Thank you.
Niko