Hi all,

I'm using Mahout 0.7 and trying to use KMeansDriver (org.apache.mahout.clustering.kmeans.KMeansDriver) with HDFS, and I'm having some issues. When I use it with my local file system, everything seems to work fine. However, as soon as I change the Configuration object to use HDFS:

    Configuration conf = new Configuration();
    conf.addResource(new Path("C:\\hdp-win\\hadoop\\hadoop-1.1.0-SNAPSHOT\\conf\\core-site.xml"));
    conf.addResource(new Path("C:\\hdp-win\\hadoop\\hadoop-1.1.0-SNAPSHOT\\conf\\hdfs-site.xml"));
I run into problems. This is the exception I get:

    java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:215)

I pulled up that code (org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles), and I think it is trying to read the files from the path I passed to the method, but with a new instance of the Configuration object, i.e. not the Configuration I passed in, but a fresh default one that doesn't have my HDFS settings. Since that default Configuration presumably points at the local file system, the iterable finds no clusters, the clusters list stays empty, and models.get(0) at line 215 throws the IndexOutOfBoundsException:

    205  public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
    206    Configuration config = new Configuration();   // <-- ignores the conf argument
    207    List<Cluster> clusters = Lists.newArrayList();
    208    for (ClusterWritable cw : new SequenceFileDirValueIterable<ClusterWritable>(path, PathType.LIST,
    209        PathFilters.logsCRCFilter(), config)) {
    210      Cluster cluster = cw.getValue();
    211      cluster.configure(conf);
    212      clusters.add(cluster);
    213    }
    214    this.models = clusters;
    215    modelClass = models.get(0).getClass().getName();
    216    this.policy = readPolicy(path);
    217  }

Any help would be really appreciated :)

Thanks!
Alan
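P.S. To make sure I'm reading the bug right, here is a tiny self-contained sketch of the pattern I think is biting me. FakeConf is just a hypothetical stand-in for Hadoop's Configuration (not the real class): the method accepts a configured object from the caller but builds a fresh default internally, so the caller's HDFS setting is silently dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration,
// just enough to show the pattern.
class FakeConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

public class IgnoredConfDemo {
    // Mirrors the pattern in readFromSeqFiles: the caller's conf is
    // accepted as a parameter but a fresh default is used for the
    // actual file-system lookup.
    public static String resolveFileSystem(FakeConf callerConf) {
        FakeConf config = new FakeConf();   // same as line 206 above
        return config.get("fs.default.name", "file:///");
    }

    public static void main(String[] args) {
        FakeConf conf = new FakeConf();
        conf.set("fs.default.name", "hdfs://namenode:8020");
        // Prints file:/// -- the HDFS setting was never consulted.
        System.out.println(resolveFileSystem(conf));
    }
}
```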