I have a 3-node CDH 4.6 cluster; however, I have built Mahout 0.9 with the Hadoop 2.x profile.
I have also created a mount point on these nodes, and the path URI is the same as in HDFS. I have manually configured the filesystem parameters:

conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

The input data (a sequence file) and the cluster centers (the output of Canopy) are present in HDFS. After this I executed KMeansDriver using ToolRunner but got the error shown above. After debugging I found that, if I don't provide a filesystem scheme, cluster-0 is created on the mount point and cluster-1 in HDFS. Once I provide the filesystem scheme, i.e. "hdfs://<<>>/", everything works like a charm.

On Mon, Mar 17, 2014 at 4:24 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

> Have not seen that behavior with KMeans; what were your settings again? Sorry for joining this thread late, hence I have not looked at the entire history.
>
> On Monday, March 17, 2014 6:52 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
>
> Suneel,
>
> Just for information, I haven't found this issue with Canopy. The Canopy cluster-0 was created in HDFS only.
>
> However, the KMeans cluster-0 was created in the local file system and cluster-1 in HDFS, and after that it threw an error as it was unable to locate cluster-0.
>
> On Mon, Mar 17, 2014 at 3:10 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> > This problem has specifically to do with Canopy clustering and is not an issue with KMeans. I had seen this behavior with Canopy, and looking at the code it is indeed an issue wherein cluster-0 is created on the local file system and the remaining clusters land on HDFS.
> >
> > Please file a JIRA for this if not already done.
> >
> > On Wednesday, March 12, 2014 3:02 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
> >
> > Hi,
> >
> > The problem is not with the input path; it is the way KMeans is executed. Let me explain.
> > I created the CSV->Sequence conversion using map-reduce, hence my data is in HDFS. After this I ran Canopy MR, so that output is also in HDFS.
> >
> > Now these two things are pushed into the KMeans MR. If you check the KMeansDriver class, it first tries to create the cluster-0 folder with data; if you don't specify the scheme here, it writes to the local file system. After that the MR job is started, which expects cluster-0 in HDFS.
> >
> > Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
> > ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
> > ClusterClassifier prior = new ClusterClassifier(clusters, policy);
> > prior.writeToSeqFiles(priorClustersPath);
> >
> > if (runSequential) {
> >   ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
> > } else {
> >   ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
> > }
> >
> > Let me know if I was not able to explain clearly.
> >
> > On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <s...@apache.org> wrote:
> >
> > > Hi Bikash,
> > >
> > > Have you tried adding hdfs:// to your input path? Maybe that helps.
> > >
> > > --sebastian
> > >
> > > On 03/11/2014 11:22 AM, Bikash Gupta wrote:
> > >
> > >> Hi,
> > >>
> > >> I am running KMeans in a cluster where I set fs.hdfs.impl and fs.file.impl in the configuration beforehand, as mentioned below:
> > >>
> > >> conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> > >> conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
> > >>
> > >> The problem is that the cluster-0 directory is created in the local file system and cluster-1 is created in HDFS, and the KMeans map-reduce job is unable to find cluster-0.
> >> Please see the stacktrace below:
> >>
> >> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> >> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
> >> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
> >> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> >> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
> >> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
> >> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
> >> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
> >> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
> >>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
> >>     at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
> >>     at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> >>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >> Caused by: java.io.FileNotFoundException: File /5/clusters-0
> >>
> >> Please suggest!

> --
> Thanks & Regards
> Bikash Kumar Gupta

--
Thanks & Regards
Bikash Kumar Gupta
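[Editor's note on the resolution behavior discussed above] The root cause in this thread is that an unqualified path such as /5 is resolved against whatever default filesystem the client happens to use, so the driver-side write of cluster-0 can land on the local FS while the MR tasks look for it on HDFS. In Hadoop the usual remedies are to set fs.defaultFS (fs.default.name on older versions) or to fully qualify the path, e.g. via Path.makeQualified on a FileSystem obtained for the cluster. As a minimal stand-alone sketch of the resolution rule only (the qualify helper and the namenode address are hypothetical stand-ins, not Hadoop API):

```java
import java.net.URI;

public class QualifySketch {
    // Hypothetical stand-in for Hadoop's Path qualification: if the path
    // carries no scheme, resolve it against the default filesystem URI;
    // otherwise leave it untouched.
    static String qualify(String defaultFs, String path) {
        URI u = URI.create(path);
        if (u.getScheme() != null) {
            return path; // already qualified, e.g. hdfs://... or file://...
        }
        // An absolute path resolved against hdfs://host:port keeps the
        // scheme and authority of the default filesystem.
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        // Placeholder namenode address; substitute your cluster's fs.defaultFS.
        String defaultFs = "hdfs://namenode:8020";
        System.out.println(qualify(defaultFs, "/5/clusters-0"));
        System.out.println(qualify(defaultFs, "hdfs://nn:8020/out"));
    }
}
```

With this rule, "/5/clusters-0" becomes "hdfs://namenode:8020/5/clusters-0", while an already-qualified "hdfs://nn:8020/out" is returned unchanged. That is exactly why passing "hdfs://..." explicitly, as Bikash found, makes both cluster-0 and cluster-1 land on HDFS.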