I have a 3-node CDH 4.6 cluster; however, I have built Mahout 0.9 with the Hadoop 2.x profile.
I have also created a mount point on these nodes, and the path URI is the same as in HDFS. I have manually configured the filesystem parameters:

conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

The input data (a sequence file) and the cluster centers (the output of Canopy) are present in HDFS. After this I executed KMeansDriver using ToolRunner but got the error shown above. After debugging I found that, if I don't provide a filesystem scheme, cluster-0 is created on the mount point and cluster-1 in HDFS. Once I provide the filesystem scheme, i.e. "hdfs://<<>>/", everything works like a charm.

On Mon, Mar 17, 2014 at 4:24 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

> Have not seen that behavior with KMeans; what were your settings again? Sorry for joining this thread late, hence I have not looked at the entire history.
>
> On Monday, March 17, 2014 6:52 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
>
> Suneel,
>
> Just for information, I haven't found this issue with Canopy. The Canopy cluster-0 was created in HDFS only.
>
> However, the KMeans cluster-0 was created in the local file system and cluster-1 in HDFS, and after that it threw an error as it was unable to locate cluster-0.
>
> On Mon, Mar 17, 2014 at 3:10 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> > This problem has specifically to do with Canopy clustering and is not an issue with KMeans. I had seen this behavior with Canopy, and looking at the code it is indeed an issue wherein cluster-0 is created on the local file system and the remaining clusters land on HDFS.
> >
> > Please file a JIRA for this if not already done.
> >
> > On Wednesday, March 12, 2014 3:02 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
> >
> > Hi,
> >
> > The problem is not with the input path; it is the way KMeans is executed. Let me explain.
> > I created the CSV->Sequence conversion using map-reduce, hence my data is in HDFS. After this I ran Canopy MR, so that output is also in HDFS.
> >
> > Now these two things are pushed into the KMeans MR. If you check the KMeansDriver class, it first tries to create the cluster-0 folder with data; if you don't specify the scheme here, it writes to the local file system. After that the MR job is started, which expects cluster-0 in HDFS.
> >
> > Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
> > ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
> > ClusterClassifier prior = new ClusterClassifier(clusters, policy);
> > prior.writeToSeqFiles(priorClustersPath);
> >
> > if (runSequential) {
> >   ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
> > } else {
> >   ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
> > }
> >
> > Let me know if I was not able to explain clearly.
> >
> > On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <s...@apache.org> wrote:
> >
> > > Hi Bikash,
> > >
> > > Have you tried adding hdfs:// to your input path? Maybe that helps.
> > >
> > > --sebastian
> > >
> > > On 03/11/2014 11:22 AM, Bikash Gupta wrote:
> > >
> > >> Hi,
> > >>
> > >> I am running KMeans in a cluster where I set fs.hdfs.impl and fs.file.impl in the configuration beforehand, as mentioned below:
> > >>
> > >> conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> > >> conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
> > >>
> > >> The problem is that the cluster-0 directory is created in the local file system and cluster-1 is created in HDFS, and the KMeans map-reduce job is unable to find cluster-0.
> >> Please see the stacktrace below:
> >>
> >> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> >> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
> >> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
> >> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> >> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
> >> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
> >> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO] map 0% reduce 0%
> >> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
> >> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
> >>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
> >>     at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
> >>     at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
> >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
> >>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> >> Caused by: java.io.FileNotFoundException: File /5/clusters-0
> >>
> >> Please suggest!

> --
> Thanks & Regards
> Bikash Kumar Gupta

--
Thanks & Regards
Bikash Kumar Gupta
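[Editor's note on the resolution behavior discussed above] The root cause in this thread is that an unqualified path such as /5 is resolved against whatever default filesystem the client happens to use, so the driver-side write of cluster-0 can land on the local FS while the MR tasks look for it on HDFS. In Hadoop the usual remedies are to set fs.defaultFS (fs.default.name on older versions) or to fully qualify the path, e.g. via Path.makeQualified on a FileSystem obtained for the cluster. As a minimal stand-alone sketch of the resolution rule only (the qualify helper and the namenode address are hypothetical stand-ins, not Hadoop API):

```java
import java.net.URI;

public class QualifySketch {
    // Hypothetical stand-in for Hadoop's Path qualification: if the path
    // carries no scheme, resolve it against the default filesystem URI;
    // otherwise leave it untouched.
    static String qualify(String defaultFs, String path) {
        URI u = URI.create(path);
        if (u.getScheme() != null) {
            return path; // already qualified, e.g. hdfs://... or file://...
        }
        // An absolute path resolved against hdfs://host:port keeps the
        // scheme and authority of the default filesystem.
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        // Placeholder namenode address; substitute your cluster's fs.defaultFS.
        String defaultFs = "hdfs://namenode:8020";
        System.out.println(qualify(defaultFs, "/5/clusters-0"));
        System.out.println(qualify(defaultFs, "hdfs://nn:8020/out"));
    }
}
```

With this rule, "/5/clusters-0" becomes "hdfs://namenode:8020/5/clusters-0", while an already-qualified "hdfs://nn:8020/out" is returned unchanged. That is exactly why passing "hdfs://..." explicitly, as Bikash found, makes both cluster-0 and cluster-1 land on HDFS.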