Re: Problem with FileSystem in Kmeans

Suneel Marthi Mon, 17 Mar 2014 03:56:29 -0700

I have not seen that behavior with KMeans. What were your settings again?
Sorry, I am joining this thread late and have not read through the entire 
history.






On Monday, March 17, 2014 6:52 AM, Bikash Gupta <bikash.gupt...@gmail.com> 
wrote:
 
Suneel,

Just for information, I haven't seen this issue with Canopy. The Canopy cluster-0 was 
created in HDFS only.

However, the KMeans cluster-0 was created in the local file system and cluster-1 in HDFS, 
after which the job threw an error because it was unable to locate cluster-0.
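
The mismatch described above comes down to how a path without a scheme is resolved: it inherits whatever default filesystem is in effect where the path is opened. The following is a minimal sketch of that idea using only java.net.URI, not Hadoop's own Path or FileSystem classes. The class QualifyPathSketch, the helper qualify, and the namenode:8020 address are hypothetical names chosen for illustration.

```java
import java.net.URI;

class QualifyPathSketch {
    // Hypothetical helper (not a Hadoop API): resolve a path against a default
    // filesystem URI. A path that already carries a scheme is returned unchanged.
    static URI qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified, the default is ignored
        }
        return defaultFs.resolve(uri); // scheme and authority come from the default
    }

    public static void main(String[] args) {
        URI localDefault = URI.create("file:///");
        URI hdfsDefault = URI.create("hdfs://namenode:8020/"); // assumed address

        // Same scheme-less path, two different defaults, two different filesystems.
        System.out.println(qualify("/5/clusters-0", localDefault));
        System.out.println(qualify("/5/clusters-0", hdfsDefault));

        // A fully qualified path does not depend on the default at all.
        System.out.println(qualify("hdfs://namenode:8020/5/clusters-0", localDefault));
    }
}
```

If one component resolves the path with a local default and another with an HDFS default, they end up looking in different places for the same directory name. Passing a fully qualified hdfs:// path, as suggested elsewhere in this thread for the input path, removes that dependence on the default.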




On Mon, Mar 17, 2014 at 3:10 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

This problem is specific to Canopy clustering and is not an issue 
with KMeans. I have seen this behavior with Canopy and, looking at the code, it is 
indeed an issue: cluster-0 is created on the local file system and the 
remaining clusters land on HDFS.
>
>Please file a JIRA for this if one has not already been filed.
>
>
>
>
>
>
>On Wednesday, March 12, 2014 3:02 AM, Bikash Gupta <bikash.gupt...@gmail.com> 
>wrote:
>
>Hi,
>
>The problem is not with the input path; it's the way KMeans is executed. Let
>me explain.
>
>I created the CSV->Sequence conversion using map-reduce, hence my data is in HDFS.
>After this I ran Canopy MR, hence its output is also in HDFS.
>
>Now these two things are passed to KMeans MR.
>
>If you check the KMeansDriver class, it first tries to create the cluster-0
>folder with data; if you don't specify the scheme here, it writes to the
>local file system. After that the MR job starts, and it expects
>cluster-0 in HDFS.
>
>    Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
>    ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
>    ClusterClassifier prior = new ClusterClassifier(clusters, policy);
>    prior.writeToSeqFiles(priorClustersPath);
>
>    if (runSequential) {
>      ClusterIterator.iterateSeq(conf, input, priorClustersPath, output, maxIterations);
>    } else {
>      ClusterIterator.iterateMR(conf, input, priorClustersPath, output, maxIterations);
>    }
>
>Let me know if my explanation is not clear.
>
>
>
>On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <s...@apache.org> wrote:
>
>> Hi Bikash,
>>
>> Have you tried adding hdfs:// to your input path? Maybe that helps.
>>
>> --sebastian
>>
>>
>> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
>>
>>> Hi,
>>>
>>> I am running KMeans on a cluster where I set the configuration of
>>> fs.hdfs.impl and fs.file.impl beforehand, as shown below:
>>>
>>> conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
>>> conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
>>>
>>> The problem is that the cluster-0 directory is created in the local file
>>> system and cluster-1 is created in HDFS, so the KMeans map-reduce job is
>>> unable to find cluster-0. Please see the stack trace below:
>>>
>>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
>>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
>>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
>>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO]  map 0% reduce 0%
>>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
>>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
>>>          at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>>>          at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>>>          at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>>>          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
>>>          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>>          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>>          at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>          at java.security.AccessController.doPrivileged(Native Method)
>>>          at javax.security.auth.Subject.doAs(Subject.java:415)
>>>          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>>>          at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
>>>
>>> Please suggest!!!
>>>
>>>
>>>
>>
>
>
>--
>Thanks & Regards
>Bikash Kumar Gupta


-- 
Thanks & Regards
Bikash Kumar Gupta
