I have not seen that behavior with KMeans. What were your settings again?
Sorry, I am joining this thread late, hence have not looked at the entire 

On Monday, March 17, 2014 6:52 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:

Just for information, I haven't found this issue in Canopy. Canopy cluster-0 was 
created in HDFS only.

However, KMeans cluster-0 was created in the local file system and cluster-1 in HDFS, 
and after that it threw an error because it was unable to locate cluster-0.
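
For anyone following along, here is a minimal sketch of why a path without a scheme can end up on the local file system. It does not use Hadoop at all: `qualify` is a hypothetical helper that only mimics, with java.net.URI, the idea of resolving a scheme-less path against whichever file system happens to be the default. The authority `namenode:8020` is a made-up placeholder, not something taken from this cluster.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PathSchemeSketch {

    // Hypothetical helper (not a Hadoop API): a path that carries its own
    // scheme is returned unchanged; a scheme-less path inherits the scheme
    // and authority of the default file system it is resolved against.
    static URI qualify(String path, URI defaultFs) {
        URI raw = URI.create(path);
        if (raw.getScheme() != null) {
            return raw; // an explicit scheme always wins
        }
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(),
                           raw.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify " + path, e);
        }
    }

    public static void main(String[] args) {
        URI localDefault = URI.create("file:///");
        URI hdfsDefault = URI.create("hdfs://namenode:8020"); // placeholder authority

        // The same scheme-less path lands in different places depending on
        // which default file system is in effect where it gets resolved.
        System.out.println(qualify("/5/clusters-0", localDefault));
        System.out.println(qualify("/5/clusters-0", hdfsDefault));

        // A fully qualified path is unaffected by the default.
        System.out.println(qualify("hdfs://namenode:8020/5/clusters-0", localDefault));
    }
}
```

If this is what is happening in the driver, passing a fully qualified output path (with hdfs:// in front) might be worth trying as a workaround, similar in spirit to the earlier suggestion about the input path.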

On Mon, Mar 17, 2014 at 3:10 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

This problem is specifically to do with Canopy clustering and is not an issue 
with KMeans. I had seen this behavior with Canopy and, looking at the code, it's 
indeed an issue wherein cluster-0 is created on the local file system and the 
remaining clusters land on HDFS.
>Please file a JIRA for this if you have not already done so.
>On Wednesday, March 12, 2014 3:02 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
>The problem is not with the input path; it's the way KMeans is getting executed. Let
>me explain.
>I created the CSV->Sequence conversion using map-reduce, hence my data is in HDFS.
>After this I ran Canopy MR, hence that data is also in HDFS.
>Now these two things are getting pushed into KMeans MR.
>If you check the KMeansDriver class, at first it tries to create the cluster-0
>folder with data; here, if you don't specify the scheme, it will write to the
>local file system. After that the MR job gets started, which expects
>cluster-0 in HDFS.
>Path priorClustersPath = new Path(output, Cluster.INITIAL_CLUSTERS_DIR);
>    ClusteringPolicy policy = new KMeansClusteringPolicy(convergenceDelta);
>    ClusterClassifier prior = new ClusterClassifier(clusters, policy);
>    prior.writeToSeqFiles(priorClustersPath);
>    if (runSequential) {
>      ClusterIterator.iterateSeq(conf, input, priorClustersPath, output,
>    } else {
>      ClusterIterator.iterateMR(conf, input, priorClustersPath, output,
>    }
>Let me know if I am not able to explain clearly.
>On Wed, Mar 12, 2014 at 11:53 AM, Sebastian Schelter <s...@apache.org> wrote:
>> Hi Bikash,
>> Have you tried adding hdfs:// to your input path? Maybe that helps.
>> --sebastian
>> On 03/11/2014 11:22 AM, Bikash Gupta wrote:
>>> Hi,
>>> I am running KMeans on a cluster, where I set the configuration of
>>> fs.hdfs.impl and fs.file.impl beforehand as shown below:
>>> conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
>>> conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
>>> The problem is that the cluster-0 directory is getting created in the local file
>>> system and cluster-1 is getting created in HDFS, and the KMeans map-reduce job is
>>> unable to find cluster-0. Please see the stack trace below:
>>> 2014-03-11 14:52:15 o.a.m.c.AbstractJob [INFO] Command line arguments: {--clustering=null, --clusters=[/3/clusters-0-final], --convergenceDelta=[0.1], --distanceMeasure=[org.apache.mahout.common.distance.EuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/2/sequence], --maxIter=[100], --method=[mapreduce], --output=[/5], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>>> 2014-03-11 14:52:15 o.a.h.u.NativeCodeLoader [WARN] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] Input: /2/sequence Clusters In: /3/clusters-0-final Out: /5
>>> 2014-03-11 14:52:15 o.a.m.c.k.KMeansDriver [INFO] convergence: 0.1 max Iterations: 100
>>> 2014-03-11 14:52:16 o.a.h.m.JobClient [WARN] Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>> 2014-03-11 14:52:17 o.a.h.m.l.i.FileInputFormat [INFO] Total input paths to process : 3
>>> 2014-03-11 14:52:19 o.a.h.m.JobClient [INFO] Running job: job_201403111332_0011
>>> 2014-03-11 14:52:20 o.a.h.m.JobClient [INFO]  map 0% reduce 0%
>>> 2014-03-11 14:52:28 o.a.h.m.JobClient [INFO] Task Id : attempt_201403111332_0011_m_000000_0, Status : FAILED
>>> 2014-03-11 14:52:28 STDIO [ERROR] java.lang.IllegalStateException: /5/clusters-0
>>>          at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
>>>          at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
>>>          at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
>>>          at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:138)
>>>          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>>>          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>>>          at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>>          at java.security.AccessController.doPrivileged(Native Method)
>>>          at javax.security.auth.Subject.doAs(Subject.java:415)
>>>          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
>>>          at org.apache.hadoop.mapred.Child.main(Child.java:262)
>>> Caused by: java.io.FileNotFoundException: File /5/clusters-0
>>> Please suggest!!!
>Thanks & Regards
>Bikash Kumar Gupta

Thanks & Regards
Bikash Kumar Gupta 
