The Hadoop 0.21 patch introduces a lot of changes that make the Random Forest code crash. I fixed the _SUCCESS file problem easily, but then ran into another exception that is not as easy to fix.
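
For anyone hitting the same _SUCCESS problem: a common workaround is to skip marker and metadata files when reading a job's output directory, so that only the real part files are opened as SequenceFiles. The sketch below illustrates that idea in plain Java. It is not the actual change made to Mahout, and the class and method names (OutputFileFilter, isDataFile, dataFiles) are hypothetical. It assumes the usual Hadoop convention that names starting with "_" or "." are not data files. In real Hadoop code the same test could sit inside an org.apache.hadoop.fs.PathFilter passed to FileSystem.listStatus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Mahout or Hadoop.
class OutputFileFilter {

    // Assumption: by Hadoop convention, names starting with '_' (e.g. _SUCCESS,
    // _logs) or '.' (e.g. checksum files) are metadata rather than job output.
    static boolean isDataFile(String fileName) {
        return !fileName.isEmpty()
            && !fileName.startsWith("_")
            && !fileName.startsWith(".");
    }

    // Keeps only the names that look like real output files, preserving order.
    static List<String> dataFiles(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (isDataFile(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A made-up directory listing of the kind a job output path might contain.
        List<String> listing = Arrays.asList(
            "_SUCCESS", "_logs", "part-r-00000", ".part-r-00000.crc", "part-r-00001");
        System.out.println(dataFiles(listing));
    }
}
```

With a filter like this, a reader that loops over the output directory never tries to open _SUCCESS, which is an empty marker file and therefore cannot contain a SequenceFile header.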
________________________________
From: praneet mhatre <[email protected]>
To: [email protected]
Sent: Friday, June 3, 2011, 18:04
Subject: Re: Reg Randomn forest

Hi,

I faced the exact same problem and had a long exchange of emails with the Mahout folks about it. Here is a link to the mail archive, to save them the trouble of going through it all again:
http://search.lucidimagination.com/search/document/ecbfb35f9e05706b/partial_implementation_of_random_forest#98cc8b90d38c0423

In a nutshell, CDH3 includes some patches from Hadoop 0.21 which create a _SUCCESS file in the output path, and the current code does not know how to deal with that file. I switched to an earlier version of Hadoop and everything worked perfectly. I don't know if this issue has been fixed yet. Perhaps one of the developers could shed some light on that.

Thanks,

On Fri, Jun 3, 2011 at 4:15 AM, <[email protected]> wrote:
> Hi,
>
> I tried to run Random Forest on the KDD data in a Hadoop cluster (CDH
> version 3) and ended up with the following error during build forest:
>
> Exception in thread "main" java.lang.IllegalStateException: java.io.EOFException
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
>     at org.apache.mahout.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:173)
>     at org.apache.mahout.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:121)
>     at org.apache.mahout.df.mapreduce.Builder.build(Builder.java:324)
>     at org.apache.mahout.df.mapreduce.BuildForest.buildForest(BuildForest.java:195)
>     at org.apache.mahout.df.mapreduce.BuildForest.run(BuildForest.java:159)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.df.mapreduce.BuildForest.main(BuildForest.java:239)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readFully(DataInputStream.java:180)
>     at java.io.DataInputStream.readFully(DataInputStream.java:152)
>     at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
>     at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
>     ... 12 more
>
> Any help in resolving the above issue is greatly appreciated.
>
> Thanks and Regards,
> Ranjit.C
>

--
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine
