My Hadoop version is 1.0.1 and I didn't specify any parameters.

2012/7/16 Felix.徐 <ygnhz...@gmail.com>
> Hi all,
>
> I have written a MyCombineFileInputFormat extending CombineFileInputFormat;
> it can put multiple files together into the same input split, and it works
> fine for a small number of files. But when I try to process 100,000 small
> files, CombineFileInputFormat runs out of memory while splitting the input
> files.
>
> The stack trace is:
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2245)
>     at java.util.Arrays.copyOf(Arrays.java:2219)
>     at java.util.ArrayList.grow(ArrayList.java:213)
>     at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:187)
>     at java.util.ArrayList.addAll(ArrayList.java:532)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getHosts(CombineFileInputFormat.java:568)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:410)
>     at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
>     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
>     at ac.ict.mapreduce.test.MR.main(MR.java:114)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> It seems that the rack-to-nodes mapping is too big? How can I solve this
> problem? Thanks!
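For reference, two knobs that may help here. The stack trace shows the OutOfMemoryError occurring client-side in JobClient while computing splits, so the submitting JVM's heap is what matters, and capping the size of each combined split bounds the per-split lists CombineFileInputFormat builds. This is a sketch under assumptions, not a verified fix: it assumes Hadoop 1.x (where HADOOP_CLIENT_OPTS and the mapred.max.split.size property exist) and that the driver class parses generic options (e.g. via ToolRunner); the split-size value and heap size are illustrative only.

```shell
# Assumption: the OOM is in the job-submission JVM, so raise its heap
# via HADOOP_CLIENT_OPTS (honored by the hadoop launcher in 1.x).
export HADOOP_CLIENT_OPTS="-Xmx2048m"

# Optionally cap each combined split (mapred.max.split.size is read by
# CombineFileInputFormat in Hadoop 1.x). The -D flag only works if the
# driver uses ToolRunner/GenericOptionsParser -- an assumption here.
hadoop jar myjob.jar ac.ict.mapreduce.test.MR \
    -D mapred.max.split.size=268435456 \
    /input /output
```

Alternatively, the same limit can be set in code: CombineFileInputFormat exposes a protected setMaxSplitSize(long) that a subclass such as MyCombineFileInputFormat could call in its constructor.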