I have a MapReduce program that performs some matrix operations. In the reducer, it averages many large matrices (each matrix takes 400+ MB according to the "Map output bytes" counter). So if 50 matrices go to one reducer, the total memory usage is about 20 GB, and the reduce task fails with this exception:
    FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
        at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
        at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
        at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
        at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

One method I can come up with is to use a Combiner that emits partial sums of matrices together with their counts, but it still may not solve the problem, because the combiner is not fully controlled by me: Hadoop decides whether and how often it actually runs.
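For reference, the sum-and-count averaging I have in mind (whether done in a combiner or while iterating the reducer's values) would look roughly like this in plain Java. This is only a sketch of the accumulation logic with Hadoop types omitted; the class and method names are hypothetical. The point is that only one matrix plus a single accumulator buffer needs to be in memory at a time, instead of all 50 matrices at once:

```java
import java.util.Arrays;
import java.util.List;

public class StreamingAverage {
    // Accumulate matrices one at a time into a single sum buffer,
    // then divide by the count, so memory stays at ~2 matrices
    // (current value + accumulator) rather than all of them.
    public static double[] average(Iterable<double[]> matrices) {
        double[] sum = null;
        long count = 0;
        for (double[] m : matrices) {
            if (sum == null) {
                sum = new double[m.length]; // allocate accumulator on first matrix
            }
            for (int i = 0; i < m.length; i++) {
                sum[i] += m[i];
            }
            count++;
        }
        if (sum == null) {
            return new double[0]; // no input matrices
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] avg = average(List.of(new double[]{1, 2}, new double[]{3, 4}));
        System.out.println(Arrays.toString(avg)); // prints [2.0, 3.0]
    }
}
```

A combiner would emit `(sum, count)` pairs instead of the final average, so partial results from different combiner runs can still be merged correctly in the reducer.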