I have a MapReduce program that does some matrix operations. In the
reducer, it averages many large matrices; each matrix is 400+ MB
(according to the "Map output bytes" counter). So if 50 matrices go to
one reducer, the total memory needed is about 20 GB, and the reduce
task fails with this exception:

FATAL org.apache.hadoop.mapred.Child: Error running child :
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:344)
at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:406)
at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:238)
at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:438)
at org.apache.hadoop.mapred.Merger.merge(Merger.java:142)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.createKVIterator(ReduceTask.java:2539)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$400(ReduceTask.java:661)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

One method I can come up with is to use a Combiner to emit partial sums
of the matrices together with their counts. But I am not sure this
solves the problem, because the combiner is not fully under my control:
Hadoop decides when, and how many times, it actually runs.
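For what it's worth, the "sum + count" aggregate does not actually depend on how often the combiner runs, because (sum, count) pairs merge associatively: applying the combiner zero, one, or many times yields the same final average as long as the division happens only once, in the reducer. Below is a minimal plain-Java sketch of that aggregation (not Hadoop API code; matrices are flattened to `double[]`, and the class name `MatrixPartial` is just illustrative):

```java
// A (sum, count) partial aggregate for averaging matrices.
// Combiner: folds raw matrices into a partial.
// Reducer:  merges partials from all map outputs, divides once at the end.
class MatrixPartial {
    final double[] sum;   // element-wise running sum of matrices seen so far
    long count;           // how many matrices are folded into `sum`

    MatrixPartial(int size) {
        this.sum = new double[size];
        this.count = 0;
    }

    // Fold one raw matrix into this partial (combiner-side work).
    void add(double[] matrix) {
        for (int i = 0; i < sum.length; i++) sum[i] += matrix[i];
        count++;
    }

    // Merge another partial into this one (reducer-side work).
    // Associative and commutative, so it is safe however the
    // framework groups the inputs.
    void merge(MatrixPartial other) {
        for (int i = 0; i < sum.length; i++) sum[i] += other.sum[i];
        count += other.count;
    }

    // Final element-wise average, computed exactly once in reduce().
    double[] average() {
        double[] avg = new double[sum.length];
        for (int i = 0; i < sum.length; i++) avg[i] = sum[i] / count;
        return avg;
    }
}
```

In an actual job the (sum, count) pair would be serialized as a custom Writable; the key point is that the reducer then holds only one running partial per key instead of all 50 full matrices at once.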
