Hi All,

I'm having a problem running a job on Hadoop. Using Mahout, I've been able to 
run several Bayesian classifiers and train and test them successfully on 
increasingly large datasets. Now I'm working on a dataset of 100,000 documents 
(size 100MB). I've been able to train the classifier but when I try to 
'testclassifier' all the map tasks are failing with a 'Caused by: 
java.lang.OutOfMemoryError: GC overhead limit exceeded' exception, before the 
job itself is 'Killed'. I have a small cluster of 3 machines but have plenty of 
memory and CPU power (3 x 16GB, 2.5GHz quad-core machines). I've tried setting 
'mapred.child.java.opts' flags up to 3GB in size (-Xms3G -Xmx3G) but still get 
the same error. I've also tried setting HADOOP_HEAPSIZE at values up to 3000 
but this made no difference. When the program is running I can use 'top' to see 
that although the CPUs are busy, memory usage rarely goes above 12GB and 
absolutely no swapping is taking place.
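
One way to check whether the -Xmx value from mapred.child.java.opts actually 
reaches the child JVMs is to log the heap limit from inside a task. A minimal 
sketch, assuming a hypothetical helper class HeapCheck (not part of Hadoop or 
Mahout) that uses only standard JDK calls available in Java 6:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic helper; not part of Hadoop or Mahout.
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One-line summary of the heap limit and the JVM flags in effect.
    static String describeHeap() {
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        return "max heap = " + maxHeapMb() + " MB, JVM args = " + args;
    }

    public static void main(String[] argv) {
        System.out.println(describeHeap());
    }
}
```

Printing HeapCheck.describeHeap() to stderr at the start of a map task should 
put the line in that task's log. If the reported maximum is well below 3GB, 
the setting is probably not being applied to the task JVMs.
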
I saw the same exception where a program was spending so much time 
garbage-collecting (more than 90% of its time!) that the program was unable to 
progress and so threw the 'GC overhead limit exceeded' exception.  If I set 
(-XX:-UseGCOverheadLimit) in the mapred.child.java.opts property then I see the 
same behaviour as before, only a slightly different exception is thrown: 'Caused 
by: java.lang.OutOfMemoryError: Java heap space at 
java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)'. I'm guessing my program 
is spending too much time garbage-collecting for it to progress, but how do I 
fix this? I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All 
machines are running 64-bit Ubuntu and Java 6. Any help would be very much 
appreciated,
Ken