Canopy generation out of memory troubleshooting

Chih-Hsien Wu Mon, 02 Dec 2013 08:05:41 -0800

Hi All,

I posted a question about Mahout canopy generation last week, but the
problem is still unsolved. The message below is the error I receive.
I'm trying to run canopy generation on about 900 MB of data, an
estimated 120,000 vectors. I'm currently running this on Amazon EMR
with two m1.xlarge instances.


I noticed that memory was not fully used on the Hadoop machines right
before the error occurred. I looked into tweaking the namenode and
datanode heap sizes and the JVM options for each task, but I still
can't get the machines to use their memory effectively. I currently
set "mapred.child.java.opts" to 8 GB and datanode_heap_size to 3g.
I'm looking for an explanation of the memory settings that would
solve this problem.
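
For reference, a rough sketch of the memory arithmetic behind these
settings, not a definitive diagnosis. It assumes 15 GiB of RAM per
m1.xlarge node and treats the number of concurrent task slots per node
as a hypothetical value (the actual slot count is not stated above).
The class and method names are made up for illustration. Note also
that "mapred.child.java.opts" takes JVM flags (for example -Xmx8g), so
printing Runtime.maxMemory() from inside a task is one way to check
that the heap setting actually reached the child JVM.

```java
public class HeapBudget {
    /** Worst-case heap demand on one node in MB: every task slot at its maximum heap, plus daemon heap. */
    public static long worstCaseMb(long childHeapMb, int taskSlots, long daemonHeapMb) {
        return childHeapMb * taskSlots + daemonHeapMb;
    }

    /** True if the worst-case heap demand fits into the node's physical memory. */
    public static boolean fits(long physicalMb, long childHeapMb, int taskSlots, long daemonHeapMb) {
        return worstCaseMb(childHeapMb, taskSlots, daemonHeapMb) <= physicalMb;
    }

    public static void main(String[] args) {
        long physicalMb = 15L * 1024;   // assumption: m1.xlarge with 15 GiB RAM
        long childHeapMb = 8L * 1024;   // 8 GB per task JVM, as in the message
        long daemonHeapMb = 3L * 1024;  // 3g datanode heap, as in the message

        // The slot count is hypothetical; try the values configured on the cluster.
        for (int slots = 1; slots <= 4; slots++) {
            System.out.printf("slots=%d worstCase=%d MB fits=%b%n",
                    slots,
                    worstCaseMb(childHeapMb, slots, daemonHeapMb),
                    fits(physicalMb, childHeapMb, slots, daemonHeapMb));
        }

        // Max heap of the JVM running this code; logging the same value from
        // a map task shows whether the -Xmx setting was actually applied.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("This JVM max heap: " + maxHeapMb + " MB");
    }
}
```

Under these assumptions a single 8 GB task plus the datanode heap fits
on a node, while two concurrent 8 GB tasks do not, so the per-task
heap and the slot count have to be chosen together.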


2013-11-27 16:15:01,6423 ERROR Client fs/client/fileclient/cc/client.cc:3769 Thread: 140066469238528 rpc err Connection timed out(110) 28.9 to 10.60.9.161:5660, fid 2051.559.132980, upd 0
2013-11-27 16:19:35,9552 ERROR Client fs/client/fileclient/cc/client.cc:3769 Thread: 140066469238528 rpc err Connection reset by peer(104) 28.122 to 10.167.5.31:5660, fid 2168.36.131182, upd 0
Error: Java heap space
Error: GC overhead limit exceeded
Error: GC overhead limit exceeded
attempt_201311271453_0362_m_000000_0: Exception in thread "communication thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
attempt_201311271453_0362_m_000000_0:   at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:64)
attempt_201311271453_0362_m_000000_0:   at java.lang.StringBuilder.<init>(StringBuilder.java:85)
Error: Java heap space
Error: Java heap space
Error: Java heap space
java.lang.InterruptedException: Canopy Job failed processing maprfs:/user/hadoop/bottom/data/24
        at org.apache.mahout.clustering.canopy.CanopyDriver.buildClustersMR(CanopyDriver.java:361)
        at org.apache.mahout.clustering.canopy.CanopyDriver.buildClusters(CanopyDriver.java:246)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:154)
        at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:168)
        at clustering.TopDownClusteringDriver.TopDownClustering(TopDownClusteringDriver.java:78)
        at clustering.TopDownClusteringDriver.main(TopDownClusteringDriver.java:133)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
