Hi,

I have a custom PageRank computation that reads its input from HBase and writes its output back to it.

I submit the job on a real distributed Hadoop cluster that can allocate 320 map tasks, and I started the job with 100 workers. What I see is that only one of the workers actually reads the input, and it runs out of memory:

readVertexInputSplit: Loaded 2000000 vertices at 23706.551705289407 vertices/sec 10945068 edges at 129730.61549455763 edges/sec Memory (free/total/max) = 124.31M / 910.25M / 910.25M

Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
        at java.util.concurrent.FutureTask.get(FutureTask.java:119)
        at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
        at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
        ... 16 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:2694)
        at java.lang.String.<init>(String.java:203)
        at java.lang.String.substring(String.java:1913)
        at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:171)
        at java.net.URL.<init>(URL.java:614)
        at java.net.URL.<init>(URL.java:482)


and the rest of the workers say:

startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1

Master says:
MASTER_ONLY - 99 finished out of 100 on superstep -1

What configuration should I use to solve this problem?
I use:
        giraphConf.setWorkerConfiguration(1, 100, 85.0f);
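
For context, a minimal sketch of this kind of job setup (assuming the Giraph 1.1-style API; the computation and HBase vertex input/output format class names below are placeholders, not the actual classes):

        import org.apache.giraph.conf.GiraphConfiguration;
        import org.apache.giraph.job.GiraphJob;

        // Sketch only: MyPageRankComputation and the HBase format
        // classes stand in for the custom implementations.
        GiraphConfiguration giraphConf = new GiraphConfiguration();
        giraphConf.setComputationClass(MyPageRankComputation.class);
        giraphConf.setVertexInputFormatClass(MyHBaseVertexInputFormat.class);
        giraphConf.setVertexOutputFormatClass(MyHBaseVertexOutputFormat.class);
        // min workers = 1, max workers = 100, proceed once 85% of workers respond
        giraphConf.setWorkerConfiguration(1, 100, 85.0f);

        GiraphJob job = new GiraphJob(giraphConf, "custom-pagerank");
        job.run(true);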


Thanks,
