Hi,
I have a custom PageRank computation that reads its input from HBase and
writes its output back to HBase.
I submit my job to a real distributed Hadoop cluster that can allocate
320 map tasks, and I started the job with 100 workers. What I see is that
only one of the workers is actually reading the input, and it runs out of memory:
readVertexInputSplit: Loaded 2000000 vertices at 23706.551705289407 vertices/sec
10945068 edges at 129730.61549455763 edges/sec
Memory (free/total/max) = 124.31M / 910.25M / 910.25M
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError:
GC overhead limit exceeded
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
at java.util.concurrent.FutureTask.get(FutureTask.java:119)
at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
... 16 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:171)
at java.net.URL.<init>(URL.java:614)
at java.net.URL.<init>(URL.java:482)
The rest of the workers just report:
startSuperstep: WORKER_ONLY - Attempt=0, Superstep=-1
The master says:
MASTER_ONLY - 99 finished out of 100 on superstep -1
What configuration change should I make to solve this problem?
I use:
giraphConf.setWorkerConfiguration(1, 100, 85.0f);
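For context, here is roughly how I set up the job. This is a simplified sketch: `PageRankComputation`, `MyHBaseVertexInputFormat`, and `MyHBaseVertexOutputFormat` stand in for my actual custom classes.

```java
// Simplified sketch of my job setup (class names are placeholders
// for my custom PageRank computation and HBase input/output formats).
GiraphConfiguration giraphConf = new GiraphConfiguration();
giraphConf.setComputationClass(PageRankComputation.class);
giraphConf.setVertexInputFormatClass(MyHBaseVertexInputFormat.class);
giraphConf.setVertexOutputFormatClass(MyHBaseVertexOutputFormat.class);
// min workers = 1, max workers = 100, wait for 85% of workers to respond
giraphConf.setWorkerConfiguration(1, 100, 85.0f);

GiraphJob job = new GiraphJob(giraphConf, "pagerank-over-hbase");
job.run(true);
```

My guess is that my HBase input format is producing a single input split, so only one worker loads the whole graph, but I don't know which setting controls that.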
Thanks,