Hi,
Try making and analyzing a memory dump after the exception (JVM param -XX:+HeapDumpOnOutOfMemoryError).
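For example, assuming the flag is passed to the map task JVMs via mapred.child.java.opts in mapred-site.xml (the heap size and dump path below are only placeholders):

<property>
  <name>mapred.child.java.opts</name>
  <!-- keep your existing heap setting; /tmp/giraph-dumps is a placeholder path -->
  <value>-Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/giraph-dumps</value>
</property>

The resulting .hprof file can then be opened in a tool such as Eclipse MAT or jhat.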
What configuration (mainly the Partition class) do you use?
Lukas

On 7.4.2014 11:45, Vikesh Khanna wrote:
Hi,

Any ideas why Giraph waits indefinitely? I've been stuck on this for a long time now.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University


------------------------------------------------------------------------
*From: *"Vikesh Khanna" <vik...@stanford.edu>
*To: *user@giraph.apache.org
*Sent: *Friday, April 4, 2014 6:06:51 AM
*Subject: *Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi Avery,

I tried both options. It does appear to be a GC problem; the problem continues with the second option as well :(. I have attached the logs after enabling the first set of options and using 1 worker. It would be very helpful if you could take a look.

This machine has 1 TB of memory. We ran benchmarks of various other graph libraries on this machine and they worked fine (even with graphs 10x larger than the Giraph PageRank benchmark - 40 million nodes). I am sure Giraph would work fine as well - this should not be a resource constraint.

Thanks,
Vikesh Khanna,
Masters, Computer Science (Class of 2015)
Stanford University


------------------------------------------------------------------------
*From: *"Avery Ching" <ach...@apache.org>
*To: *user@giraph.apache.org
*Sent: *Thursday, April 3, 2014 7:26:56 PM
*Subject: *Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

This appears to be for a single worker. Most likely your worker went into GC and never returned. You can try running with GC logging turned on; try adding something like:

-XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc

You could also try the concurrent mark/sweep collector.

-XX:+UseConcMarkSweepGC
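In a MapReduce deployment these flags would typically be appended to the child JVM options, e.g. in mapred-site.xml (the heap size below is just a placeholder):

<property>
  <name>mapred.child.java.opts</name>
  <!-- placeholder heap size; GC logging plus the CMS collector -->
  <value>-Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC</value>
</property>

The GC output should then show up in the task's stdout log in the JobTracker web UI.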

Any chance you can use more workers and/or get more memory?

Avery

On 4/3/14, 5:46 PM, Vikesh Khanna wrote:

    @Avery,

    Thanks for the help. I checked out the task logs, and it turns out
    there was a "GC overhead limit exceeded" exception, because of which
    the benchmarks wouldn't even load the vertices. I got around it by
    increasing the heap size (mapred.child.java.opts) in
    mapred-site.xml. The benchmark is loading vertices now. However,
    the job still gets stuck indefinitely (and is eventually
    killed). I have attached the small log for the map task on 1
    worker. I would really appreciate it if you could help me understand the cause.
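
    For reference, the entry I changed looks roughly like the following (the
    heap value here is only a placeholder, not the exact number I used):

    <property>
      <name>mapred.child.java.opts</name>
      <!-- placeholder heap size only -->
      <value>-Xmx32g</value>
    </property>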

    Thanks,
    Vikesh Khanna,
    Masters, Computer Science (Class of 2015)
    Stanford University


    ------------------------------------------------------------------------
    *From: *"Praveen kumar s.k" <skpraveenkum...@gmail.com>
    *To: *user@giraph.apache.org
    *Sent: *Thursday, April 3, 2014 4:40:07 PM
    *Subject: *Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

    You have given -w 30; make sure that at least that many map task slots are
    configured in your cluster.
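
    In a pseudo-distributed setup this is controlled by the per-TaskTracker map
    slot count. A sketch of the mapred-site.xml entry (the value 31 is an
    assumption: 30 workers plus one extra map task for the Giraph master):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <!-- assumed: 30 workers + 1 master map task -->
      <value>31</value>
    </property>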

    On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <ach...@apache.org> wrote:
    > My guess is that you don't get your resources. It would be very helpful to
    > print the master log. You can find it when the job is running to look at
    > the Hadoop counters on the job UI page.
    >
    > Avery
    >
    >
    > On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
    >
    > Hi,
    >
    > I am running the PageRank benchmark under giraph-examples from the
    > giraph-1.0.0 release. I am using the following command to run the job (as
    > mentioned here):
    >
    > vikesh@madmax /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
    > $ $HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
    >     org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
    >
    >
    > However, the job gets stuck at map 9% and is eventually killed by the
    > JobTracker on reaching the mapred.task.timeout (default 10 minutes). I
    > tried increasing the timeout to a very large value, and the job went on
    > for over 8 hours without completion. I also tried the
    > ShortestPathsBenchmark, which also fails the same way.
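    >
    > For reference, the timeout I raised is mapred.task.timeout in
    > mapred-site.xml; the value below is only an illustration (in milliseconds):
    >
    > <property>
    >   <name>mapred.task.timeout</name>
    >   <!-- illustrative value: 2 hours; 0 disables the timeout entirely -->
    >   <value>7200000</value>
    > </property>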
    >
    >
    > Any help is appreciated.
    >
    >
    > ****** ---------------- ***********
    >
    >
    > Machine details:
    >
    > Linux version 2.6.32-279.14.1.el6.x86_64
    > (mockbu...@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305
    > (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
    >
    > Architecture: x86_64
    > CPU op-mode(s): 32-bit, 64-bit
    > Byte Order: Little Endian
    > CPU(s): 64
    > On-line CPU(s) list: 0-63
    > Thread(s) per core: 1
    > Core(s) per socket: 8
    > CPU socket(s): 8
    > NUMA node(s): 8
    > Vendor ID: GenuineIntel
    > CPU family: 6
    > Model: 47
    > Stepping: 2
    > CPU MHz: 1064.000
    > BogoMIPS: 5333.20
    > Virtualization: VT-x
    > L1d cache: 32K
    > L1i cache: 32K
    > L2 cache: 256K
    > L3 cache: 24576K
    > NUMA node0 CPU(s): 1-8
    > NUMA node1 CPU(s): 9-16
    > NUMA node2 CPU(s): 17-24
    > NUMA node3 CPU(s): 25-32
    > NUMA node4 CPU(s): 0,33-39
    > NUMA node5 CPU(s): 40-47
    > NUMA node6 CPU(s): 48-55
    > NUMA node7 CPU(s): 56-63
    >
    >
    > I am using a pseudo-distributed Hadoop cluster on a single machine with
    > 64 cores.
    >
    >
    > *****-------------*******
    >
    >
    > Thanks,
    > Vikesh Khanna,
    > Masters, Computer Science (Class of 2015)
    > Stanford University
    >
    >
    >




