In task1.txt:
2014-04-04 00:48:43,575 INFO
org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit:
Loaded 48000000 vertices at 150388.2505243918 vertices/sec 48000000 edges
at 150389.47626503918 edges/sec Memory (free/total/max) = 1143.96M /
9102.25M / 9102.25M
IMHO the worker has used almost all of the memory available: only about
1.1 GB of the ~9.1 GB maximum heap is free, so not much is left for the
process to continue.
You may also try the "top" or "free" commands to check the memory usage
on the machine.

Regards,
Pankaj


On 7 April 2014 15:15, Vikesh Khanna <vik...@stanford.edu> wrote:

> Hi,
>
> Any ideas why Giraph waits indefinitely? I've been stuck on this for a
> long time now.
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University
>
>
> ------------------------------
> *From: *"Vikesh Khanna" <vik...@stanford.edu>
> *To: *user@giraph.apache.org
> *Sent: *Friday, April 4, 2014 6:06:51 AM
>
> *Subject: *Re: Giraph job hangs indefinitely and is eventually killed by
> JobTracker
>
> Hi Avery,
>
> I tried both options. It does appear to be a GC problem, and it continues
> with the second option as well :(. I have attached the logs after enabling
> the first set of options and using 1 worker. It would be very helpful if
> you could take a look.
>
> This machine has 1 TB of memory. We ran benchmarks of various other graph
> libraries on this machine and they worked fine (even with graphs 10x larger
> than the Giraph PageRank benchmark's 40 million nodes). I am sure Giraph
> would work fine as well; this should not be a resource constraint.
>
> Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University
>
>
> ------------------------------
> *From: *"Avery Ching" <ach...@apache.org>
> *To: *user@giraph.apache.org
> *Sent: *Thursday, April 3, 2014 7:26:56 PM
> *Subject: *Re: Giraph job hangs indefinitely and is eventually killed by
> JobTracker
>
> This is for a single worker, it appears.  Most likely your worker went into
> GC and never returned.  You can turn GC logging on by adding something
> like:
>
> -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -verbose:gc
>
> You could also try the concurrent mark/sweep collector.
>
> -XX:+UseConcMarkSweepGC
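>
> For example, both sets of flags can go into mapred.child.java.opts in
> mapred-site.xml; this is only a sketch, and the heap size shown is
> illustrative:
>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <!-- -Xmx value is illustrative; keep your own heap setting -->
>     <value>-Xmx4g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC</value>
>   </property>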
>
> Any chance you can use more workers and/or get more memory?
>
> Avery
>
> On 4/3/14, 5:46 PM, Vikesh Khanna wrote:
>
>  @Avery,
>
> Thanks for the help. I checked the task logs, and it turns out there
> was a "GC overhead limit exceeded" exception, due to which the benchmarks
> wouldn't even load the vertices. I got around it by increasing the heap
> size (mapred.child.java.opts) in mapred-site.xml. The benchmark is loading
> vertices now. However, the job is still getting stuck indefinitely (and
> eventually killed). I have attached the small log for the map task on 1
> worker. I would really appreciate your help in understanding the cause.
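>
> For reference, the heap increase was along these lines (a sketch; the
> exact -Xmx value here is illustrative):
>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <!-- illustrative heap size -->
>     <value>-Xmx8g</value>
>   </property>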
>
>  Thanks,
> Vikesh Khanna,
> Masters, Computer Science (Class of 2015)
> Stanford University
>
>
>  ------------------------------
> *From: *"Praveen kumar s.k" 
> <skpraveenkum...@gmail.com><skpraveenkum...@gmail.com>
> *To: *user@giraph.apache.org
> *Sent: *Thursday, April 3, 2014 4:40:07 PM
> *Subject: *Re: Giraph job hangs indefinitely and is eventually killed by
> JobTracker
>
>  You have given -w 30; make sure that at least that many map task slots
> are configured in your cluster.
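>
> For example, in mapred-site.xml on the tasktracker (the value is
> illustrative; Giraph typically needs one map slot per worker plus one
> for the master, i.e. at least 31 for -w 30):
>
>   <property>
>     <name>mapred.tasktracker.map.tasks.maximum</name>
>     <!-- at least workers + 1 (master); value illustrative -->
>     <value>31</value>
>   </property>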
>
>  On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <ach...@apache.org> wrote:
> > My guess is that you aren't getting your resources.  It would be very
> > helpful to print the master log.  You can find it while the job is
> > running by looking at the Hadoop counters on the job UI page.
> >
> > Avery
> >
> >
> > On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
> >
> > Hi,
> >
> > I am running the PageRank benchmark under giraph-examples from the
> > giraph-1.0.0 release. I am using the following command to run the job
> > (as mentioned here):
> >
> > vikesh@madmax
> > /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
> > $ $HADOOP_HOME/bin/hadoop jar
> > $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
> > org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
> >
> > (Here -V is the total number of vertices, -e the number of edges per
> > vertex, -s the number of supersteps, and -w the number of workers.)
> >
> >
> > However, the job gets stuck at map 9% and is eventually killed by the
> > JobTracker on reaching mapred.task.timeout (default 10 minutes). I tried
> > increasing the timeout to a very large value, and the job ran for over 8
> > hours without completing. I also tried the ShortestPathsBenchmark, which
> > fails in the same way.
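> >
> > For reference, the timeout property is mapred.task.timeout, in
> > milliseconds; the value below is illustrative:
> >
> >   <property>
> >     <name>mapred.task.timeout</name>
> >     <!-- illustrative: 2 hours, in milliseconds -->
> >     <value>7200000</value>
> >   </property>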
> >
> >
> > Any help is appreciated.
> >
> >
> > ****** ---------------- ***********
> >
> >
> > Machine details:
> >
> > Linux version 2.6.32-279.14.1.el6.x86_64
> > (mockbu...@c6b8.bsys.dev.centos.org) (gcc version 4.4.6 20120305
> > (Red Hat 4.4.6-4) (GCC) ) #1 SMP Tue Nov 6 23:43:09 UTC 2012
> >
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Byte Order: Little Endian
> > CPU(s): 64
> > On-line CPU(s) list: 0-63
> > Thread(s) per core: 1
> > Core(s) per socket: 8
> > CPU socket(s): 8
> > NUMA node(s): 8
> > Vendor ID: GenuineIntel
> > CPU family: 6
> > Model: 47
> > Stepping: 2
> > CPU MHz: 1064.000
> > BogoMIPS: 5333.20
> > Virtualization: VT-x
> > L1d cache: 32K
> > L1i cache: 32K
> > L2 cache: 256K
> > L3 cache: 24576K
> > NUMA node0 CPU(s): 1-8
> > NUMA node1 CPU(s): 9-16
> > NUMA node2 CPU(s): 17-24
> > NUMA node3 CPU(s): 25-32
> > NUMA node4 CPU(s): 0,33-39
> > NUMA node5 CPU(s): 40-47
> > NUMA node6 CPU(s): 48-55
> > NUMA node7 CPU(s): 56-63
> >
> >
> > I am using a pseudo-distributed Hadoop cluster on a single machine with
> > 64 cores.
> >
> >
> > *****-------------*******
> >
> >
> > Thanks,
> > Vikesh Khanna,
> > Masters, Computer Science (Class of 2015)
> > Stanford University
> >
