Hi, any ideas why Giraph waits indefinitely? I've been stuck on this for a long time now.
Thanks,
Vikesh Khanna
Masters, Computer Science (Class of 2015)
Stanford University

----- Original Message -----
From: "Vikesh Khanna" <vik...@stanford.edu>
To: user@giraph.apache.org
Sent: Friday, April 4, 2014 6:06:51 AM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

Hi Avery,

I tried both options. It does appear to be a GC problem, and the problem continues with the second option as well :(. I have attached the logs after enabling the first set of options and using 1 worker. It would be very helpful if you could take a look.

This machine has 1 TB of memory. We ran benchmarks of various other graph libraries on it and they worked fine, even with graphs 10x larger than the Giraph PageRank benchmark (40 million nodes). I am sure Giraph would work fine as well; this should not be a resource constraint.

Thanks,
Vikesh Khanna
Masters, Computer Science (Class of 2015)
Stanford University

----- Original Message -----
From: "Avery Ching" <ach...@apache.org>
To: user@giraph.apache.org
Sent: Thursday, April 3, 2014 7:26:56 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

This appears to be a single worker. Most likely your worker went into GC and never returned. You can try with GC logging turned on; try adding something like:

  -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc

You could also try the concurrent mark/sweep collector:

  -XX:+UseConcMarkSweepGC

Any chance you can use more workers and/or get more memory?

Avery

On 4/3/14, 5:46 PM, Vikesh Khanna wrote:

@Avery, thanks for the help. I checked the task logs, and it turns out there was a "GC overhead limit exceeded" exception, due to which the benchmarks wouldn't even load the vertices. I got around it by increasing the heap size (mapred.child.java.opts) in mapred-site.xml. The benchmark is loading vertices now. However, the job is still getting stuck indefinitely (and eventually killed).
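[For reference, the pieces discussed above (a bigger heap via mapred.child.java.opts, plus the GC-logging and CMS flags Avery suggested) can be combined in a single property in mapred-site.xml on Hadoop 0.20.x. A minimal sketch; the 4g heap below is an illustrative assumption, not a recommendation for this machine:]

```xml
<!-- mapred-site.xml: JVM options passed to each map/reduce child task.
     -Xmx4g is an example value only; size it for your own workload. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4g -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps</value>
</property>
```

[With this in place, the GC output shows up in each task's stdout log, reachable from the JobTracker web UI.]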
I have attached the small log for the map task on 1 worker. I would really appreciate your help in understanding the cause.

Thanks,
Vikesh Khanna
Masters, Computer Science (Class of 2015)
Stanford University

----- Original Message -----
From: "Praveen kumar s.k" <skpraveenkum...@gmail.com>
To: user@giraph.apache.org
Sent: Thursday, April 3, 2014 4:40:07 PM
Subject: Re: Giraph job hangs indefinitely and is eventually killed by JobTracker

You have given -w 30; make sure that many map tasks are configured in your cluster.

On Thu, Apr 3, 2014 at 6:24 PM, Avery Ching <ach...@apache.org> wrote:
> My guess is that you don't get your resources. It would be very helpful to
> print the master log. You can find it while the job is running by looking at
> the Hadoop counters on the job UI page.
>
> Avery
>
> On 4/3/14, 12:49 PM, Vikesh Khanna wrote:
>
> Hi,
>
> I am running the PageRank benchmark under giraph-examples from the
> giraph-1.0.0 release. I am using the following command to run the job
> (as mentioned here):
>
> vikesh@madmax /lfs/madmax/0/vikesh/usr/local/giraph/giraph-examples/src/main/java/org/apache/giraph/examples
> $ $HADOOP_HOME/bin/hadoop jar \
>     $GIRAPH_HOME/giraph-core/target/giraph-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar \
>     org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50000000 -w 30
>
> However, the job gets stuck at map 9% and is eventually killed by the
> JobTracker on reaching mapred.task.timeout (default 10 minutes). I tried
> increasing the timeout to a very large value, and the job ran for over 8
> hours without completing. I also tried the ShortestPathsBenchmark, which
> fails the same way.
>
> Any help is appreciated.
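[Praveen's point can be checked directly: with Giraph's default split master/worker setting, -w 30 needs 31 simultaneous map tasks (30 workers plus 1 master), so the per-TaskTracker slot limit matters. A rough sketch that reads mapred.tasktracker.map.tasks.maximum out of mapred-site.xml; the MAPRED_SITE path and the grep/sed extraction are assumptions about a typical single-line <value> layout:]

```shell
#!/bin/sh
# Read mapred.tasktracker.map.tasks.maximum from mapred-site.xml.
# Hadoop's built-in default is 2 if the property is absent.
MAPRED_SITE="${MAPRED_SITE:-$HADOOP_HOME/conf/mapred-site.xml}"
slots=$(grep -A1 'mapred.tasktracker.map.tasks.maximum' "$MAPRED_SITE" 2>/dev/null \
        | sed -n 's:.*<value>\([0-9][0-9]*\)</value>.*:\1:p')
echo "map slots per TaskTracker: ${slots:-2 (default)}"
```

[On a pseudo-distributed cluster there is only one TaskTracker, so this one number must be at least 31 for -w 30 to get its resources.]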
>
> ****** ---------------- ***********
>
> Machine details:
>
> Linux version 2.6.32-279.14.1.el6.x86_64 (mockbu...@c6b8.bsys.dev.centos.org)
> (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)) #1 SMP Tue Nov 6 23:43:09 UTC 2012
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                64
> On-line CPU(s) list:   0-63
> Thread(s) per core:    1
> Core(s) per socket:    8
> CPU socket(s):         8
> NUMA node(s):          8
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 47
> Stepping:              2
> CPU MHz:               1064.000
> BogoMIPS:              5333.20
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              24576K
> NUMA node0 CPU(s):     1-8
> NUMA node1 CPU(s):     9-16
> NUMA node2 CPU(s):     17-24
> NUMA node3 CPU(s):     25-32
> NUMA node4 CPU(s):     0,33-39
> NUMA node5 CPU(s):     40-47
> NUMA node6 CPU(s):     48-55
> NUMA node7 CPU(s):     56-63
>
> I am using a pseudo-distributed Hadoop cluster on a single machine with
> 64 cores.
>
> *****-------------*******
>
> Thanks,
> Vikesh Khanna
> Masters, Computer Science (Class of 2015)
> Stanford University