Re: Hadoop JobTracker Hanging

2010-06-23 Thread Bobby Dennett
Thanks for the latest round of suggestions. We will definitely check out compressed object pointers and are looking into what we can do regarding the JT history. As I mentioned previously, we are working on getting stronger servers for the NN/JT node and the secondary NN node (similar to worka

Re: Hadoop JobTracker Hanging

2010-06-22 Thread Rahul Jain
There are two issues which were fixed in 0.21.0 and can cause the job tracker to run out of memory: https://issues.apache.org/jira/browse/MAPREDUCE-1316 and https://issues.apache.org/jira/browse/MAPREDUCE-841 . We've been hit by MAPREDUCE-841 (large jobConf objects with large number of tasks, espec

Re: Hadoop JobTracker Hanging

2010-06-22 Thread Hemanth Yamijala
There was also https://issues.apache.org/jira/browse/MAPREDUCE-1316 whose cause hit clusters at Yahoo! very badly last year. The situation was particularly noticeable in the face of lots of jobs with failed tasks and a specific fix that enabled OutOfBand heartbeats. The latter (i.e. the OOB heartbe

Re: Hadoop JobTracker Hanging

2010-06-22 Thread Allen Wittenauer
On Jun 22, 2010, at 3:17 AM, Steve Loughran wrote:
> I'm surprised it's the JT that is OOM-ing; anecdotally it's the NN and 2ary NN that use more, especially if the files are many and the blocksize small. The JT should not be tracking that much data over time.
Pre-0.20.2, there are definite

Re: Hadoop JobTracker Hanging

2010-06-22 Thread James Seigel
+1 for compressed pointers. Sent from my mobile. Please excuse the typos. On 2010-06-22, at 4:18 AM, Steve Loughran wrote: > Bobby Dennett wrote: >> Thanks all for your suggestions (please note that Tan is my co-worker; >> we are both working to try and resolve this issue)... we experienced >
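For readers following the compressed-pointers suggestion, here is a minimal sketch of what the setting could look like. It assumes a 64-bit HotSpot JVM recent enough to support `-XX:+UseCompressedOops` and that JobTracker JVM flags are passed through `HADOOP_JOBTRACKER_OPTS` in `hadoop-env.sh`. The helper name `jobtracker_opts_line` and the 32 GB cutoff constant are illustrative, not taken from the thread; the 6000 MB value is the heap size mentioned elsewhere in the thread.

```python
# Compressed object pointers only apply to heaps below roughly 32 GB.
COMPRESSED_OOPS_LIMIT_MB = 32 * 1024


def jobtracker_opts_line(heap_mb):
    """Build a hadoop-env.sh line that enables compressed oops for the JobTracker.

    heap_mb is the planned JobTracker heap in MB; it is only used to check
    that compressed pointers can apply at all.
    """
    if heap_mb >= COMPRESSED_OOPS_LIMIT_MB:
        raise ValueError("heap too large for compressed object pointers")
    # Prepend the flag and keep whatever options were already set.
    return ('export HADOOP_JOBTRACKER_OPTS='
            '"-XX:+UseCompressedOops $HADOOP_JOBTRACKER_OPTS"')


print(jobtracker_opts_line(6000))
```

The printed line would go into `conf/hadoop-env.sh` on the JobTracker node, followed by a JobTracker restart. Whether it helps depends on how much of the heap is object references, so treat it as something to measure rather than a guaranteed fix.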

Re: Hadoop JobTracker Hanging

2010-06-22 Thread Steve Loughran
Bobby Dennett wrote: Thanks all for your suggestions (please note that Tan is my co-worker; we are both working to try and resolve this issue)... we experienced another hang this weekend and increased the HADOOP_HEAPSIZE setting to 6000 (MB) as we do periodically see "java.lang.OutOfMemoryError:

Re: Hadoop JobTracker Hanging

2010-06-21 Thread Ted Yu
> I will try to increase the HADOOP_HEAPSIZE to see if that helps. > > Tan > > > > -Original Message- > > From: Todd Lipcon [mailto:t...@cloudera.com] > > Sent: Thursday, June 17, 2010 5:07 PM > > To: common-user@hadoop.apache.org > >

Re: Hadoop JobTracker Hanging

2010-06-21 Thread James Seigel
>> Todd, >> I will try to increase the HADOOP_HEAPSIZE to see if that helps. >> Tan >> >> -Original Message- >> From: Todd Lipcon [mailto:t...@cloudera.com] >> Sent: Thursday, June 17, 2010 5:07 PM >> To: common-user@hadoop.apache.org >>

RE: Hadoop JobTracker Hanging

2010-06-21 Thread Bobby Dennett
ll try to increase the HADOOP_HEAPSIZE to see if that helps. > Tan > > -Original Message- > From: Todd Lipcon [mailto:t...@cloudera.com] > Sent: Thursday, June 17, 2010 5:07 PM > To: common-user@hadoop.apache.org > Subject: Re: Hadoop JobTracker Hanging > > Li,

RE: Hadoop JobTracker Hanging

2010-06-18 Thread Li, Tan
Todd, I will try to increase the HADOOP_HEAPSIZE to see if that helps. Tan -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, June 17, 2010 5:07 PM To: common-user@hadoop.apache.org Subject: Re: Hadoop JobTracker Hanging Li, just to narrow your search, in my

RE: Hadoop JobTracker Hanging

2010-06-18 Thread Li, Tan
Thanks for your suggestions, James. I will try that. Tan -Original Message- From: James Seigel [mailto:ja...@tynt.com] Sent: Thursday, June 17, 2010 6:21 PM To: common-user@hadoop.apache.org Subject: Re: Hadoop JobTracker Hanging Up the memory from the default to about 4x the default

Re: Hadoop JobTracker Hanging

2010-06-17 Thread James Seigel
Up the memory from the default to about 4x the default (heap setting). This should make it better I’d think! We’d been having the same issue...I believe this fixed it. James On 2010-06-17, at 3:00 PM, Li, Tan wrote: > Folks, > > I need some help on job tracker. > I am running a two hadoop cl
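As a worked example of the "about 4x the default" advice: Hadoop's daemon heap defaults to 1000 MB when `HADOOP_HEAPSIZE` is unset, so four times that is 4000 MB. The sketch below only does that arithmetic and prints the line one might put in `hadoop-env.sh`; the function name `heapsize_line` is made up for illustration.

```python
DEFAULT_HEAP_MB = 1000  # Hadoop's default when HADOOP_HEAPSIZE is not set


def heapsize_line(multiplier=4, default_mb=DEFAULT_HEAP_MB):
    """Return a hadoop-env.sh line with the heap scaled from the default."""
    return f"export HADOOP_HEAPSIZE={multiplier * default_mb}"


print(heapsize_line())  # prints: export HADOOP_HEAPSIZE=4000
```

Note that `HADOOP_HEAPSIZE` applies to every daemon started with that `hadoop-env.sh`, not only the JobTracker, so the node needs enough physical memory for all of them.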

Re: Hadoop JobTracker Hanging

2010-06-17 Thread Todd Lipcon
Yu [mailto:yuzhih...@gmail.com] > Sent: Thursday, June 17, 2010 2:39 PM > To: common-user@hadoop.apache.org > Subject: Re: Hadoop JobTracker Hanging > > Is upgrading to hadoop-0.20.2+228 possible ? > > Use jstack to get stack trace of job tracker process when this happens > agai

RE: Hadoop JobTracker Hanging

2010-06-17 Thread Li, Tan
: Thursday, June 17, 2010 2:39 PM To: common-user@hadoop.apache.org Subject: Re: Hadoop JobTracker Hanging Is upgrading to hadoop-0.20.2+228 possible ? Use jstack to get stack trace of job tracker process when this happens again. Use jmap to get shared object memory maps or heap memory details. On

RE: Hadoop JobTracker Hanging

2010-06-17 Thread Li, Tan
Thanks, Todd. I will try that and let you know the result. Tan -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, June 17, 2010 2:41 PM To: common-user@hadoop.apache.org Subject: Re: Hadoop JobTracker Hanging +1, jstack is crucial to solve these kinds of

Re: Hadoop JobTracker Hanging

2010-06-17 Thread Todd Lipcon
+1, jstack is crucial to solve these kinds of issues. Also, which scheduler are you using? Thanks -Todd On Thu, Jun 17, 2010 at 2:38 PM, Ted Yu wrote: > Is upgrading to hadoop-0.20.2+228 possible ? > > Use jstack to get stack trace of job tracker process when this happens > again. > Use jmap to

Re: Hadoop JobTracker Hanging

2010-06-17 Thread Ted Yu
Is upgrading to hadoop-0.20.2+228 possible? Use jstack to get a stack trace of the job tracker process when this happens again. Use jmap to get shared object memory maps or heap memory details. On Thu, Jun 17, 2010 at 2:00 PM, Li, Tan wrote: > Folks, > > I need some help on job tracker. > I am runni
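A rough sketch of how the jstack and jmap suggestions could be scripted so the output is on disk the next time the JobTracker hangs. It assumes the JDK tools are on the PATH of the user running the JobTracker and that the JDK is one where `jmap -heap` is still available (newer JDKs dropped that option). The function names and the output file prefix are hypothetical.

```python
import shutil
import subprocess


def diagnostic_commands(pid):
    """Return the JDK commands suggested in the thread for one JVM process id."""
    pid = str(int(pid))  # reject anything that is not an integer pid
    return {
        "thread_dump": ["jstack", pid],          # stack traces of all threads
        "shared_objects": ["jmap", pid],         # shared object mappings (no option)
        "heap_summary": ["jmap", "-heap", pid],  # heap configuration and usage
    }


def capture(pid, out_prefix="jt-diagnostics"):
    """Run each command and save its output to <out_prefix>-<name>.txt."""
    written = []
    for name, cmd in diagnostic_commands(pid).items():
        if shutil.which(cmd[0]) is None:
            continue  # tool not installed; skip rather than fail
        result = subprocess.run(cmd, capture_output=True, text=True)
        path = f"{out_prefix}-{name}.txt"
        with open(path, "w") as fh:
            fh.write(result.stdout + result.stderr)
        written.append(path)
    return written
```

Calling `capture(<JobTracker pid>)` a few times, some seconds apart, while the process is hung gives several thread dumps to compare, which makes it easier to tell a stuck thread from a slow one.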

Hadoop JobTracker Hanging

2010-06-17 Thread Li, Tan
Folks, I need some help on job tracker. I am running two Hadoop clusters (with 30+ nodes) on Ubuntu. One is with version 0.19.1 (Apache) and the other one is with version 0.20.1+169.68 (Cloudera). I have the same problem with both the clusters: the job tracker hangs almost once a day. Sympt