Long GC pause question

2010-12-28 Thread ChingShen

Hi all, I have run into a problem where a long GC pause keeps the region server's local ZooKeeper client from sending heartbeats, so the session times out. But I want to know why the HBase master sends a MSG_REGIONSERVER_STOP op to the region server to stop its services rather than reinitialize a new zookeepe
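The failure mode described in this message can be modeled roughly as a gap check on heartbeat times. This is a minimal sketch, not ZooKeeper's actual expiry logic: the function name `session_expires`, the 20-second heartbeat interval, the 60-second timeout, and the 90-second pause are all hypothetical values chosen for illustration.

```python
def session_expires(heartbeat_times_ms, session_timeout_ms):
    """Return True if any gap between consecutive heartbeats exceeds the timeout.

    Simplified model (an assumption): the server expires a session as soon as
    it has not heard from the client for longer than the session timeout.
    """
    gaps = [later - earlier
            for earlier, later in zip(heartbeat_times_ms, heartbeat_times_ms[1:])]
    return any(gap > session_timeout_ms for gap in gaps)


# Hypothetical timeline: heartbeats every 20 s, then a 90 s stop-the-world pause.
healthy = [0, 20_000, 40_000, 60_000]
paused = [0, 20_000, 110_000, 130_000]

print(session_expires(healthy, session_timeout_ms=60_000))
print(session_expires(paused, session_timeout_ms=60_000))
```

In this model a pause only matters when it is longer than the session timeout, which is why the two usual levers are shortening the pauses or lengthening the timeout.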

Re: Long GC pause question

2010-12-28 Thread Stack

On Tue, Dec 28, 2010 at 6:59 PM, ChingShen wrote:
> But I want to know why the HBase master sends a MSG_REGIONSERVER_STOP op to
> region server to stop its services rather than reinitialize a new zookeeper
> client or restart region server?
>

Can I see more of the regionserver log? If session expired,

Re: Long GC pause question

2010-12-28 Thread ChingShen

Hi St.Ack,

Please see the attached file. There are 3 RS/DN/TT nodes + 1 MS/NN/JT node in my cluster (Hadoop 0.20.2, HBase 0.20.6).

Thanks.
Shen

On Wed, Dec 29, 2010 at 1:34 PM, Stack wrote:
> On Tue, Dec 28, 2010 at 6:59 PM, ChingShen wrote:
> > But I want to know why the HBase master sends a MS

Re: Long GC pause question

2010-12-29 Thread Stack

OK. There is nothing enlightening there. There didn't seem to be a master log in the attachment? I should have asked you to include that. I see that one server thought the filesystem had gone away. Did you pull HDFS out from under it at around this time, perchance?

St.Ack

On Tue, Dec 28, 2010 at 1

Re: Long GC pause question

2011-01-06 Thread Jean-Daniel Cryans

Shen,

It's a design decision. Historically we have preferred to let cluster managers decide whether to restart processes that died, or to investigate why they died and then decide what to do. You can easily write tools that will restart the region servers if they die, but the f
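A restart tool of the kind mentioned in this message could look roughly like the sketch below. It is a generic process supervisor, not anything shipped with HBase: the name `supervise`, its parameters, and the restart limit are assumptions made for illustration, and the command passed to it is a placeholder for however the region server is launched on a given cluster.

```python
import subprocess
import time


def supervise(command, max_restarts=3, backoff_seconds=5.0,
              run=subprocess.call, sleep=time.sleep):
    """Run `command` and restart it each time it exits with a non-zero status.

    Returns the list of exit codes observed. Gives up after `max_restarts`
    restarts so that a crash loop still ends up in front of an operator.
    `run` and `sleep` are injectable so the loop can be exercised without
    starting real processes.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = run(command)  # blocks until the process exits
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        if attempt < max_restarts:
            sleep(backoff_seconds)  # pause before the next restart
    return exit_codes


# Dry run with a fake runner: two crashes, then a clean exit.
fake_exits = iter([1, 1, 0])
print(supervise(["placeholder-command"], backoff_seconds=0.0,
                run=lambda cmd: next(fake_exits)))
```

Capping the number of restarts keeps the design decision described above intact: a process that keeps dying still needs someone to look at why.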

Re: Long GC pause question

2011-01-07 Thread ChingShen

Hi J-D,

Yes, I run an MR job on my cluster, and the long GC pause occurs when I set the MR configs as below.

MR config (4-core CPU per RS/DN/TT node):
mapred.tasktracker.reduce.tasks.maximum = 3
mapred.tasktracker.map.tasks.maximum = 4
mapred.reduce.slowstart.completed.maps = 0.05
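The slot settings above can be turned into a rough load estimate per node. This is a back-of-the-envelope sketch under stated assumptions: every task slot and every daemon (region server, DataNode, TaskTracker) is counted as one busy JVM, which ignores IO wait and multi-threading, and the name `oversubscription` is hypothetical. Because reducers may start once 5% of the maps have completed, map and reduce slots can be occupied at the same time.

```python
def oversubscription(cores, map_slots, reduce_slots, daemons=3):
    """Ratio of potentially busy JVMs to CPU cores on one node.

    `daemons` defaults to 3 for the RS, DN, and TT processes on each node.
    """
    return (map_slots + reduce_slots + daemons) / cores


# Settings from the message: 4 map slots, 3 reduce slots, 4 cores.
print(oversubscription(cores=4, map_slots=4, reduce_slots=3))  # 2.5

# A hypothetical lighter setting for comparison.
print(oversubscription(cores=4, map_slots=2, reduce_slots=1))  # 1.5
```

A ratio well above 1 means the region server competes with task JVMs for CPU whenever all slots are in use.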

Re: Long GC pause question

2011-01-10 Thread Jean-Daniel Cryans

Your MR job is likely generating a lot of IO and possibly starving HBase while it's running (it would require some monitoring on your end to figure that out). Fewer tasks per machine will leave more breathing room; there aren't that many ways to unload overloaded machines.

J-D

On Fri, Jan 7, 2011 a
