Re: debugging a possible long garbage collection

2011-04-29 Thread Garrett Wu
Thanks for the info, guys; it was all very helpful. I do have the mslab option enabled, but it looks like this was just a GC triggered by the load of log splitting, as Stack points out. Glad to see distributed log splitting in 0.92.

Garrett

Re: debugging a possible long garbage collection

2011-04-29 Thread Stack
On Thu, Apr 28, 2011 at 9:48 PM, Garrett Wu wrote:
>   2. The master started splitting logs for the hadoop11 region.

Splitting logs is a pretty intense operation. It could have run up your master heap and CPU such that it brought on a long GC pause. Look for a big gap in your logging just before
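Stack's suggestion to look for a big gap in the logging can be automated. Below is a minimal Python sketch, assuming each log line begins with a log4j-style timestamp such as `2011-04-28 21:48:12,345`; the function name `find_log_gaps`, the timestamp layout, and the 30-second threshold are illustrative assumptions, not part of HBase. Lines without a leading timestamp (stack-trace continuations, for instance) are skipped.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Assumed timestamp layout at the start of each line (log4j-style);
# adjust TS_FORMAT / TS_LEN if your logs use a different pattern.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LEN = len("2011-04-28 21:48:12,345")


def find_log_gaps(
    lines: Iterable[str], threshold: timedelta
) -> List[Tuple[datetime, datetime, timedelta]]:
    """Return (previous_ts, next_ts, gap) for consecutive timestamped
    lines that are further apart than `threshold`."""
    gaps = []
    prev = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp, e.g. a stack-trace line
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "2011-04-28 21:40:00,100 INFO master: ...",
        "2011-04-28 21:40:01,200 INFO master: ...",
        "\tat org.example.Foo(Foo.java:1)",
        "2011-04-28 21:41:05,200 WARN master: ...",
    ]
    for start, end, gap in find_log_gaps(sample, timedelta(seconds=30)):
        print(f"{start} -> {end}: {gap.total_seconds():.1f}s of silence")
```

Run against the master log, a gap of tens of seconds immediately before the region servers start complaining would be consistent with the long-pause theory.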

Re: debugging a possible long garbage collection

2011-04-29 Thread Michel Segel
+1 on Todd's GC stuff. I just implemented it and it looks like there's a bit of improvement. I'm still early on in my testing...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 29, 2011, at 1:35 AM, Ted Dunning wrote:
> Swap and gc are the usual culprits for this.
>

Re: debugging a possible long garbage collection

2011-04-28 Thread Ted Dunning
Swap and GC are the usual culprits for this. Are you running a recent enough version to have Todd's wondrous mslab option?

On Thu, Apr 28, 2011 at 9:48 PM, Garrett Wu wrote:
> Some snippets from the logs are pasted below. Does anyone know what may
> have caused this? Was the hang really a gar
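To tell Ted's two suspects apart, the JVM's own GC log is more telling than the HBase log. Here is a minimal Python sketch, assuming the JVM was started with `-verbose:gc -XX:+PrintGCDetails` so that each collection ends with a HotSpot entry of the form `[Times: user=1.20 sys=0.05, real=14.31 secs]`. The helper `long_gc_pauses`, the `Pause` record, and the "CPU time under half of wall-clock time" swap heuristic are assumptions for illustration, not an established rule.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed HotSpot -XX:+PrintGCDetails suffix, e.g.
# "[Times: user=1.20 sys=0.05, real=14.31 secs]"
TIMES_RE = re.compile(r"user=(\d+\.\d+) sys=(\d+\.\d+), real=(\d+\.\d+) secs")


class Pause(NamedTuple):
    real: float        # wall-clock seconds for the GC event
    cpu: float         # user + sys CPU seconds spent by the collector
    maybe_swap: bool   # heuristic flag: much more wall time than CPU time
    line: str


def long_gc_pauses(lines: Iterable[str], threshold_secs: float) -> List[Pause]:
    """Return GC events whose wall-clock time exceeds threshold_secs."""
    pauses = []
    for line in lines:
        if "CMS-concurrent" in line:
            continue  # concurrent CMS phases run alongside application threads
        m = TIMES_RE.search(line)
        if not m:
            continue
        user, sys_, real = (float(g) for g in m.groups())
        if real > threshold_secs:
            cpu = user + sys_
            # Hypothetical cutoff: if the collector was on CPU for less than
            # half the wall-clock time, the JVM may have been stalled
            # (for example by swapping) rather than busy collecting.
            pauses.append(Pause(real, cpu, cpu < real / 2, line.rstrip()))
    return pauses
```

For example, `long_gc_pauses(open("gc.log"), 10.0)` (hypothetical file name) lists every non-concurrent collection that took more than ten seconds of wall-clock time. A long pause with very little CPU time behind it points more toward swap than toward the collector doing heavy work.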

debugging a possible long garbage collection

2011-04-28 Thread Garrett Wu
Our HBase master and region servers all came down during a quiet period today (no read or write traffic). After inspecting the logs, I think I've pieced together what might have happened:

1. One of the region servers (hadoop11) timed out and started closing its regions and aborting.
2.