Re: region server dead and datanode block movement error

2014-02-27 Thread Jean-Marc Spaggiari
Hi Rohit, Usually a YouAreDeadException means your RegionServer is too slow. It gets kicked out by the Master and ZK, then tries to join back and is informed that it has been kicked out. Reasons: - Long garbage collection; - Swapping; - Network issues (gets disconnected, then reconnected); - etc. what do
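A minimal sketch of the timing rule behind this, assuming the RegionServer's liveness is tracked by a ZooKeeper session with a fixed timeout (in HBase this is set by `zookeeper.session.timeout`, bounded by the ZooKeeper server's own limits). If a stall from GC, swapping or a network outage outlasts that timeout, the session expires and the Master declares the server dead, so its next report is rejected. The helper names and the 40 s timeout are hypothetical illustrations, not HBase APIs or defaults.

```python
def session_survives(pause_ms: int, session_timeout_ms: int) -> bool:
    """Return True if a stall of pause_ms fits within the session timeout.

    Hypothetical helper: models only the rule "no heartbeat for longer than
    the timeout => session expired", not ZooKeeper itself.
    """
    return pause_ms < session_timeout_ms


def outcome(pause_ms: int, session_timeout_ms: int) -> str:
    """Describe what happens to the RegionServer after a stall."""
    if session_survives(pause_ms, session_timeout_ms):
        return "RS keeps its session and carries on"
    # The session expired while the RS was stalled: the Master has meanwhile
    # declared the server dead, so the RS is told so when it reports back in.
    return "session expired -> YouAreDeadException on next report"


if __name__ == "__main__":
    timeout_ms = 40_000  # hypothetical value, not a setting from this thread
    for pause in (5_000, 60_000):
        print(pause, "ms:", outcome(pause, timeout_ms))
```

The point of the sketch is that the RegionServer itself does nothing wrong at the moment of the exception; the damage is done by whatever made it go silent for longer than the timeout.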

Re: region server dead and datanode block movement error

2014-02-27 Thread Rohit Kelkar
Hi, has anybody been facing similar issues? - R On Wed, Feb 26, 2014 at 12:55 PM, Rohit Kelkar rohitkel...@gmail.com wrote: We are running HBase 0.94.2 on the Hadoop 0.20 append version in production (yes, we have plans to upgrade Hadoop). It's a 5 node cluster and a 6th node running just the name

Re: region server dead and datanode block movement error

2014-02-27 Thread Rohit Kelkar
Hi Jean-Marc, I have updated the RS log here (http://pastebin.com/bVDvMvrB) with events before 13:41:00. In the log I see a few responseTooSlow warnings at 13:34:00 and 13:36:00. Then there is no activity until 13:41:00. At 13:41:00 there is a Sleeper warning - WARN org.apache.hadoop.hbase.util.Sleeper: We
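A silent stretch like the one between 13:36 and 13:41 can be found mechanically. Below is a minimal sketch, assuming the RS log lines start with the same `YYYY-MM-DD HH:MM:SS,mmm` prefix as the log line quoted later in this thread; `find_gaps` and the 60 s threshold are hypothetical choices, not part of HBase.

```python
import re
from datetime import datetime

# Timestamp prefix as in "2014-02-21 13:36:27,496 WARN ...". Lines without it
# (stack traces, continuation lines) are skipped.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_gaps(lines, threshold_s=60.0):
    """Yield (previous_ts, current_ts, gap_seconds) for silences > threshold_s."""
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap > threshold_s:
                yield prev, ts, gap
        prev = ts
```

Used as `for prev, cur, gap in find_gaps(open(path)): print(prev, cur, gap)`, where `path` is wherever the RS log lives. A gap that ends with a Sleeper warning is consistent with the whole process having stalled (GC or swapping) rather than simply being idle, though the log alone does not prove which.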

Re: region server dead and datanode block movement error

2014-02-27 Thread Jean-Marc Spaggiari
2014-02-21 13:36:27,496 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {processingtimems:41236,call:next(-8680499896692404689, 1), rpc version=1, client version=29, methodsFingerPrint=54742778,client: 10.0.0.96:46618
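The field that matters in this line is `processingtimems:41236`, i.e. about 41 seconds for a single `next` call. Below is a minimal sketch for pulling those fields out of a whole log so slow calls can be lined up against the timeline; the regex is written against the exact line above and is an assumption that may not match other HBase versions' formatting. `parse_slow` is a hypothetical helper.

```python
import re

# Layout copied from the responseTooSlow line above (HBase 0.94); other
# versions may format this message differently.
SLOW_RE = re.compile(
    r"^(?P<ts>\S+ \S+) WARN org\.apache\.hadoop\.ipc\.HBaseServer: "
    r"\(responseTooSlow\): \{processingtimems:(?P<ms>\d+),"
    r"call:(?P<call>\w+)\(.*?client: (?P<client>[\d.]+:\d+)"
)


def parse_slow(line):
    """Return the key fields of a responseTooSlow line, or None."""
    m = SLOW_RE.match(line)
    if m is None:
        return None
    return {
        "ts": m["ts"],
        "seconds": int(m["ms"]) / 1000,  # processingtimems -> seconds
        "call": m["call"],
        "client": m["client"],
    }
```

For the line above this yields a `next` call from `10.0.0.96:46618` that took 41.236 s.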

Re: region server dead and datanode block movement error

2014-02-27 Thread Rohit Kelkar
Hi Jean-Marc, Each node has 48GB RAM. To isolate and debug the RS failure issue, we have switched off all other tools. The only processes running are: - DN = 4GB - RS = 6GB - TT = 4GB - num mappers available on the node = 4 * 4GB = 16GB - num reducers available on the node = 2 * 4GB = 8GB - 4 other

Re: region server dead and datanode block movement error

2014-02-27 Thread Rohit Kelkar
Oh yes, and I forgot to add the ZK process: ZK = 5GB. Total = 45GB. On Thu, Feb 27, 2014 at 11:01 AM, Rohit Kelkar rohitkel...@gmail.com wrote: Hi Jean-Marc, Each node has 48GB RAM. To isolate and debug the RS failure issue, we have switched off all other tools. The only processes running are -
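The budget can be checked with simple arithmetic. The per-process figures below are the ones listed in the two mails; the list is cut off after "4 other", so those entries are not included, and attributing the 2GB difference to them is an assumption. The 45GB total is the figure stated in the thread.

```python
# Per-node allocations from the thread, in GB. Entries after "4 other"
# are truncated in the archive and therefore missing here.
listed_gb = {
    "DataNode": 4,
    "RegionServer": 6,
    "TaskTracker": 4,
    "map slots (4 x 4GB)": 4 * 4,
    "reduce slots (2 x 4GB)": 2 * 4,
    "ZooKeeper": 5,
}
PHYSICAL_GB = 48
STATED_TOTAL_GB = 45

listed_sum = sum(listed_gb.values())      # 43 GB accounted for by the list
unlisted = STATED_TOTAL_GB - listed_sum   # 2 GB, assumed to be the truncated entries
headroom = PHYSICAL_GB - STATED_TOTAL_GB  # 3 GB left for the OS and everything else

if __name__ == "__main__":
    print(f"listed={listed_sum}GB unlisted={unlisted}GB headroom={headroom}GB "
          f"({headroom / PHYSICAL_GB:.1%} of RAM)")
```

These are allocation figures, and a JVM process normally uses more than its configured heap (thread stacks, native and direct buffers), so about 3GB of headroom (6.25% of RAM) can disappear under load. That would make swapping, one of the causes listed earlier in the thread, a plausible candidate worth ruling out.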

Re: region server dead and datanode block movement error

2014-02-27 Thread Jean-Marc Spaggiari
So you might want to collect some metrics over time, using Ganglia or something similar, to track memory usage and network availability. Are you facing this issue often? Is it easy for you to reproduce it? 2014-02-27 12:05 GMT-05:00 Rohit Kelkar rohitkel...@gmail.com: Oh yes and forgot to add
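Ganglia is the tool suggested here. As a much lighter stand-in while that is being set up, swap usage on a Linux node can be sampled directly from `/proc/meminfo`. The sketch below assumes Linux and the standard `SwapTotal`/`SwapFree` fields (in kB). The function names are hypothetical and this is not a Ganglia API.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value} (kB for sized fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def sample_swap(path="/proc/meminfo"):
    """Take one sample on Linux; call periodically and log the result."""
    return swap_used_kb(parse_meminfo(Path(path).read_text()))
```

Calling `sample_swap()` every few seconds on each RS node and logging it with a timestamp would show whether swap use climbs in the minutes before a YouAreDeadException.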

Re: region server dead and datanode block movement error

2014-02-27 Thread Rohit Kelkar
Yes. Under the same conditions (dataset size, etc.) the issue occurred 4 out of 5 times, bringing the region server down with a YouAreDeadException. That's why I started digging into the DN and NN logs, and I could see a common pattern, as mentioned in my first mail. - R On Thu, Feb 27, 2014 at

region server dead and datanode block movement error

2014-02-26 Thread Rohit Kelkar
We are running HBase 0.94.2 on the Hadoop 0.20 append version in production (yes, we have plans to upgrade Hadoop). It's a 5 node cluster, plus a 6th node running just the NameNode and HMaster. I am seeing frequent RS YouAreDeadExceptions. Logs here: http://pastebin.com/44aFyYZV The RS log shows a