Re: Region Server lost response when doing BatchUpdate

2009-04-15 Thread Andrew Purtell
Hi, This looks like DFS trouble. Have you taken the steps recommended on this wiki page: http://wiki.apache.org/hadoop/Hbase/Troubleshooting ? Try the steps for #5, #6, and #7. And/or try adding more data nodes to spread the load. Hope that helps, - Andy > 2009-04-14 16:17:08,718 INFO

Re: Region Server lost response when doing BatchUpdate

2009-04-14 Thread 11 Nov.
Hi all, The insert operation is still running, but region servers go down now and then. The logs show that they shut down for different reasons. Here is another failed region server's log: 2009-04-14 16:17:08,718 INFO org.apache.hadoop.hbase.regionserver.HLog: removing

Re: Region Server lost response when doing BatchUpdate

2009-04-14 Thread 11 Nov.
Hi JD, I tried your solution: I upgraded HBase to 0.19.1 and applied the patch. The MapReduce insert job has been running for more than half an hour, and we have lost one region server. Here is the log from that region server: 2009-04-14 16:08:11,483 FATAL org.apache.hadoop.hbase.regi

Re: Region Server lost response when doing BatchUpdate

2009-04-14 Thread Andrew Purtell
I put up a v4 on this issue. You should use that one instead. Please be advised this is still experimental. - Andy > From: Jean-Daniel Cryans > Subject: Re: Region Server lost response when doing BatchUpdate > To: hbase-user@hadoop.apache.org > Date: Monday, April 13, 2009, 5:4

Re: Region Server lost response when doing BatchUpdate

2009-04-13 Thread 11 Nov.
Hi Jean-Daniel, As you said, we were inserting data using a sequential pattern, and with a random pattern there would be no such problem. I'm trying HBase 0.19.1 and the patch now. Thanks! 2009/4/13 Jean-Daniel Cryans > I see that your region server had 5188 store files in 121 store
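
A minimal sketch of what moving from a sequential to a random-looking key pattern could look like, assuming the row keys are derived from a sequential integer ID. The names `salted_row_key` and `NUM_BUCKETS` are hypothetical and do not come from this thread, and nothing here calls the HBase client API; it only shows how the key bytes could be built so that consecutive IDs no longer sort next to each other.

```python
import hashlib

# Hypothetical bucket count; an assumption for illustration only.
NUM_BUCKETS = 16


def salted_row_key(sequential_id: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix a zero-padded sequential ID with a hash-derived bucket number.

    The prefix is computed from the ID itself, so the same ID always maps
    to the same key, while consecutive IDs tend to land in different buckets.
    """
    raw = f"{sequential_id:012d}".encode("ascii")
    # Use the first byte of the MD5 digest to pick a bucket.
    bucket = hashlib.md5(raw).digest()[0] % buckets
    return f"{bucket:02d}-".encode("ascii") + raw


# Build keys for 1000 consecutive IDs and collect the distinct prefixes.
keys = [salted_row_key(i) for i in range(1000)]
buckets_used = {k[:2] for k in keys}
```

The trade-off in this sketch is that a range scan over the original ID order would have to read from every bucket, so it only suits workloads where spreading writes matters more than ordered scans.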

Re: Region Server lost response when doing BatchUpdate

2009-04-13 Thread Jean-Daniel Cryans
I see that your region server had 5188 store files in 121 stores; I'm 99% sure that this is the cause of your OOME. Luckily for you, we've been working on this issue since last week. What you should do: - Upgrade to HBase 0.19.1 - Apply the latest patch in https://issues.apache.org/jira/browse/HBASE