RE: Handling regionserver crashes in production cluster

2013-09-02 Thread Sandeep L
We are facing the same problem too. Is it fixed in HBase 0.94.8 or 0.97.6? If it is fixed we will migrate; can someone confirm this? Thanks, Sandeep. > From: nkey...@gmail.com > Date: Thu, 13 Jun 2013 09:00:46 +0200 > Subject: Re: Handling regionserver crashes in production cluster &

Re: Handling regionserver crashes in production cluster

2013-06-13 Thread Nicolas Liochon
Hum... So even a simple get shows the issue? That would be a (surprising) critical bug. Could you please try 0.95.1 or 0.94.8? Or write a unit test? Thanks, Nicolas On Thu, Jun 13, 2013 at 5:43 AM, kiran wrote: > Its a simple kill... > Scan is used using startrow and stoprow > Scan scan =

Re: Handling regionserver crashes in production cluster

2013-06-12 Thread kiran
It's a simple kill... The scan uses a startrow and stoprow: Scan scan = new Scan(Bytes.toBytes("adidas"), Bytes.toBytes("adidas1")); Our cluster size is 15. The load average I see in the master is 78%... It is not that overloaded, but writes are happening in the cluster... Thanks Kiran On We
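A note on the scan range quoted above: an HBase Scan treats the start row as inclusive and the stop row as exclusive, and row keys are ordered by unsigned lexicographic byte comparison. The following is a minimal sketch in plain Java (no HBase client involved) of which keys a range of "adidas" to "adidas1" would cover. The class name, helper methods, and sample row keys are hypothetical and only illustrate the ordering; they are not taken from kiran's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScanRangeSketch {
    // Unsigned lexicographic comparison of two row keys,
    // mirroring how HBase orders rows.
    public static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // On a shared prefix, the shorter key sorts first.
        return a.length - b.length;
    }

    // True if row falls in the half-open range [start, stop).
    public static boolean inRange(String row, String start, String stop) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] s = start.getBytes(StandardCharsets.UTF_8);
        byte[] e = stop.getBytes(StandardCharsets.UTF_8);
        return compareRows(r, s) >= 0 && compareRows(r, e) < 0;
    }

    // Filters a list of row keys down to those inside [start, stop).
    public static List<String> rowsInRange(List<String> rows, String start, String stop) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (inRange(row, start, stop)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical row keys, for illustration only.
        List<String> rows = Arrays.asList(
                "adidas", "adidas0shoes", "adidas1", "adidas_shoes", "adidasa");
        System.out.println(rowsInRange(rows, "adidas", "adidas1"));
    }
}
```

Under these assumptions the range covers "adidas" itself and keys whose next byte sorts below '1', so a key such as "adidas_shoes" would fall outside it. Whether that matters depends on the real row key layout, which the thread does not show.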

Re: Handling regionserver crashes in production cluster

2013-06-12 Thread Nicolas Liochon
Yeah, it should not block the other regions. For the region server, was it a kill -9 or a simple kill (the former triggers a recovery, the latter will close the regions before stopping the process)? How do you select the scan scope? With stop/start rows? Can you share the client code you're using?

Re: Handling regionserver crashes in production cluster

2013-06-12 Thread kiran
Yes, we killed the region server, but the datanode is still running on the node... Sample test scenario: Assume I have a table with pre-splits a up to z (about 26 regions). I purposely brought down the region server holding the regions with prefixes c and d. Then I used the client API to scan data from regions with

Re: Handling regionserver crashes in production cluster

2013-06-12 Thread rajesh babu chintaguntla
You can set the property below to a higher value to close more regions at a time: hbase.regionserver.executor.closeregion.threads 3 On Wed, Jun 12, 2013 at 7:38 PM, Nicolas Liochon wrote: > What was your test exactly? You killed -9 a region server but kept the > datanode alive? > Could you d
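As a sketch of the suggestion above, the property could be set in hbase-site.xml on the region servers roughly as follows. The value 10 is only an illustrative assumption, not a recommendation from the thread; the message itself shows the value 3.

```xml
<!-- hbase-site.xml fragment; the value 10 is an illustrative assumption -->
<property>
  <name>hbase.regionserver.executor.closeregion.threads</name>
  <value>10</value>
</property>
```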

Re: Handling regionserver crashes in production cluster

2013-06-12 Thread Nicolas Liochon
What was your test exactly? You killed -9 a region server but kept the datanode alive? Could you detail the queries you were doing? On Wed, Jun 12, 2013 at 2:10 PM, kiran wrote: > It is not possible for us to migrate to new version immediately. > > @Anoop we purposefully brought down one region

Re: Handling regionserver crashes in production cluster

2013-06-12 Thread kiran
It is not possible for us to migrate to a new version immediately. @Anoop we purposely brought down one regionserver, then observed that the website was taking too long to respond. We observed this pattern for about 5 min, until the regions were relocated. Also we issued queries in our website taki

Re: Handling regionserver crashes in production cluster

2013-06-06 Thread Jean-Marc Spaggiari
Hi Kiran, Also, any chance for you to migrate to 0.94.8? There have been hundreds of fixes since 0.94.1... JM 2013/6/6 Anoop John : > How many total RS in the cluster? You mean u can not do any operation on > other regions in the live clusters? It should not happen.. Is it so > happening that

Re: Handling regionserver crashes in production cluster

2013-06-05 Thread Anoop John
How many total RS in the cluster? You mean u can not do any operation on other regions in the live cluster? It should not happen.. Could it be that the client ops are targeted at the regions which were on the dead RS (and are in transition now)? Can u have a closer look and see? If not pl

Handling regionserver crashes in production cluster

2013-06-05 Thread kiran
Dear All, We have a production cluster that runs HBase 0.94.1. The issue we are facing is that whenever one regionserver goes down, the cluster becomes unresponsive until all of its regions are reassigned to other regionserver(s). The transition is taking about 3-5 mins and during this time we are unabl