What exactly was your test? Did you kill -9 a region server but keep the datanode alive? Could you describe the queries you were running?

On Wed, Jun 12, 2013 at 2:10 PM, kiran <kiran.sarvabho...@gmail.com> wrote:
> It is not possible for us to migrate to new version immediately.
>
> @Anoop we purposefully brought down one regionserver, then we observed the
> website is taking too much time to respond. We observed the pattern for
> about 5 min till the regions are relocated.
> Also we issued queries in our website taking care that the queries didn't
> come under the regions in the regionserver we brought down.
>
> Is there any configuration workaround to mitigate it??
>
> Thanks
> Kiran
>
> On Thu, Jun 6, 2013 at 8:27 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
> > Hi Kiran,
> >
> > Also, any chance for you to migrate to 0.94.8? There have been
> > hundreds of fixes since 0.94.1...
> >
> > JM
> >
> > 2013/6/6 Anoop John <anoop.hb...@gmail.com>:
> > > How many total RS in the cluster? You mean u can not do any operation on
> > > other regions in the live clusters? It should not happen.. Is it so
> > > happening that the client ops are targetted at the regions which were in
> > > the dead RS( and in transition now)? Can u have a closer look and see?
> > > If not pls check the RS threads were they are getting blocked.
> > >
> > > -Anoop-
> > >
> > > On Wed, Jun 5, 2013 at 10:50 PM, kiran <kiran.sarvabho...@gmail.com> wrote:
> > >
> > >> Dear All,
> > >>
> > >> We have production cluster that runs on hbase 0.94.1. The issue we are
> > >> facing is whenever one regionserver goes down, the cluster becomes
> > >> unresponsive until all the regions are allocated to another
> > >> regionserver(s). The transition is taking about 3-5 mins and during this
> > >> time we are unable to any do client operation on the cluster.
> > >>
> > >> Is there any way we can make the transition to run in background ?
> > >>
> > >> Also, it is acceptable for us if the client operations such as scan or get
> > >> does not work on the rowkeys of regions in transition. But, they are not
> > >> working on the entire cluster until all the regions are moved out of
> > >> transition. We can't afford 3-5 minutes of downtime.
> > >>
> > >> --
> > >> Thank you
> > >> Kiran Sarvabhotla
> > >>
> > >> -----Even a correct decision is wrong when it is taken late
> > >>
> > >
> >
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late