Re: Handling regionserver crashes in production cluster

Nicolas Liochon Wed, 12 Jun 2013 07:10:25 -0700

What was your test exactly? Did you kill -9 a region server but keep the
datanode alive?
Could you detail the queries you were running?



On Wed, Jun 12, 2013 at 2:10 PM, kiran <kiran.sarvabho...@gmail.com> wrote:

> It is not possible for us to migrate to a new version immediately.
>
> @Anoop we purposefully brought down one regionserver, and then observed that
> the website was taking too much time to respond. We observed this pattern for
> about 5 min, until the regions were relocated.
> Also, we issued queries from our website, taking care that the queries didn't
> touch the regions on the regionserver we brought down.
>
> Is there any configuration workaround to mitigate it?
>
> Thanks
> Kiran
>
>
>
> On Thu, Jun 6, 2013 at 8:27 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>
> > Hi Kiran,
> >
> > Also, any chance for you to migrate to 0.94.8? There have been
> > hundreds of fixes since 0.94.1...
> >
> > JM
> >
> > 2013/6/6 Anoop John <anoop.hb...@gmail.com>:
> > > How many total RS in the cluster?  You mean you cannot do any operation on
> > > other regions in the live cluster?  That should not happen.  Is it possible
> > > that the client ops are targeted at the regions which were in the dead RS
> > > (and are in transition now)?  Can you take a closer look and see?
> > > If not, please check the RS threads to see where they are getting blocked.
> > >
> > > -Anoop-
> > >
> > > On Wed, Jun 5, 2013 at 10:50 PM, kiran <kiran.sarvabho...@gmail.com> wrote:
> > >
> > >> Dear All,
> > >>
> > >> We have a production cluster that runs on hbase 0.94.1. The issue we are
> > >> facing is that whenever one regionserver goes down, the cluster becomes
> > >> unresponsive until all the regions are allocated to other
> > >> regionserver(s). The transition takes about 3-5 mins, and during this
> > >> time we are unable to do any client operation on the cluster.
> > >>
> > >> Is there any way we can make the transition run in the background?
> > >>
> > >> Also, it is acceptable for us if client operations such as scan or get
> > >> do not work on the rowkeys of regions in transition. But they are not
> > >> working on the entire cluster until all the regions are moved out of
> > >> transition. We can't afford 3-5 minutes of downtime.
> > >>
> > >> --
> > >> Thank you
> > >> Kiran Sarvabhotla
> > >>
> > >> -----Even a correct decision is wrong when it is taken late
> > >>
> >
>
>
>
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late
>
