We had a problem like this once. We localized it to a fuzzy row filter request. Are you scanning this region with such a filter?
If so, there is a patch out there that we applied, and it got rid of the problem.

On Jul 8, 2016 2:58 AM, "Samir Ahmic" <ahmic.sa...@gmail.com> wrote:

> Hi Sandeep,
>
> What sort of load is on the cluster when this is happening? What do the logs
> say at the moment when the cluster is in this state? It is strange that the
> whole cluster is unresponsive. I can remember a few cases when this is possible:
>
> - "hbase:meta" table is in transition or unavailable
> - zookeeper is overloaded with a huge number of connections
> - HDFS issues
>
> Check the logs; there must be an explanation for why you have such huge
> resource usage spikes.
>
> Regards
> Samir
>
> On Fri, Jul 8, 2016 at 8:06 AM, Sandeep Reddy <sandeepvre...@outlook.com>
> wrote:
>
> > Hi,
> >
> > We are observing very high CPU load (400 to 600%) in one of our
> > RegionServers, to the point where the machine is becoming unresponsive.
> >
> > At this point the whole cluster of 20+ RegionServers becomes unresponsive.
> >
> > Before the cluster becomes unresponsive we observed the following symptoms:
> >
> > * Huge bandwidth spike
> > * CPU spikes vertically from normal load to very high usage only in
> >   one RegionServer
> > * A few times, even though the machine is unresponsive, it is sending
> >   heartbeats to master
> > * There is no spike in the number of requests to HBase
> > * We have observed this pattern at least twice in the last week
> > * We don't have any co-processors in any of the region servers
> >
> > What could be the possible reasons for this kind of behaviour?
> >
> > We are using hbase-0.98.7, hadoop-2.5.1 versions.
> > It's a production cluster, so upgrading to the latest version will not be
> > possible right away.
> >
> > Thanks,
> > Sandeep.