No, never thought about that. After you mentioned it, I figured out how
to locate the server hosting that table. We'll have to keep an eye on it
the next time we have a hotspot and see whether it coincides with the
hot region server.
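
In case it's useful to anyone else, here is roughly how I'm looking it
up: a minimal sketch against the stock 1.0 client API (just my quick
test code; the HMaster web UI shows the same information):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

public class WhereIsMeta {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator =
                 conn.getRegionLocator(TableName.META_TABLE_NAME)) {
            // hbase:meta is a single region, so the location of the empty
            // start row is the server hosting all of meta. reload=true
            // bypasses this client's own cache and asks fresh.
            HRegionLocation loc =
                locator.getRegionLocation(HConstants.EMPTY_START_ROW, true);
            System.out.println("hbase:meta is on " + loc.getServerName());
        }
    }
}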

What would be the theory for how it could become a hotspot? Isn't the
client supposed to cache meta and only go back for a refresh when it
hits a region that is not in its expected location?
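
(For what it's worth, my mental model of the caching, sketched below
with a hypothetical table name rather than our actual code: the location
cache lives per Connection, so even with caching, meta traffic can add
up from clients that churn connections, or from region moves after
splits/balancing invalidating cached entries across every client at
once.)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Table;

public class MetaCacheChurn {

    // Anti-pattern: every new Connection starts with an empty region
    // location cache, so each fresh connection pays its meta lookups all
    // over again. Across a large client fleet this alone can keep the
    // meta-hosting server busy, even though each client "caches".
    static void readWithFreshConnection(byte[] row) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // "mytable" is a placeholder table name
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
            table.get(new Get(row)); // cold cache, so meta lookup first
        }
    }

    // What the caching model expects: one long-lived, shared Connection
    // whose cache is refreshed only when a region turns out not to be
    // where the cache says (e.g. after a split, move, or balancer run).
    static void readWithSharedConnection(Connection shared, byte[] row)
            throws IOException {
        try (Table table = shared.getTable(TableName.valueOf("mytable"))) {
            table.get(new Get(row)); // cached location on the happy path
        }
    }
}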

----
Saad


On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jle...@splicemachine.com> wrote:

> Saad,
>
> Did you validate that Meta is not on the “Hot” region server?
>
> Regards,
> John Leach
>
>
>
> > On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> >
> > Hi,
> >
> > We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid
> > hotspotting due to inadvertent data patterns by prepending an MD5-based
> > four-digit hash prefix to all our data keys. This works fine most of
> > the time, but recently, more and more often (as much as once or twice a
> > day), one region server suddenly becomes "hot" (CPU at or above roughly
> > 95% in various monitoring tools). When it happens it lasts for hours,
> > and occasionally the hotspot jumps to another region server as the
> > master decides the region server is unresponsive and reassigns its
> > regions to other servers.
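> >
> > For context, the salting is along these lines (a simplified sketch,
> > not our production code; the separator and exact digit count are
> > illustrative):
> >
> > import java.nio.charset.StandardCharsets;
> > import java.security.MessageDigest;
> >
> > public class SaltedKey {
> >     // Prepend the first four hex digits of the MD5 of the natural key
> >     // so writes spread across up to 65536 buckets instead of
> >     // clustering on the natural key order.
> >     static String salted(String naturalKey) throws Exception {
> >         MessageDigest md5 = MessageDigest.getInstance("MD5");
> >         byte[] d = md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
> >         return String.format("%02x%02x", d[0] & 0xff, d[1] & 0xff)
> >                 + ":" + naturalKey;
> >     }
> > }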
> >
> > For the longest time, we thought this must be some single rogue key in
> > our input data that is being hammered. All attempts to track this down
> > have failed, though, and the following behavior argues against it being
> > application-driven:
> >
> > 1. Plotting Get and Put rates by region on the "hot" region server in
> > Cloudera Manager charts shows that no single region is an outlier.
> >
> > 2. Cleanly restarting just the region server process causes its regions
> > to randomly migrate to other region servers; it then gets new ones from
> > the HBase master, basically a sort of shuffle, and the hotspot goes
> > away. If it were application-driven, you'd expect the hotspot to simply
> > jump to another region server.
> >
> > 3. We have pored through the region server logs and can't see anything
> > out of the ordinary happening.
> >
> > The only other pertinent thing to mention might be that we have a
> > special process of our own, running outside the cluster, that does
> > cluster-wide major compaction in a rolling fashion: each batch consists
> > of one region from each region server, and it waits until one batch is
> > completely done before starting the next. We have seen no real impact
> > on the hotspot from shutting this down, and in normal times it doesn't
> > affect our read or write performance much.
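> >
> > Roughly, each batch does the equivalent of the following (a simplified
> > sketch of the idea, not the actual tool):
> >
> > import java.util.List;
> > import org.apache.hadoop.hbase.HRegionInfo;
> > import org.apache.hadoop.hbase.client.Admin;
> > import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;
> >
> > public class RollingCompactor {
> >     // One batch: kick off a major compaction for one region on each
> >     // region server, then poll until every one of them has finished
> >     // before the caller moves on to the next batch. (Simplified: the
> >     // real tool also has to allow for the request sitting queued
> >     // before the compaction state flips away from NONE.)
> >     static void compactBatch(Admin admin, List<HRegionInfo> batch)
> >             throws Exception {
> >         for (HRegionInfo region : batch) {
> >             admin.majorCompactRegion(region.getRegionName());
> >         }
> >         for (HRegionInfo region : batch) {
> >             while (admin.getCompactionStateForRegion(region.getRegionName())
> >                     != CompactionState.NONE) {
> >                 Thread.sleep(5000L); // poll until this region is done
> >             }
> >         }
> >     }
> > }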
> >
> > We are at our wits' end. Does anyone have experience with a scenario
> > like this? Any help/guidance would be most appreciated.
> >
> > -----
> > Saad
>
>
