I took a look at the stack trace.

The region server log would give us more detail on the frequency and duration
of compactions.

Cheers

On Sat, Dec 3, 2016 at 7:39 AM, Jeremy Carroll <phobos...@gmail.com> wrote:

> I would check compaction, and investigate throttling it if it's causing high CPU.
>
> On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti <saad.mu...@gmail.com> wrote:
>
> > No.
> >
> > ----
> > Saad
> >
> >
> > On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu <ted...@yahoo.com.invalid> wrote:
> >
> > > Somehow I couldn't access the pastebin (I am in China now).
> > > Was the region server showing the hotspot also hosting meta?
> > > Thanks
> > >
> > >     On Friday, December 2, 2016 11:53 AM, Saad Mufti <
> > saad.mu...@gmail.com>
> > > wrote:
> > >
> > >
> > >  We're in AWS with D2.4xLarge instances. Each instance has 12 independent spindles/disks from what I can tell.
> > >
> > > We have charted get_rate and mutate_rate by host and:
> > >
> > > a) mutate_rate shows no real outliers
> > > b) read_rate shows the overall rate on the "hotspot" region server is a bit higher than every other server, not severely but enough that it is noticeable. But when we chart get_rate on that server by region, no one region stands out.
> > >
> > > get_rate chart by host:
> > >
> > > https://snag.gy/hmoiDw.jpg
> > >
> > > mutate_rate chart by host:
> > >
> > > https://snag.gy/jitdMN.jpg
> > >
> > > ----
> > > Saad
> > >
> > >
> > >
> > > On Fri, Dec 2, 2016 at 2:34 PM, John Leach <jle...@splicemachine.com>
> > > wrote:
> > >
> > > > Here is what I see...
> > > >
> > > >
> > > > * Short Compaction Running on Heap
> > > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
> > > >    java.lang.Thread.State: RUNNABLE
> > > >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > > >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
> > > >    at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
> > > >    at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
> > > >    at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
> > > >    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
> > > >    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
> > > >    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
> > > >    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
> > > >    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
> > > >    at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
> > > >    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > > >    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
> > > >    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
> > > >    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
> > > >    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > >    at java.lang.Thread.run(Thread.java:745)
> > > >
> > > >
> > > > * WAL Syncs waiting…  ALL 5
> > > > "sync.0" - Thread t@202
> > > >    java.lang.Thread.State: TIMED_WAITING
> > > >    at java.lang.Object.wait(Native Method)
> > > >    - waiting on <67ba892d> (a java.util.LinkedList)
> > > >    at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
> > > >    at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
> > > >    at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
> > > >    at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
> > > >    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
> > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
> > > >    at java.lang.Thread.run(Thread.java:745)
> > > >
> > > > * Mutations backing up very badly...
> > > >
> > > > "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
> > > >    java.lang.Thread.State: TIMED_WAITING
> > > >    at java.lang.Object.wait(Native Method)
> > > >    - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
> > > >    at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
> > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
> > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
> > > >    at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
> > > >    at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
> > > >    at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
> > > >    at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
> > > >    at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
> > > >    at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
> > > >    at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
> > > >    at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
> > > >    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> > > >    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
> > > >    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > >    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > >    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > >    at java.lang.Thread.run(Thread.java:745)
> > > >
> > > >
> > > > Too many writers are being blocked attempting to write to the WAL.
> > > >
> > > > What does your disk infrastructure look like? Could you get away with multi-WAL? Ugh...
> > > >
> > > > Regards,
> > > > John Leach
> > > >
> > > >
> > > > > On Dec 2, 2016, at 1:20 PM, Saad Mufti <saad.mu...@gmail.com>
> wrote:
> > > > >
> > > > > Hi Ted,
> > > > >
> > > > > Finally we have another hotspot going on, with the same symptoms as before. Here is the pastebin for the stack trace from the region server, which I obtained via VisualVM:
> > > > >
> > > > > http://pastebin.com/qbXPPrXk
> > > > >
> > > > > Would really appreciate any insight you or anyone else can provide.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > ----
> > > > > Saad
> > > > >
> > > > >
> > > > > On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti <saad.mu...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Sure will, the next time it happens.
> > > > >>
> > > > >> Thanks!!!
> > > > >>
> > > > >> ----
> > > > >> Saad
> > > > >>
> > > > >>
> > > > >> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu <ted...@yahoo.com.invalid>
> > > > wrote:
> > > > >>
> > > > >>> From #2 in the initial email, hbase:meta might not be the cause of the hotspot.
> > > > >>>
> > > > >>> Saad:
> > > > >>> Can you pastebin the stack trace of the hot region server when this happens again?
> > > > >>>
> > > > >>> Thanks
> > > > >>>
> > > > >>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti <saad.mu...@gmail.com>
> > > wrote:
> > > > >>>>
> > > > >>>> We used a pre-split into 1024 regions at the start, but we miscalculated our data size, so there were still auto-split storms at the beginning. As the data size stabilized, it ended up at around 9500 or so regions, plus a few thousand regions for a few other (much smaller) tables. But we haven't had any new auto-splits in a couple of months, and the hotspots only started happening recently.
> > > > >>>>
> > > > >>>> Our hashing scheme is very simple: we take the MD5 of the key, then form a 4-digit prefix based on the first two bytes of the MD5, normalized to be within the range 0-1023. I am fairly confident about this scheme, especially since even during the hotspot we see no evidence so far that any particular region is taking disproportionate traffic (based on the Cloudera Manager per-region charts for the hotspot server). Does that look like a reasonable scheme to randomize which region any given key goes to? Also, the start of the hotspot doesn't seem to correspond to any region splitting or any region moving from one server to another.
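
For illustration, a minimal sketch of the kind of MD5-based salting scheme described above (not the poster's actual code; the ":" delimiter and key layout are assumptions):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class SaltedKeys {
  // Take the first two bytes of the MD5 of the logical key, map them into
  // the range 0-1023, and prepend the bucket as a fixed-width 4-digit prefix.
  public static String saltedKey(String logicalKey) throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5")
        .digest(logicalKey.getBytes(StandardCharsets.UTF_8));
    int firstTwoBytes = ((md5[0] & 0xFF) << 8) | (md5[1] & 0xFF); // 0-65535
    int bucket = firstTwoBytes % 1024;                            // 0-1023
    return String.format("%04d", bucket) + ":" + logicalKey;      // ":" is an assumed delimiter
  }

  public static void main(String[] args) throws Exception {
    // Prints a 4-digit bucket prefix followed by the original key.
    System.out.println(saltedKey("example-key-12345"));
  }
}

With the prefix drawn from 1024 evenly sized buckets, keys should spread uniformly across a 1024-way pre-split, which matches the behavior described (no single region standing out even during a hotspot).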
> > > > >>>>
> > > > >>>> Thanks.
> > > > >>>>
> > > > >>>> ----
> > > > >>>> Saad
> > > > >>>>
> > > > >>>>
> > > > >>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <
> > > jle...@splicemachine.com
> > > > >
> > > > >>> wrote:
> > > > >>>>>
> > > > >>>>> Saad,
> > > > >>>>>
> > > > >>>>> A region move or split causes client connections to simultaneously refresh their meta cache.
> > > > >>>>>
> > > > >>>>> The key word is "supposed". We have seen meta hotspotting from time to time, and on different versions, at Splice Machine.
> > > > >>>>>
> > > > >>>>> How confident are you in your hashing algorithm?
> > > > >>>>>
> > > > >>>>> Regards,
> > > > >>>>> John Leach
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mu...@gmail.com>
> > > > wrote:
> > > > >>>>>>
> > > > >>>>>> No, never thought about that. I just figured out how to locate the server for that table after you mentioned it. We'll have to keep an eye on it the next time we have a hotspot, to see if it coincides with the hotspot server.
> > > > >>>>>>
> > > > >>>>>> What would be the theory for how it could become a hotspot? Isn't the client supposed to cache it, and only go back for a refresh if it hits a region that is not in its expected location?
> > > > >>>>>>
> > > > >>>>>> ----
> > > > >>>>>> Saad
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <
> > > > jle...@splicemachine.com>
> > > > >>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Saad,
> > > > >>>>>>>
> > > > >>>>>>> Did you validate that Meta is not on the “Hot” region server?
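
One way to verify this from a client is to ask the HBase client API where hbase:meta currently lives; a minimal sketch against the standard 1.x client API (a hypothetical helper, not something posted in this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

public final class WhereIsMeta {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator = conn.getRegionLocator(TableName.META_TABLE_NAME)) {
      // Print every hbase:meta region and the server currently hosting it,
      // so it can be compared against the "hot" region server.
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        System.out.println(loc.getRegionInfo().getRegionNameAsString()
            + " -> " + loc.getServerName());
      }
    }
  }
}

The same information is also visible in the HBase Master web UI under the hbase:meta table details.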
> > > > >>>>>>>
> > > > >>>>>>> Regards,
> > > > >>>>>>> John Leach
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <
> saad.mu...@gmail.com>
> > > > >>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>> Hi,
> > > > >>>>>>>>
> > > > >>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid hotspotting due to inadvertent data patterns by prepending an MD5-based 4-digit hash prefix to all our data keys. This works fine most of the time, but recently, more and more often (as much as once or twice a day), we have occasions where one region server suddenly becomes "hot" (CPU above or around 95% in various monitoring tools). When it happens it lasts for hours; occasionally the hotspot jumps to another region server as the master decides a region is unresponsive and gives it to another server.
> > > > >>>>>>>>
> > > > >>>>>>>> For the longest time, we thought this must be some single rogue key in our input data that is being hammered. All attempts to track this down have failed though, and the following behavior argues against this being application-based:
> > > > >>>>>>>>
> > > > >>>>>>>> 1. Plotting the Get and Put rate by region on the "hot" region server in Cloudera Manager charts shows that no single region is an outlier.
> > > > >>>>>>>>
> > > > >>>>>>>> 2. Cleanly restarting just the region server process causes its regions to randomly migrate to other region servers; it then gets new ones from the HBase master (basically a sort of shuffling), and the hotspot goes away. If it were application-based, you'd expect the hotspot to just jump to another region server.
> > > > >>>>>>>>
> > > > >>>>>>>> 3. We have pored through the region server logs and can't see anything out of the ordinary happening.
> > > > >>>>>>>>
> > > > >>>>>>>> The only other pertinent thing to mention might be that we have a special process of our own, running outside the cluster, that does cluster-wide major compaction in a rolling fashion: each batch consists of one region from each region server, and it waits for one batch to be completely done before starting another. We have seen no real impact on the hotspot from shutting this down, and in normal times it doesn't impact our read or write performance much.
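
For reference, a minimal sketch of that kind of rolling scheme against the standard HBase 1.x Admin API (hypothetical, not the actual external tool; the wait between batches is only indicated by a comment):

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public final class RollingMajorCompaction {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Snapshot each server's region list up front.
      List<List<HRegionInfo>> regionsPerServer = new ArrayList<>();
      for (ServerName sn : admin.getClusterStatus().getServers()) {
        regionsPerServer.add(admin.getOnlineRegions(sn));
      }
      boolean moreWork = true;
      for (int batch = 0; moreWork; batch++) {
        moreWork = false;
        for (List<HRegionInfo> regions : regionsPerServer) {
          if (batch < regions.size()) {
            moreWork = true;
            // Request a major compaction of the batch-th region on this server.
            admin.majorCompactRegion(regions.get(batch).getRegionName());
          }
        }
        // The real tool waits here until every compaction in the batch has
        // finished before moving on to the next batch.
      }
    }
  }
}

Snapshotting the region lists up front keeps each batch to at most one compaction per region server, which matches the throttled behavior described above.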
> > > > >>>>>>>>
> > > > >>>>>>>> We are at our wits' end. Does anyone have experience with a scenario like this? Any help/guidance would be most appreciated.
> > > > >>>>>>>>
> > > > >>>>>>>> -----
> > > > >>>>>>>> Saad
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> > >
> > >
> >
>
