Here is what I see...
* Short Compaction Running on Heap

  "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" - Thread t@242
  java.lang.Thread.State: RUNNABLE
      at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
      at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
      at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
      at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
      at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
      at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
      at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
      at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
      at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
      at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
      at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
      at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
      at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
      at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
      at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

* WAL Syncs waiting… ALL 5

  "sync.0" - Thread t@202
  java.lang.Thread.State: TIMED_WAITING
      at java.lang.Object.wait(Native Method)
      - waiting on <67ba892d> (a java.util.LinkedList)
      at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
      at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
      at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
      at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
      at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
      at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
      at java.lang.Thread.run(Thread.java:745)

* Mutations backing up very badly...

  "B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
  java.lang.Thread.State: TIMED_WAITING
      at java.lang.Object.wait(Native Method)
      - waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
      at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
      at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
      at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
      at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
      at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
      at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
      at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
      at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
      at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
      at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
      at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
      at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
      at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
      at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
      at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
      at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
      at java.lang.Thread.run(Thread.java:745)

Too many writers are being blocked attempting to write to the WAL. What does your disk infrastructure look like? Can you get away with multi-WAL? (A sketch of the relevant settings is at the end of this message.) Ugh...

Regards,
John Leach

> On Dec 2, 2016, at 1:20 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>
> Hi Ted,
>
> Finally we have another hotspot going on, same symptoms as before. Here is
> the pastebin for the stack trace from the region server, which I obtained
> via VisualVM:
>
> http://pastebin.com/qbXPPrXk
>
> Would really appreciate any insight you or anyone else can provide.
>
> Thanks.
>
> ----
> Saad
>
>
> On Thu, Dec 1, 2016 at 6:08 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>
>> Sure will, the next time it happens.
>>
>> Thanks!!!
>>
>> ----
>> Saad
>>
>>
>> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu <ted...@yahoo.com.invalid> wrote:
>>
>>> From #2 in the initial email, the hbase:meta might not be the cause of
>>> the hotspot.
>>>
>>> Saad:
>>> Can you pastebin the stack trace of the hot region server when this
>>> happens again?
>>>
>>> Thanks
>>>
>>>> On Dec 2, 2016, at 4:48 AM, Saad Mufti <saad.mu...@gmail.com> wrote:
>>>>
>>>> We used a pre-split into 1024 regions at the start, but we miscalculated
>>>> our data size, so there were still auto-split storms at the beginning as
>>>> the data size stabilized. It has ended up at around 9500 or so regions,
>>>> plus a few thousand regions for a few other (much smaller) tables. But we
>>>> haven't had any new auto-splits in a couple of months, and the hotspots
>>>> only started happening recently.
>>>>
>>>> Our hashing scheme is very simple: we take the MD5 of the key, then form
>>>> a 4-digit prefix based on the first two bytes of the MD5, normalized to
>>>> be within the range 0-1023.
>>>> I am fairly confident about this scheme, especially since even during
>>>> the hotspot we see no evidence so far that any particular region is
>>>> taking disproportionate traffic (based on the Cloudera Manager per-region
>>>> charts on the hotspot server). Does that look like a reasonable scheme to
>>>> randomize which region any given key goes to? And the start of the
>>>> hotspot doesn't seem to correspond to any region splitting or moving from
>>>> one server to another.
>>>>
>>>> Thanks.
>>>>
>>>> ----
>>>> Saad
>>>>
>>>>
>>>>> On Thu, Dec 1, 2016 at 3:32 PM, John Leach <jle...@splicemachine.com> wrote:
>>>>>
>>>>> Saad,
>>>>>
>>>>> A region move or split causes client connections to simultaneously
>>>>> refresh their meta.
>>>>>
>>>>> Key word is "supposed". We have seen meta hotspotting from time to time,
>>>>> and on different versions, at Splice Machine.
>>>>>
>>>>> How confident are you in your hashing algorithm?
>>>>>
>>>>> Regards,
>>>>> John Leach
>>>>>
>>>>>
>>>>>> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>>>>>>
>>>>>> No, never thought about that. I just figured out how to locate the
>>>>>> server for that table after you mentioned it. We'll have to keep an eye
>>>>>> on it next time we have a hotspot to see if it coincides with the
>>>>>> hotspot server.
>>>>>>
>>>>>> What would be the theory for how it could become a hotspot? Isn't the
>>>>>> client supposed to cache it and only go back for a refresh if it hits a
>>>>>> region that is not in its expected location?
>>>>>>
>>>>>> ----
>>>>>> Saad
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jle...@splicemachine.com> wrote:
>>>>>>
>>>>>>> Saad,
>>>>>>>
>>>>>>> Did you validate that meta is not on the "hot" region server?
>>>>>>>
>>>>>>> Regards,
>>>>>>> John Leach
>>>>>>>
>>>>>>>
>>>>>>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid
>>>>>>>> hotspotting due to inadvertent data patterns by prepending an MD5-based
>>>>>>>> 4-digit hash prefix to all our data keys. This works fine most of the
>>>>>>>> time, but recently, more and more often (as much as once or twice a
>>>>>>>> day), we have occasions where one region server suddenly becomes "hot"
>>>>>>>> (CPU above or around 95% in various monitoring tools). When it happens
>>>>>>>> it lasts for hours, and occasionally the hotspot jumps to another
>>>>>>>> region server as the master decides the region is unresponsive and
>>>>>>>> gives its region to another server.
>>>>>>>>
>>>>>>>> For the longest time, we thought this must be some single rogue key in
>>>>>>>> our input data that is being hammered. All attempts to track this down
>>>>>>>> have failed, though, and the following behavior argues against it
>>>>>>>> being application based:
>>>>>>>>
>>>>>>>> 1. Plotting the Get and Put rates by region on the "hot" region server
>>>>>>>> in Cloudera Manager Charts shows no single region is an outlier.
>>>>>>>>
>>>>>>>> 2. Cleanly restarting just the region server process causes its regions
>>>>>>>> to randomly migrate to other region servers, then it gets new ones from
>>>>>>>> the HBase master, basically a sort of shuffling, and then the hotspot
>>>>>>>> goes away. If it were application based, you'd expect the hotspot to
>>>>>>>> just jump to another region server.
>>>>>>>>
>>>>>>>> 3. We have pored through the region server logs and can't see anything
>>>>>>>> out of the ordinary happening.
>>>>>>>>
>>>>>>>> The only other pertinent thing to mention might be that we have a
>>>>>>>> special process of our own, running outside the cluster, that does
>>>>>>>> cluster-wide major compaction in a rolling fashion, where each batch
>>>>>>>> consists of one region from each region server, and it waits until one
>>>>>>>> batch is completely done before starting another. We have seen no real
>>>>>>>> impact on the hotspot from shutting this down, and in normal times it
>>>>>>>> doesn't impact our read or write performance much.
>>>>>>>>
>>>>>>>> We are at our wit's end. Does anyone have experience with a scenario
>>>>>>>> like this? Any help/guidance would be most appreciated.
>>>>>>>>
>>>>>>>> -----
>>>>>>>> Saad
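On the multi-WAL question above: here is a minimal sketch of the region server
hbase-site.xml settings, assuming the RegionGroupingProvider-based "multiwal"
provider described in the HBase reference guide. The property names, supported
grouping strategies, and defaults should be verified against the exact HBase
1.0 / CDH 5.5.2 release before trying it.

    <!-- hbase-site.xml on every region server; illustrative values only -->
    <property>
      <name>hbase.wal.provider</name>
      <value>multiwal</value>
    </property>
    <property>
      <!-- Number of WAL groups per region server; only meaningful where a
           bounded region-grouping strategy is available in your release. -->
      <name>hbase.wal.regiongrouping.numgroups</name>
      <value>2</value>
    </property>

The idea is simply to spread WAL sync load over more than one HDFS pipeline per
region server, which is why the question about the disk infrastructure matters.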
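On the hashing scheme Saad describes (MD5 of the key, first two bytes mapped
into 0-1023, used as a 4-digit prefix): a small, self-contained Java sketch of
what such salting could look like. The class name, separator, and helper are
made up for illustration; this is not the actual application code.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class SaltedKey {

        // Build a 4-digit prefix in [0, 1023] from the first two bytes of the
        // key's MD5, then prepend it to the original key.
        public static String saltedKey(String key) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // First two bytes as an unsigned 16-bit value, folded into 0-1023.
            int bucket = (((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF)) % 1024;
            return String.format("%04d", bucket) + ":" + key;
        }

        public static void main(String[] args) throws Exception {
            // Prints a 4-digit bucket followed by the original key.
            System.out.println(saltedKey("some-row-key"));
        }
    }

Assuming something like this, and a table pre-split on the 0000-1023 prefix
boundaries, keys should scatter evenly across regions, which matches the
observation that no single region stands out during a hotspot.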
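To answer "is hbase:meta on the hot server?" quickly, something along these
lines against the HBase 1.0 client API should print where meta currently lives
(the Master web UI shows the same information):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;

    public class FindMeta {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 RegionLocator locator = conn.getRegionLocator(TableName.META_TABLE_NAME)) {
                // hbase:meta normally has a single region; print its current host.
                for (HRegionLocation loc : locator.getAllRegionLocations()) {
                    System.out.println(loc.getRegionInfo().getRegionNameAsString()
                            + " -> " + loc.getServerName());
                }
            }
        }
    }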
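Finally, for the external rolling major-compaction process Saad mentions, here
is a rough outline of how such a driver might be structured with the HBase 1.0
Admin API (one region per region server per batch). It is a sketch, not the
actual tool: real code would track which regions have already been compacted
and poll admin.getCompactionStateForRegion(...) instead of sleeping a fixed
time between batches.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionInfo;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class RollingMajorCompact {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                List<ServerName> servers =
                        new ArrayList<>(admin.getClusterStatus().getServers());
                boolean moreWork = true;
                int round = 0;
                while (moreWork) {
                    moreWork = false;
                    // Batch = the i-th online region from every region server.
                    List<byte[]> batch = new ArrayList<>();
                    for (ServerName sn : servers) {
                        List<HRegionInfo> regions = admin.getOnlineRegions(sn);
                        if (round < regions.size()) {
                            batch.add(regions.get(round).getRegionName());
                            moreWork = true;
                        }
                    }
                    for (byte[] region : batch) {
                        admin.majorCompactRegion(region);   // asynchronous request
                    }
                    // Crude pacing between batches; replace with real completion checks.
                    Thread.sleep(60_000L);
                    round++;
                }
            }
        }
    }

Taking one region per region server per batch keeps at most one major
compaction running on any given server at a time, which matches the behavior
Saad describes.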