Hi Ted, Stack, thanks for the answers.
We're not using Phoenix, but the server process was running on some datanodes; we stopped it to be sure and nothing changed. Ted, I suppose you meant region server jstack traces; here they are:
http://pastebin.com/CHzyrBp9
http://pastebin.com/PdNczfVa

Stack, our application pulls messages from a RabbitMQ queue and processes them using HBase, so there is no dependency on external spikes in load -- at least no obvious one. It looks like random requests are taking much more time to complete than the others. It has been happening all the time since Friday, not during some specific time windows: several percent of requests are just slow. Region server GC pauses look normal -- less than 0.1 second every several seconds.

The fact that most of the "long" requests have execution times in the range of 800-900ms makes me think that we're hitting some timeout which is not logged properly. But this is just an idea.

On Mon, Oct 24, 2016 at 7:20 PM, Stack <st...@duboce.net> wrote:

> Any correlation with a spike in load -- more i/os, more traffic, other
> tasks running on the box -- or a long GC? Is it random requests, or do
> all requests over a particular period suffer the 800ms penalty?
> Thanks,
> St.Ack
>
> On Mon, Oct 24, 2016 at 5:44 AM, Alexander Ilyin <alexan...@weborama.com>
> wrote:
>
> > Hi,
> >
> > I'm experiencing some spikes in HBase response time. The problem
> > appeared all at once: before, all requests used to take 4-30ms to
> > complete, and now some of them (several percent) take as long as 800ms,
> > significantly increasing the average processing time.
> >
> > This happens with requests to all the tables I have, so it doesn't seem
> > to be related to a problem in the data schema -- and the schema didn't
> > change anyway.
> >
> > I don't see any problems with lack of resources: there is enough CPU
> > and memory on the datanodes.
> >
> > Automatic splitting is off and no cluster balancing is taking place, so
> > it shouldn't be client retries either.
> >
> > There are some stacktraces in the datanode log, but as far as I
> > understand these are normal, and in any case they existed before the
> > problem appeared:
> >
> > 2016-10-24 05:42:19,186 INFO
> > [B.defaultRpcServer.handler=17,queue=2,port=16020]
> > shortcircuit.ShortCircuitCache: ShortCircuitCache(0x3a130ae2): could not
> > load 1075792477_BP-415105157-10.190.240.206-1468423545420 due to
> > InvalidToken exception.
> > org.apache.hadoop.security.token.SecretManager$InvalidToken: access
> > control error while attempting to set up short-circuit access to
> > /apps/hbase/data/data/profile/DailyUserScores/84105becaefcf071e43ebc4972d0249f/d/c7653cc98860451cb79e678e9c3f0e85
> > Block token with block_token_identifier (expiryDate=1477279957879,
> > keyId=2039256459, userId=hbase,
> > blockPoolId=BP-415105157-10.190.240.206-1468423545420,
> > blockId=1075792477, access modes=[READ]) is expired.
> >
> > At the moment I'm a bit stuck, so any advice on what to check next is
> > welcome.
> >
> > PS. HBase version is 1.1.2
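On the InvalidToken messages: the short-circuit read token lifetime is controlled on the HDFS side, so if you want to rule out token churn as a factor, the relevant hdfs-site.xml properties (to the best of my knowledge; the values below are illustrative defaults, not recommendations) are:

```xml
<!-- hdfs-site.xml: illustrative values, not tuning advice -->
<property>
  <!-- Lifetime of block access tokens, in minutes (HDFS default is 600). -->
  <name>dfs.block.access.token.lifetime</name>
  <value>600</value>
</property>
<property>
  <!-- Short-circuit local reads; the expired-token INFO lines come from this path. -->
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
```

As the original mail notes, these INFO lines predate the slowdown, so they are probably benign token refreshes rather than the root cause.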
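One way to test the hidden-timeout hunch from the thread is to bucket client-side request latencies and see whether the slow requests cluster tightly around 800-900ms (which would point at a fixed timeout somewhere in the stack) or spread out (which would point at load or GC). This is a minimal sketch with made-up sample numbers; none of the function names or values come from the thread itself.

```python
# Hypothetical sketch: bucket per-request latencies (in ms) so a fixed
# timeout shows up as a narrow spike in the histogram. Sample data below
# is invented for illustration.
from collections import Counter

def latency_histogram(latencies_ms, bucket_ms=100):
    """Count requests per bucket_ms-wide latency bucket."""
    buckets = Counter((int(ms) // bucket_ms) * bucket_ms for ms in latencies_ms)
    return dict(sorted(buckets.items()))

# Made-up sample: mostly fast requests plus a cluster near 800-900ms.
sample = [12, 25, 8, 17, 850, 22, 9, 880, 14, 820]
hist = latency_histogram(sample)
print(hist)  # fast requests land in bucket 0; the slow ones pile into 800
```

If the slow requests all fall into one or two adjacent buckets, that supports the "undocumented timeout" theory; a broad tail would suggest resource contention instead.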