major compaction runs daily.

> What else do you see in the RS logs?
no error, only *Slow sync cost*

> How does iostat look?
please see the image. 12.00 - 12.30 is the period when reading/writing stopped.
[image: inline image — iostat screenshot]

> Can you share the logs around the time this occurs?

2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":20128,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"5.9.75.155:58344","starttimems":1429692769868,"queuetimems":28034,"class":"HRegionServer","responsesize":8,"method":"Multi"}
2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":21434,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0.110:60149","starttimems":1429692768562,"queuetimems":44263,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
2015-04-22 12:53:10,093 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":17997,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0.
2015-04-22 12:53:10,270 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":18175,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"144.76.218.107:48620","starttimems":1429692772095,"queuetimems":49253,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":20231,"call":"Mutate(org.apache.hadoop.hbase.
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/c/a93405e51088418ba95241cf53456011, entries=926, sequenceid=43403085, filesize=258.5 K
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~3.8 M/3985504, currentsize=12.6 K/12912 for region cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. in 857ms, sequenceid=43403085, compaction requested=true
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on c in region cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 3 file(s) in c of cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. into tmpdir=hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/.tmp, totalSize=8.0 M
2015-04-22 12:53:11,917 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 417 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:12,199 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 282 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,247 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=41975630, memsize=3.6 M, hasBloomFilter=true, into tmp file hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/.tmp/3495a2b487fa448bb9e0b50050b11c3a
2015-04-22 12:53:13,301 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/c/3495a2b487fa448bb9e0b50050b11c3a, entries=886, sequenceid=41975630, filesize=272.9 K

2015-04-22 19:54 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:

> Serega:
> How often is major compaction run in your cluster?
>
> Have you configured offpeak compaction?
> See related parameters in:
> http://hbase.apache.org/book.html#compaction.parameters
>
> Cheers
>
> On Wed, Apr 22, 2015 at 10:39 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
> > Hi, we have a 10-node cluster running HBase 0.98 (CDH 5.2.1).
> > Sometimes HBase gets stuck.
> > We have several apps constantly writing/reading data from it. Sometimes
> > we see that app response time increases dramatically: an app spends
> > seconds to read/write from/to HBase, while 99% of the time it takes 20 ms.
> >
> > I suppose that compactions/major compactions could be the root cause. I
> > see that compactions start at the same time we have problems with the
> > apps. Could that be it?
> > Is HBase unable to write to the WAL because compactions consume all the
> > IO, so the apps stop writing data?
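
Following up on Ted's pointer to the compaction parameters: offpeak compaction is configured in hbase-site.xml roughly as below. The hour values here are only example assumptions for a 01:00-05:00 quiet window; check the compaction.parameters section of the HBase book for the defaults in your version.

```xml
<!-- Treat 01:00-05:00 (RegionServer local time) as the offpeak window; example values -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>1</value>
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>5</value>
</property>
<!-- Allow more aggressive minor compactions while inside the offpeak window -->
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>5.0</value>
</property>
<!-- Disable periodic major compactions; trigger them manually during the quiet window instead -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

With the automatic major compaction period set to 0, a major compaction can be scheduled from cron during the quiet window via the hbase shell, e.g. `echo "major_compact 'cross_id_to_rtb_user'" | hbase shell`. That keeps the daily compaction IO out of the hours when the apps need their 20 ms latencies.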