Did you send the logs from one of those 3 servers? [5.9.41.237:50010,
5.9.77.105:50010, 5.9.73.19:50010]

Sounds like something is slowing everything down. Can you extract the
DataNode (DN) logs for the same time window?
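For example, on those DataNodes something along these lines (just a sketch,
assuming the usual CDH log location; adjust the path/file name to your
setup):

  grep "2015-04-22 12:" /var/log/hadoop-hdfs/*datanode*.log* | grep -iE "slow|error|exception"

Anything about slow packet writes or network trouble around 12:00-12:30
would be interesting.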

Do you have any tool monitoring disk and network latency over time?
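If sysstat is installed, sar may already have that history. A sketch,
assuming a RHEL-style sa file location (adjust path/day to your distro):

  sar -d -f /var/log/sa/sa22      # per-disk stats for the 22nd
  sar -n DEV -f /var/log/sa/sa22  # per-interface network stats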

If not, can you run iostat and try to reproduce the issue?
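For example, something like this while the problem is happening (5-second
samples, extended per-device stats):

  iostat -dxm 5

If %util sits near 100% and await spikes on the data disks exactly when the
apps stall, that points at the disks rather than at HBase itself.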

JM

2015-04-22 14:23 GMT-04:00 Serega Sheypak <serega.shey...@gmail.com>:

> > major compaction runs daily.
>
> >What else do you see in the RS logs?
> no errors, only *Slow sync cost*
>
> >What does iostat look like?
> please see the attached image. 12:00 - 12:30 is the window when
> reading/writing stopped
> [image: Inline image 1]
>
>
> >Can you share the logs around the time this occurs?
> 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":20128,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"
> 5.9.75.155:58344
> ","starttimems":1429692769868,"queuetimems":28034,"class":"HRegionServer","responsesize":8,"method":"Multi"}
> 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":21434,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"
> 46.4.0.110:60149
> ","starttimems":1429692768562,"queuetimems":44263,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
> 2015-04-22 12:53:10,093 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":17997,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0.
>
> 2015-04-22 12:53:10,270 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":18175,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"
> 144.76.218.107:48620
> ","starttimems":1429692772095,"queuetimems":49253,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:10,315 WARN org.apache.hadoop.ipc.RpcServer:
> (responseTooSlow):
> {"processingtimems":20231,"call":"Mutate(org.apache.hadoop.hbase.
>
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Added
> hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/c/a93405e51088418ba95241cf53456011,
> entries=926, sequenceid=43403085, filesize=258.5 K
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Finished memstore flush of ~3.8 M/3985504, currentsize=12.6 K/12912 for
> region
> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
> in 857ms, sequenceid=43403085, compaction requested=true
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on c in region
> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
> 2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Starting compaction of 3 file(s) in c of
> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
> into
> tmpdir=hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/.tmp,
> totalSize=8.0 M
> 2015-04-22 12:53:11,917 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 417 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:11,939 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:12,199 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 282 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,132 INFO
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms,
> current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
> 2015-04-22 12:53:13,247 INFO
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed,
> sequenceid=41975630, memsize=3.6 M, hasBloomFilter=true, into tmp file
> hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/.tmp/3495a2b487fa448bb9e0b50050b11c3a
> 2015-04-22 12:53:13,301 INFO org.apache.hadoop.hbase.regionserver.HStore:
> Added
> hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/c/3495a2b487fa448bb9e0b50050b11c3a,
> entries=886, sequenceid=41975630, filesize=272.9 K
>
> 2015-04-22 19:54 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Serega:
>> How often is major compaction run in your cluster?
>>
>> Have you configured offpeak compaction?
>> See related parameters in:
>> http://hbase.apache.org/book.html#compaction.parameters
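>> For example, something along these lines in hbase-site.xml (a sketch; the
>> hours and ratio have to match your actual low-traffic window):
>>
>>   hbase.offpeak.start.hour = 0
>>   hbase.offpeak.end.hour = 6
>>   hbase.hstore.compaction.ratio.offpeak = 5.0
>>
>> which lets the compaction policy select larger compactions during those
>> off-peak hours.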
>>
>> Cheers
>>
>> On Wed, Apr 22, 2015 at 10:39 AM, Serega Sheypak <
>> serega.shey...@gmail.com>
>> wrote:
>>
>> > Hi, we have a 10-node cluster running HBase 0.98 (CDH 5.2.1).
>> > Sometimes HBase gets stuck.
>> > We have several apps constantly writing/reading data to/from it. Sometimes
>> > we see that the apps' response time increases dramatically: an app spends
>> > seconds reading/writing from/to HBase, while 99% of the time it takes
>> > ~20 ms.
>> >
>> > I suppose that compactions/major compactions could be the root cause. I
>> > see that compactions start at the same time we have problems with the
>> > apps. Could that be it? I.e. HBase can't write to the WAL because
>> > compactions consume all the I/O, so the apps stop being able to write
>> > data?
>> >
>>
>
>
