major compaction runs daily.

> What else do you see in the RS logs?
no error, only *Slow sync cost*

> How does iostat look?
please see the image. 12.00 - 12.30 is the period when reading/writing stopped.
[image: inline image — iostat screenshot]

> Can you share the logs around the time this occurs?

2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":20128,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":"5.9.75.155:58344","starttimems":1429692769868,"queuetimems":28034,"class":"HRegionServer","responsesize":8,"method":"Multi"}
2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":21434,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0.110:60149","starttimems":1429692768562,"queuetimems":44263,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
2015-04-22 12:53:10,093 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":17997,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0.
2015-04-22 12:53:10,270 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":18175,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"144.76.218.107:48620","starttimems":1429692772095,"queuetimems":49253,"class":"HRegionServer","responsesize":2,"method":"Mutate"}
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:10,315 WARN org.apache.hadoop.ipc.RpcServer: (responseTooSlow): {"processingtimems":20231,"call":"Mutate(org.apache.hadoop.hbase.
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/c/a93405e51088418ba95241cf53456011, entries=926, sequenceid=43403085, filesize=258.5 K
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~3.8 M/3985504, currentsize=12.6 K/12912 for region cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. in 857ms, sequenceid=43403085, compaction requested=true
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on c in region cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a.
2015-04-22 12:53:11,726 INFO org.apache.hadoop.hbase.regionserver.HStore: Starting compaction of 3 file(s) in c of cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. into tmpdir=hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/.tmp, totalSize=8.0 M
2015-04-22 12:53:11,917 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 417 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:11,939 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:12,199 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 282 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,132 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 ms, current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010]
2015-04-22 12:53:13,247 INFO org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, sequenceid=41975630, memsize=3.6 M, hasBloomFilter=true, into tmp file hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/.tmp/3495a2b487fa448bb9e0b50050b11c3a
2015-04-22 12:53:13,301 INFO org.apache.hadoop.hbase.regionserver.HStore: Added hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/c/3495a2b487fa448bb9e0b50050b11c3a, entries=886, sequenceid=41975630, filesize=272.9 K

2015-04-22 19:54 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:

> Serega:
> How often is major compaction run in your cluster?
>
> Have you configured offpeak compaction?
> See related parameters in:
> http://hbase.apache.org/book.html#compaction.parameters
>
> Cheers
>
> On Wed, Apr 22, 2015 at 10:39 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
> > Hi, we have a 10-node cluster running HBase 0.98 (CDH 5.2.1).
> > Sometimes HBase gets stuck.
> > We have several apps constantly writing/reading data from it. Sometimes
> > we see that app response time increases dramatically: an app spends
> > seconds to read/write from/to HBase, while 99% of the time it takes 20 ms.
> >
> > I suppose that compactions/major compactions could be the root cause. I
> > see that compactions start at the same time we have problems with the
> > apps. Could that be it?
> > Is HBase unable to write to the WAL because compactions consume all the
> > IO, so the apps stop writing data?
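
Following up on Ted's pointer to the compaction parameters: offpeak compaction is configured in hbase-site.xml roughly as below. The hour values here are only example assumptions for a 01:00-05:00 quiet window; check the compaction.parameters section of the HBase book for the defaults in your version.

```xml
<!-- Treat 01:00-05:00 (RegionServer local time) as the offpeak window; example values -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>1</value>
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>5</value>
</property>
<!-- Allow more aggressive minor compactions while inside the offpeak window -->
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>5.0</value>
</property>
<!-- Disable periodic major compactions; trigger them manually during the quiet window instead -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

With the automatic major compaction period set to 0, a major compaction can be scheduled from cron during the quiet window via the hbase shell, e.g. `echo "major_compact 'cross_id_to_rtb_user'" | hbase shell`. That keeps the daily compaction IO out of the hours when the apps need their 20 ms latencies.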