Here is disk stats. Sadness appeared ad 12.30 - 13.00 https://www.dropbox.com/s/lj4r8o10buv1n2o/Screenshot%202015-04-22%2020.48.18.png?dl=0
2015-04-22 20:41 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>: > Here is an image > > 2015-04-22 20:40 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>: > >> Here are datanode logs from 5.9.41.237 <http://5.9.41.237:50010/>, >> regionserver logs were from5.9.41.237 <http://5.9.41.237:50010/> also >> >> EQUEST_SHORT_CIRCUIT_FDS, blockid: 1078130838, srvID: >> 659e6be2-8d98-458b-94bc-3bcdbb517508, success: true >> 2015-04-22 12:46:17,154 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: >> 127.0.0.1, dest: 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId: >> bd37825f7a445e2da6796940ebb754d6, slotIdx: 24, srvID: >> 659e6be2-8d98-458b-94bc-3bcdbb517508, success: true >> 2015-04-22 12:46:17,204 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: >> 127.0.0.1, dest: 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId: >> aea53fb897c383f3dec304ed618db0df, slotIdx: 4, srvID: >> 659e6be2-8d98-458b-94bc-3bcdbb517508, success: true >> 2015-04-22 12:46:17,219 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: >> 127.0.0.1, dest: 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId: >> 0178cc0245fcdc9e1dd75c5f8c6da1eb, slotIdx: 0, srvID: >> 659e6be2-8d98-458b-94bc-3bcdbb517508, success: true >> 2015-04-22 12:46:17,236 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: >> 127.0.0.1, dest: 127.0.0.1, op: RELEASE_SHORT_CIRCUIT_FDS, shmId: >> aea53fb897c383f3dec304ed618db0df, slotIdx: 102, srvID: >> 659e6be2-8d98-458b-94bc-3bcdbb517508, success: true >> 2015-04-22 12:46:17,573 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.65.143:36281, bytes: 4608, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain06.myhost.ru,60020,1426776843636_-1084357126_33, >> offset: 28435456, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078147715_4407483, duration: >> 12396486 >> 2015-04-22 12:46:17,596 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /78.46.48.37:41539, bytes: 4608, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain13.myhost.ru,60020,1429640630559_-531755738_33, >> offset: 56052736, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078106901_4366668, duration: >> 37455427 >> 2015-04-22 12:46:17,821 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.67.100:58718, bytes: 5120, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain03.myhost.ru,60020,1426877064271_-1246826933_33, >> offset: 77630464, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078119757_4379524, duration: >> 16386940 >> 2015-04-22 12:46:18,769 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to send data: >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/78.46.48.37:40503] >> 2015-04-22 12:46:18,769 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /78.46.48.37:40503, bytes: 393216, op: >> HDFS_READ, cliID: >> DFSClient_hb_rs_domain13.myhost.ru,60020,1429640630559_-531755738_33, >> offset: 3584, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078155386_4415154, duration: >> 480345113893 >> 2015-04-22 12:46:18,769 WARN >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> DatanodeRegistration(5.9.41.237, >> datanodeUuid=659e6be2-8d98-458b-94bc-3bcdbb517508, infoPort=50075, >> ipcPort=50020, storageInfo=lv=-56;cid=cluster11;nsid=527111981;c=0):Got >> exception while serving >> BP-927943268-5.9.77.105-1414682145673:blk_1078155386_4415154 to / >> 78.46.48.37:40503 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/78.46.48.37:40503] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:18,770 ERROR >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> domain02.myhost.ru:50010:DataXceiver >> error processing READ_BLOCK operation src: /78.46.48.37:40503 dst: / >> 5.9.41.237:50010 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/78.46.48.37:40503] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:18,897 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to send data: >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.74.13:34323] >> 2015-04-22 12:46:18,897 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.74.13:34323, bytes: 393216, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain09.myhost.ru,60020,1426772397450_1095315710_33, >> offset: 3584, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078159305_4419073, duration: >> 480265217202 >> 2015-04-22 12:46:18,897 WARN >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> DatanodeRegistration(5.9.41.237, >> datanodeUuid=659e6be2-8d98-458b-94bc-3bcdbb517508, infoPort=50075, >> ipcPort=50020, storageInfo=lv=-56;cid=cluster11;nsid=527111981;c=0):Got >> exception while serving >> BP-927943268-5.9.77.105-1414682145673:blk_1078159305_4419073 to / >> 5.9.74.13:34323 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.74.13:34323] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:18,898 ERROR >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> domain02.myhost.ru:50010:DataXceiver >> error processing READ_BLOCK operation src: /5.9.74.13:34323 dst: / >> 5.9.41.237:50010 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.74.13:34323] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:18,933 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to send data: >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.37.17:48803] >> 2015-04-22 12:46:18,933 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.37.17:48803, bytes: 393216, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain07.myhost.ru,60020,1426776403757_1987133929_33, >> offset: 3584, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078158534_4418302, duration: >> 480375396714 >> 2015-04-22 12:46:18,933 WARN >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> DatanodeRegistration(5.9.41.237, >> datanodeUuid=659e6be2-8d98-458b-94bc-3bcdbb517508, infoPort=50075, >> ipcPort=50020, storageInfo=lv=-56;cid=cluster11;nsid=527111981;c=0):Got >> exception while serving >> BP-927943268-5.9.77.105-1414682145673:blk_1078158534_4418302 to / >> 5.9.37.17:48803 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.37.17:48803] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:18,934 ERROR >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> domain02.myhost.ru:50010:DataXceiver >> error processing READ_BLOCK operation src: /5.9.37.17:48803 dst: / >> 5.9.41.237:50010 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.37.17:48803] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:20,478 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to send data: >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.77.105:36871] >> 2015-04-22 12:46:20,478 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.77.105:36871, bytes: 327680, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain01.myhost.ru,60020,1427209265829_704706779_33, >> offset: 3584, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078159312_4419080, duration: >> 480489227857 >> 2015-04-22 12:46:20,478 WARN >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> DatanodeRegistration(5.9.41.237, >> datanodeUuid=659e6be2-8d98-458b-94bc-3bcdbb517508, infoPort=50075, >> ipcPort=50020, storageInfo=lv=-56;cid=cluster11;nsid=527111981;c=0):Got >> exception while serving >> BP-927943268-5.9.77.105-1414682145673:blk_1078159312_4419080 to / >> 5.9.77.105:36871 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.77.105:36871] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:20,478 ERROR >> org.apache.hadoop.hdfs.server.datanode.DataNode: >> domain02.myhost.ru:50010:DataXceiver >> error processing READ_BLOCK operation src: /5.9.77.105:36871 dst: / >> 5.9.41.237:50010 >> java.net.SocketTimeoutException: 480000 millis timeout while waiting for >> channel to be ready for write. ch : >> java.nio.channels.SocketChannel[connected local=/5.9.41.237:50010 >> remote=/5.9.77.105:36871] >> at >> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246) >> at >> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172) >> at >> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547) >> at >> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111) >> at >> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69) >> at >> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) >> at java.lang.Thread.run(Thread.java:745) >> 2015-04-22 12:46:20,677 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.65.143:36286, bytes: 5120, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain06.myhost.ru,60020,1426776843636_-1084357126_33, >> offset: 86653440, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078128115_4387882, duration: >> 17623602 >> 2015-04-22 12:46:20,690 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /78.46.48.38:57022, bytes: 5120, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain14.myhost.ru,60020,1429640630771_156324173_33, >> offset: 103135232, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078120513_4380280, duration: >> 24431169 >> 2015-04-22 12:46:20,747 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: / >> 5.9.41.237:50010, dest: /5.9.40.136:34501, bytes: 5120, op: HDFS_READ, >> cliID: DFSClient_hb_rs_domain04.myhost.ru,60020,1426774142736_-1119788258_33, >> offset: 78469632, srvID: 659e6be2-8d98-458b-94bc-3bcdbb517508, blockid: >> BP-927943268-5.9.77.105-1414682145673:blk_1078108223_4367990, duration: >> 76055489 >> 2015-04-22 12:46:20,758 INFO >> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving >> BP-927943268-5.9.77.105-1414682145673:blk_1078160469_4420237 src: / >> 5.9.74.13:35440 dest: /5.9.41.237:50010 >> >> >> 2015-04-22 20:30 GMT+02:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>: >> >>> Did you send the logs from one of those 3 servers? [5.9.41.237:50010, >>> 5.9.77.105:50010, 5.9.73.19:50010] >>> >>> Sound like something is slowing done everything. Can you extract DN logs >>> for the same time? >>> >>> Do you have any tool monitoring the disks and network latency over time? >>> >>> If not, can you run iostat and try to reproduce the issue? >>> >>> JM >>> >>> 2015-04-22 14:23 GMT-04:00 Serega Sheypak <serega.shey...@gmail.com>: >>> >>> > > major compaction runs daily. >>> > >>> > >What else do you see in the RS logs? >>> > no error, only *Slow sync cost * >>> > >>> > >How iostat looks like? >>> > please see image. 12.00 - 12.30 is a time when reading/writing stopped >>> > [image: Встроенное изображение 1] >>> > >>> > >>> > >Can you share the logs around the time this occurs? >>> > 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: >>> > (responseTooSlow): >>> > >>> {"processingtimems":20128,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiRequest)","client":" >>> > 5.9.75.155:58344 >>> > >>> ","starttimems":1429692769868,"queuetimems":28034,"class":"HRegionServer","responsesize":8,"method":"Multi"} >>> > 2015-04-22 12:53:09,996 WARN org.apache.hadoop.ipc.RpcServer: >>> > (responseTooSlow): >>> > >>> {"processingtimems":21434,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":" >>> > 46.4.0.110:60149 >>> > >>> ","starttimems":1429692768562,"queuetimems":44263,"class":"HRegionServer","responsesize":2,"method":"Mutate"} >>> > 2015-04-22 12:53:10,093 WARN org.apache.hadoop.ipc.RpcServer: >>> > (responseTooSlow): >>> > >>> {"processingtimems":17997,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":"46.4.0. >>> > >>> > 015-04-22 12:53:10,270 WARN org.apache.hadoop.ipc.RpcServer: >>> > (responseTooSlow): >>> > >>> {"processingtimems":18175,"call":"Mutate(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest)","client":" >>> > 144.76.218.107:48620 >>> > >>> ","starttimems":1429692772095,"queuetimems":49253,"class":"HRegionServer","responsesize":2,"method":"Mutate"} >>> > 2015-04-22 12:53:10,315 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:10,315 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:10,315 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:10,315 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 319 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:10,315 WARN org.apache.hadoop.ipc.RpcServer: >>> > (responseTooSlow): >>> > {"processingtimems":20231,"call":"Mutate(org.apache.hadoop.hbase. >>> > >>> > 2015-04-22 12:53:11,726 INFO >>> org.apache.hadoop.hbase.regionserver.HStore: >>> > Added >>> > >>> hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/c/a93405e51088418ba95241cf53456011, >>> > entries=926, sequenceid=43403085, filesize=258.5 K >>> > 2015-04-22 12:53:11,726 INFO >>> org.apache.hadoop.hbase.regionserver.HRegion: >>> > Finished memstore flush of ~3.8 M/3985504, currentsize=12.6 K/12912 for >>> > region >>> > >>> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. >>> > in 857ms, sequenceid=43403085, compaction requested=true >>> > 2015-04-22 12:53:11,726 INFO >>> org.apache.hadoop.hbase.regionserver.HRegion: >>> > Starting compaction on c in region >>> > >>> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. >>> > 2015-04-22 12:53:11,726 INFO >>> org.apache.hadoop.hbase.regionserver.HStore: >>> > Starting compaction of 3 file(s) in c of >>> > >>> cross_id_to_rtb_user,aadf0929,1429256709528.d2e01ed1de47dd92edb5c1de37277d4a. >>> > into >>> > >>> tmpdir=hdfs://nameservice1/hbase/data/default/cross_id_to_rtb_user/d2e01ed1de47dd92edb5c1de37277d4a/.tmp, >>> > totalSize=8.0 M >>> > 2015-04-22 12:53:11,917 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 417 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:11,939 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:11,939 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:11,939 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:11,939 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 340 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:12,199 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 282 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:13,132 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:13,132 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:13,132 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:13,132 INFO >>> > org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 1193 >>> ms, >>> > current pipeline: [5.9.41.237:50010, 5.9.77.105:50010, 5.9.73.19:50010 >>> ] >>> > 2015-04-22 12:53:13,247 INFO >>> > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher: Flushed, >>> > sequenceid=41975630, memsize=3.6 M, hasBloomFilter=true, into tmp file >>> > >>> hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/.tmp/3495a2b487fa448bb9e0b50050b11c3a >>> > 2015-04-22 12:53:13,301 INFO >>> org.apache.hadoop.hbase.regionserver.HStore: >>> > Added >>> > >>> hdfs://nameservice1/hbase/data/default/rtb_user_to_cross_id/c54d51dd7f20377ea34f05824abbaf76/c/3495a2b487fa448bb9e0b50050b11c3a, >>> > entries=886, sequenceid=41975630, filesize=272.9 K >>> > >>> > 2015-04-22 19:54 GMT+02:00 Ted Yu <yuzhih...@gmail.com>: >>> > >>> >> Serega: >>> >> How often is major compaction run in your cluster ? >>> >> >>> >> Have you configured offpeak compaction ? >>> >> See related parameters in: >>> >> http://hbase.apache.org/book.html#compaction.parameters >>> >> >>> >> Cheers >>> >> >>> >> On Wed, Apr 22, 2015 at 10:39 AM, Serega Sheypak < >>> >> serega.shey...@gmail.com> >>> >> wrote: >>> >> >>> >> > Hi, we have 10 nodes cluster running HBase 0.98 CDH 5.2.1 >>> >> > Sometimes HBase stucks. >>> >> > We have several apps constantly writing/reading data from it. >>> Sometimes >>> >> we >>> >> > see that apps response time dramatically increases. It means that >>> app >>> >> > spends seconds to read/write from/to HBase. in 99% of time it takes >>> >> 20ms. >>> >> > >>> >> > I suppose that compactions/major compactions could be the root >>> cause. I >>> >> see >>> >> > that compactions start at the same time when we have problems with >>> app. >>> >> > Could it be so? >>> >> > So HBase can't write to WAL because compactions consumes all IO and >>> apps >>> >> > stops to write data? >>> >> > >>> >> >>> > >>> > >>> >> >> >