Hi,

My setup is as follows:
- 24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
- HBase 0.94.4
- 5-7 regions per regionserver

I am doing an average of 4k-5k random gets per regionserver per second, and the performance is acceptable in the beginning. I have also done ~10K gets against a single regionserver and got the results back in 600-800ms. After a while the performance of the gets starts degrading: the same ~10K random gets start taking upwards of 9s-10s.

As for the HBase settings I have modified: I have disabled major compaction, increased the region size to 100G, and bumped the handler count up to 100.

I monitored Ganglia for metrics that change when the performance shifts from good to bad, and found that fsPreadLatency_avg_time is almost 25x higher on the badly performing regionserver. fsReadLatency_avg_time is also higher, but not by as much (around 2x).

I took a thread dump of the regionserver process and also monitored CPU utilization. The CPU cycles were being spent in org.apache.hadoop.hdfs.BlockReaderLocal.read; the stack trace for one of the threads running that function is at the end of this email.

Any pointers on why positional reads degrade over time? Or is this just a disk I/O issue that I should start looking into?
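For reference, the three setting changes described above would look roughly like this in hbase-site.xml. This is only a sketch: the property names are the standard HBase ones, but the values are assumed from the description (100G written out in bytes, and periodic major compaction disabled by setting its interval to 0) rather than copied from the live config.

```xml
<!-- hbase-site.xml fragment (sketch; values assumed from the description above) -->
<property>
  <!-- 0 turns off time-based (periodic) major compactions -->
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
<property>
  <!-- maximum store file size before a region splits: 100 * 1024^3 bytes = 100G -->
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
<property>
  <!-- number of RPC handler threads per regionserver -->
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
```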
Thanks,
Viral

====stacktrace for one of the handlers doing a block read====
"IPC Server handler 98 on 60020" - Thread t@147
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:220)
    at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
    - locked <3215ed96> (a org.apache.hadoop.hdfs.BlockReaderLocal)
    at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
    at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
    - locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
    - locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
    - locked <74d81ea7> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)