[ https://issues.apache.org/jira/browse/HBASE-27896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault resolved HBASE-27896.
---------------------------------------
    Fix Version/s: 2.6.0
                   3.0.0-beta-1
     Release Note: PREAD reads will no longer do HDFS readahead by default. This should save substantial disk and network IO for random read workloads, but one can re-enable it if desired by setting "hbase.store.reader.no-readahead" to false.
         Assignee: Bryan Beaudreault
       Resolution: Fixed

Pushed to branch-2, branch-3, and master. Thanks everyone for chiming in and [~reidchan] for the review.

See [https://lists.apache.org/thread/pokw1bwtr26hdbmlmx4tx1g1fczqrtxt] for the discussion thread.

> Disable hdfs readahead for pread reads
> --------------------------------------
>
>                 Key: HBASE-27896
>                 URL: https://issues.apache.org/jira/browse/HBASE-27896
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-beta-1
>
>
> In https://issues.apache.org/jira/browse/HBASE-17914, a flag was introduced:
> {{hbase.store.reader.no-readahead}}. The default is false, so readahead
> is enabled. This flag is used for creating the default store reader (i.e. the
> one used by PREAD reads). Stream readers don't use this flag; instead they
> always pass -1.
>
> When that flag is true, we pass a readahead value of 0 to
> FSDataInputStream.setReadahead. When the flag is false, we pass -1, which
> triggers the HDFS default behavior. The default behavior is to use a
> readahead of 4 MB.
>
> It seems to me that we don't want readahead for PREAD reads, and especially
> not such a large readahead. Our default block size is 64 KB, which is much
> smaller than that. A PREAD read is not likely to do sequential IO, so it is
> not likely to utilize the cached readahead buffer.
>
> I set no-readahead to true in a few of our clusters and in each case saw a
> massive reduction in disk IO and thus an increase in throughput. I load
> tested this in a test cluster which does fully random reads of ~300 byte
> rows on a dataset which is 20x larger than memory. The load test was able to
> achieve nearly double the throughput.
>
> As a follow-on, we might consider tuning the readahead for STREAM reads.
> 4 MB seems way too big for many common workloads.
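
For readers who want the flag semantics from the description in one place, here is a minimal, self-contained Java sketch. It is not the HBase code: the class and method names (ReadaheadSketch, preadReadahead, streamReadahead, readaheadToBlockRatio) are hypothetical, and a plain Map stands in for the HBase configuration. The post-change default of true is inferred from the release note, which says readahead is re-enabled by setting the flag to false. The sketch only models which value would be handed to FSDataInputStream.setReadahead and how the 4 MB HDFS default compares with a 64 KB block.

```java
import java.util.HashMap;
import java.util.Map;

public class ReadaheadSketch {
    // Real configuration key named in the issue.
    static final String NO_READAHEAD_KEY = "hbase.store.reader.no-readahead";

    // Assumption: default after HBASE-27896, inferred from the release note.
    static final boolean NO_READAHEAD_DEFAULT = true;

    // Readahead value for the default (PREAD) store reader:
    // 0 disables readahead, -1 defers to the HDFS default.
    static long preadReadahead(Map<String, String> conf) {
        String raw = conf.get(NO_READAHEAD_KEY);
        boolean noReadahead = (raw == null) ? NO_READAHEAD_DEFAULT : Boolean.parseBoolean(raw);
        return noReadahead ? 0L : -1L;
    }

    // Stream readers ignore the flag and always pass -1.
    static long streamReadahead() {
        return -1L;
    }

    // How many blocks of the given size fit in one readahead window.
    static long readaheadToBlockRatio(long readaheadBytes, long blockBytes) {
        return readaheadBytes / blockBytes;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("PREAD, flag unset: " + preadReadahead(conf));

        conf.put(NO_READAHEAD_KEY, "false"); // opt back in to HDFS readahead
        System.out.println("PREAD, flag=false: " + preadReadahead(conf));

        System.out.println("STREAM: " + streamReadahead());

        long hdfsDefaultReadahead = 4L * 1024 * 1024; // 4 MB, per the description
        long hbaseBlockSize = 64L * 1024;             // 64 KB, per the description
        System.out.println("Blocks per readahead window: "
                + readaheadToBlockRatio(hdfsDefaultReadahead, hbaseBlockSize));
    }
}
```

Under these assumptions, a single random PREAD of one 64 KB block with HDFS readahead enabled could pull in up to 64 blocks' worth of data, which is in line with the disk IO reduction reported in the description once readahead is turned off.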