[ 
https://issues.apache.org/jira/browse/HBASE-27896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Beaudreault resolved HBASE-27896.
---------------------------------------
    Fix Version/s: 2.6.0
                   3.0.0-beta-1
     Release Note: PREAD reads will no longer do HDFS readahead by default. 
This should save substantial disk and network IO for random read workloads. 
Readahead can be re-enabled if desired by setting 
"hbase.store.reader.no-readahead" to false.
         Assignee: Bryan Beaudreault
       Resolution: Fixed
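
As noted in the release note, readahead for PREAD reads can be turned back on 
with a single property. A minimal hbase-site.xml sketch (the property name and 
value come from this issue; the surrounding XML is the standard HBase 
configuration format):

    <property>
      <name>hbase.store.reader.no-readahead</name>
      <value>false</value>
    </property>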

Pushed to branch-2, branch-3, and master. Thanks everyone for chiming in and 
[~reidchan] for the review.

See [https://lists.apache.org/thread/pokw1bwtr26hdbmlmx4tx1g1fczqrtxt] for the 
discussion thread.

> Disable hdfs readahead for pread reads
> --------------------------------------
>
>                 Key: HBASE-27896
>                 URL: https://issues.apache.org/jira/browse/HBASE-27896
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>             Fix For: 2.6.0, 3.0.0-beta-1
>
>
> In https://issues.apache.org/jira/browse/HBASE-17914, a flag was introduced: 
> "hbase.store.reader.no-readahead". The default is false, so readahead is 
> enabled. This flag is used when creating the default store reader (i.e. the 
> one used by PREAD reads). Stream readers don't use this flag; they always 
> pass -1.
> When that flag is true, we pass a readahead value of 0 to 
> FSDataInputStream.setReadahead. When the flag is false, we pass -1, which 
> triggers the HDFS default behavior: a readahead of 4 MB. (A sketch of this 
> plumbing follows the quoted description below.)
> It seems to me that we don't want readahead for PREAD reads, and especially 
> not such a large one. Our default block size is 64 KB, which is much smaller 
> than that. A PREAD read is unlikely to do sequential IO, so it is unlikely 
> to make use of the cached readahead buffer.
> I set no-readahead to true in a few of our clusters and in each case saw a 
> massive reduction in disk IO and thus an increase in throughput. I load 
> tested this in a test cluster doing fully random reads of ~300-byte rows on 
> a dataset 20x larger than memory. The load test achieved nearly double the 
> throughput.
> As a follow-on, we might consider tuning the readahead for STREAM reads as 
> well; 4 MB seems far too big for many common workloads.
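
For context, below is a minimal Java sketch of the readahead plumbing the 
quoted description refers to. It is not HBase's actual reader code and the 
class and method names are illustrative; Configuration.getBoolean and 
FSDataInputStream.setReadahead are real Hadoop APIs, and the 0/-1 values and 
the 4 MB HDFS default are taken from the description above.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;

    public class PreadReadaheadSketch {
      // Flag from HBASE-17914; this change flips its default to true.
      static final String NO_READAHEAD_KEY = "hbase.store.reader.no-readahead";

      // Applies the pread readahead policy described in the issue.
      static void applyPreadReadahead(Configuration conf, FSDataInputStream in)
          throws IOException {
        boolean noReadahead = conf.getBoolean(NO_READAHEAD_KEY, true);
        // 0 disables HDFS readahead for this stream; a negative value falls
        // back to the HDFS default of 4 MB (stream readers always pass -1).
        in.setReadahead(noReadahead ? 0L : -1L);
      }
    }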



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
