Steve Loughran created HADOOP-18852:
---------------------------------------

             Summary: S3ACachingInputStream.ensureCurrentBuffer(): lazy seek 
means all reads look like random IO
                 Key: HADOOP-18852
                 URL: https://issues.apache.org/jira/browse/HADOOP-18852
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.3.6
            Reporter: Steve Loughran


Noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with 
separately.

# All seeks are lazy; no fetching is kicked off after an open.
# The first read is treated as an out-of-order read, so it cancels any active 
prefetches (I don't think there are any) and then only asks for 1 block.

{code}
    if (outOfOrderRead) {
      LOG.debug("lazy-seek({})", getOffsetStr(readPos));
      blockManager.cancelPrefetches();

      // We prefetch only 1 block immediately after a seek operation.
      prefetchCount = 1;
    }
{code}

* For any readFully() call, we should prefetch all blocks in the requested range.
* For other reads, we may want a bigger prefetch count than 1, depending on: 
split start/end, file read policy (random, sequential, whole-file).
* Also, if a read is in a block other than the current one, but which is 
already being fetched or cached, is this really an out-of-order read to the 
extent that outstanding fetches should be cancelled?
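
As a rough illustration of the three points above, here is a minimal standalone sketch in Java. Everything in it is hypothetical: PrefetchPlanSketch, ReadPolicy, and all method names and parameters are invented for this example and are not part of S3ACachingInputStream or its block manager. It only shows the shape such decisions could take, assuming fixed-size blocks and a configured cap on the prefetch count.

```java
import java.util.Set;

/** Hypothetical sketch only; none of these names exist in Hadoop. */
final class PrefetchPlanSketch {

  /** Hypothetical stand-in for the file read policy mentioned above. */
  enum ReadPolicy { RANDOM, SEQUENTIAL, WHOLE_FILE }

  private PrefetchPlanSketch() {
  }

  /** Index of the block containing the given byte offset. */
  static int blockOf(long offset, int blockSize) {
    return (int) (offset / blockSize);
  }

  /**
   * Number of blocks touched by a request for {@code length} bytes starting
   * at {@code position}, e.g. the range of a readFully-style call.
   */
  static int blocksInRange(long position, long length, int blockSize) {
    if (length <= 0) {
      return 0;
    }
    int first = blockOf(position, blockSize);
    int last = blockOf(position + length - 1, blockSize);
    return last - first + 1;
  }

  /**
   * Prefetch count to use after a seek. Random IO keeps the current
   * behaviour of one block; other policies ask for the blocks remaining up
   * to {@code rangeEnd} (split end or file end), capped at
   * {@code maxPrefetch} and never less than one.
   */
  static int prefetchCountAfterSeek(ReadPolicy policy, long readPos,
      long rangeEnd, int blockSize, int maxPrefetch) {
    if (policy == ReadPolicy.RANDOM) {
      return 1;
    }
    int remaining = blocksInRange(readPos, rangeEnd - readPos, blockSize);
    return Math.max(1, Math.min(maxPrefetch, remaining));
  }

  /**
   * Whether outstanding prefetches should be cancelled for a read landing in
   * {@code targetBlock}. A read into the current block, or into a block that
   * is already being fetched or cached, is not treated as out of order.
   */
  static boolean shouldCancelPrefetches(int targetBlock, int currentBlock,
      Set<Integer> inFlightOrCached) {
    return targetBlock != currentBlock
        && !inFlightOrCached.contains(targetBlock);
  }
}
```

With 8-byte blocks and a cap of 4, a sequential read starting at offset 90 of a 100-byte range would ask for 2 blocks rather than 1, and a read into a block that is already in flight would leave the outstanding fetches alone.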




