Steve Loughran created HADOOP-18852:
---------------------------------------
Summary: S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look like random IO
Key: HADOOP-18852
URL: https://issues.apache.org/jira/browse/HADOOP-18852
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.3.6
Reporter: Steve Loughran

Noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with separately.

# all seeks are lazy; no fetching is kicked off after an open
# the first read is treated as an out-of-order read, so it cancels any active reads (I don't think there are any) and then only asks for 1 block

{code}
if (outOfOrderRead) {
  LOG.debug("lazy-seek({})", getOffsetStr(readPos));
  blockManager.cancelPrefetches();

  // We prefetch only 1 block immediately after a seek operation.
  prefetchCount = 1;
}
{code}

* for any readFully() call we should prefetch all blocks in the range requested
* for other reads, we may want a bigger prefetch count than 1, depending on: split start/end, file read policy (random, sequential, whole-file)
* also, if a read is in a block other than the current one, but which is already being fetched or cached, is this really an OOO read to the extent that outstanding fetches should be cancelled?
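
To make the three proposals above concrete, here is a rough sketch of the decisions involved. It is not the S3A code and not a proposed patch: PrefetchPlanner, ReadPolicy and all three method names are hypothetical, blocks are assumed to be fixed-size and numbered from 0, and the per-policy counts are placeholders for whatever the real stream would choose.

```java
import java.util.Set;

/** Hypothetical helper; none of these names exist in Hadoop. */
final class PrefetchPlanner {

  /** Stand-in for the file read policy named in the issue. */
  enum ReadPolicy { RANDOM, SEQUENTIAL, WHOLE_FILE }

  private PrefetchPlanner() {
  }

  /**
   * Number of blocks touched by a read of {@code length} bytes starting
   * at {@code offset}; a ranged read could request all of them up front.
   */
  static int blocksForRange(long offset, int length, int blockSize) {
    if (length <= 0) {
      return 0;
    }
    long first = offset / blockSize;
    long last = (offset + length - 1) / blockSize;
    return (int) (last - first + 1);
  }

  /**
   * Blocks to prefetch beyond the target block after a seek.
   * Assumption: lastBlock is the final block of the split or file, inclusive.
   */
  static int prefetchCount(ReadPolicy policy, int targetBlock,
      int lastBlock, int maxPrefetch) {
    // Never ask for blocks past the end of the split or file.
    int remaining = Math.max(0, lastBlock - targetBlock);
    switch (policy) {
      case RANDOM:
        // Matches the current behaviour: one block after a seek.
        return Math.min(1, remaining);
      case WHOLE_FILE:
        // Unbounded here for illustration only; a real stream would
        // presumably cap this by the buffers available.
        return remaining;
      case SEQUENTIAL:
      default:
        return Math.min(maxPrefetch, remaining);
    }
  }

  /**
   * Only treat a seek as a reason to cancel outstanding prefetches when
   * the target block is neither cached nor already being fetched.
   */
  static boolean shouldCancelPrefetches(int targetBlock,
      Set<Integer> cachedOrInFlight) {
    return !cachedOrInFlight.contains(targetBlock);
  }
}
```

With 8-byte blocks, blocksForRange(5, 20, 8) covers blocks 0 to 3, so a ranged read there could request four blocks rather than one. A seek into block 3 while blocks 3 and 4 are in flight would leave those fetches alone.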