Uwe L. Korn created DRILL-4977:
----------------------------------

Summary: Reading parquet metadata cache from S3 with fadvise=random and Hadoop 3 generates a large number of requests
Key: DRILL-4977
URL: https://issues.apache.org/jira/browse/DRILL-4977
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Parquet
Affects Versions: 1.8.0
Environment: Hadoop 3.0
Reporter: Uwe L. Korn

When we use the new {{fs.s3a.experimental.input.fadvise=random}} mode to access Parquet files stored in S3, query performance improves significantly, but query planning slows down. The slowdown comes from the way the Parquet metadata cache file is read: each 8000-byte chunk triggers a separate GET request to S3. Calling {{FSDataInputStream.setReadahead(metadata-filesize)}} to signal that the whole file will be read avoids this behaviour.
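
To make the cost concrete, here is a simplified Java model of the request count. It is an assumption for illustration, not the actual S3A input stream logic: a new GET is issued whenever a read goes past the currently fetched byte range, and each GET fetches at least the readahead length. The class and method names and the 10 MB file size are hypothetical; the 8000-byte chunk size comes from the report.

```java
/**
 * Simplified, hypothetical model (not the real S3A implementation) of how many
 * GET requests a front-to-back read of a file issues when each request fetches
 * only the bytes needed for the current read, extended to at least `readahead`.
 */
public final class GetRequestModel {

    /**
     * @param fileSize  total bytes in the file, read sequentially from the start
     * @param chunkSize bytes requested per read call (8000 in the report)
     * @param readahead minimum bytes fetched per GET (e.g. the value given to setReadahead)
     * @return number of GET requests needed to read the whole file
     */
    static long countGetRequests(long fileSize, long chunkSize, long readahead) {
        if (fileSize < 0 || chunkSize <= 0 || readahead < 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        long pos = 0;      // next byte the reader wants
        long rangeEnd = 0; // exclusive end of the range fetched by the last GET
        long requests = 0;
        while (pos < fileSize) {
            long want = Math.min(chunkSize, fileSize - pos);
            if (pos + want > rangeEnd) {
                // The open range cannot serve this read: issue a new GET covering
                // this chunk, or `readahead` bytes if larger, capped at end of file.
                rangeEnd = Math.min(fileSize, pos + Math.max(want, readahead));
                requests++;
            }
            pos += want;
        }
        return requests;
    }

    public static void main(String[] args) {
        long fileSize = 10_000_000L; // hypothetical 10 MB metadata cache file
        long chunk = 8_000L;         // chunk size reported in the issue
        System.out.println("no readahead:          " + countGetRequests(fileSize, chunk, 0L));
        System.out.println("readahead = file size: " + countGetRequests(fileSize, chunk, fileSize));
    }
}
```

Under this model, the hypothetical 10 MB file costs 1250 GET requests without readahead (10,000,000 / 8000), while setting the readahead to the file size lets a single GET serve every chunk.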