[jira] [Created] (DRILL-4977) Reading parquet metadata cache from S3 with fadvise=random and Hadoop 3 generates a large number of requests

Uwe L. Korn (JIRA) Fri, 28 Oct 2016 01:06:07 -0700

Uwe L. Korn created DRILL-4977:
----------------------------------

             Summary: Reading parquet metadata cache from S3 with 
fadvise=random and Hadoop 3 generates a large number of requests
                 Key: DRILL-4977
                 URL: https://issues.apache.org/jira/browse/DRILL-4977
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Parquet
    Affects Versions: 1.8.0
         Environment: Hadoop 3.0
            Reporter: Uwe L. Korn



When using the new {{fs.s3a.experimental.input.fadvise=random}} mode to access 
Parquet files stored in S3, query execution becomes significantly faster, but 
query planning slows down. The cause is the way the Parquet metadata cache file 
is read: each 8000-byte chunk triggers a new GET request to S3. Calling 
{{FSDataInputStream.setReadahead(metadata-filesize)}} to signal that the whole 
file will be read avoids this behaviour. 
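
To illustrate why the readahead hint matters, here is a toy model of the 
request count. It is a sketch under stated assumptions, not the S3A 
implementation: it assumes the file is read front to back in fixed-size chunks 
and that each GET fetches max(readahead, chunk size) bytes starting at the 
current position. The class {{ReadaheadModel}}, the method {{countGets}} and 
the 1,000,000-byte file size are hypothetical and chosen only for illustration.

```java
// Toy model only: NOT the S3A code. Assumes sequential reads in fixed-size
// chunks, where each GET fetches max(readahead, chunkSize) bytes.
final class ReadaheadModel {

    /** Counts the GET requests needed to read a file front to back. */
    static long countGets(long fileSize, int chunkSize, long readahead) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long gets = 0;
        long fetchedUpTo = 0; // exclusive end offset of the bytes already fetched
        for (long pos = 0; pos < fileSize; pos += chunkSize) {
            long end = Math.min(pos + chunkSize, fileSize);
            if (end > fetchedUpTo) {
                // The chunk is not fully covered by the last GET, so issue a new one.
                gets++;
                long range = Math.max(readahead, (long) chunkSize);
                fetchedUpTo = Math.min(fileSize, pos + range);
            }
        }
        return gets;
    }

    public static void main(String[] args) {
        long fileSize = 1_000_000L; // hypothetical metadata cache file size
        int chunkSize = 8000;       // chunk size reported in this issue

        // No readahead: every chunk costs one GET.
        System.out.println(countGets(fileSize, chunkSize, 0L));
        // Readahead equal to the file size: one GET covers the whole file.
        System.out.println(countGets(fileSize, chunkSize, fileSize));
    }
}
```

Under these assumptions the 1,000,000-byte file costs 125 GET requests without 
readahead and a single GET once the readahead equals the file size, which is 
the effect that {{setReadahead(metadata-filesize)}} is meant to achieve.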



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
