[ https://issues.apache.org/jira/browse/IMPALA-8525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-8525.
----------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

Done. Did some additional benchmarking to confirm the expected perf improvement. 
For the following query {{select * from tpcds_parquet.inventory order by 
inv_quantity_on_hand limit 10}} on a 10 TB TPC-DS dataset on S3, this change 
roughly halves the query time (12.06s to 5.84s, a ~2x speedup).

The expected performance improvement is workload-dependent, but is generally a 
function of the Parquet file size and the amount of sequential data scanned 
from the file. If the Parquet files are small (e.g. smaller than the 128 KB 
chunk size), this change doesn't make a big difference. For larger Parquet 
files, and especially for large scan ranges, it makes a significant difference: 
an 8 MB scan range that previously required up to 64 sequential 128 KB reads 
can now be fetched with a single call.

> preads should use hdfsPreadFully rather than hdfsPread
> ------------------------------------------------------
>
>                 Key: IMPALA-8525
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8525
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> Impala preads (only enabled if {{use_hdfs_pread}} is true) use the 
> {{hdfsPread}} API from libhdfs, which ultimately invokes 
> {{PositionedReadable#read(long position, byte[] buffer, int offset, int 
> length)}} in the HDFS client.
> {{PositionedReadable}} also exposes the method {{readFully(long position, 
> byte[] buffer, int offset, int length)}}. The difference is that {{#read}} 
> will "Read up to the specified number of bytes" whereas {{#readFully}} will 
> "Read the specified number of bytes". So there is no guarantee that {{#read}} 
> will read *all* of the requested bytes.
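> Because of that contract, a caller that needs the full range has to loop. A 
> minimal sketch of that pattern against the libhdfs C API (illustrative only, 
> not the actual {{hdfs-file-reader.cc}} code; the helper name is hypothetical):
> {code:cpp}
> #include <hdfs.h>   // libhdfs C API: hdfsPread()
> #include <cstdint>
>
> // Keep calling hdfsPread() until 'length' bytes have been read.
> // Returns false on an I/O error or on EOF before the range is complete.
> static bool PreadLoop(hdfsFS fs, hdfsFile file, int64_t offset,
>                       uint8_t* buffer, int64_t length) {
>   int64_t total = 0;
>   while (total < length) {
>     // Each call may return fewer bytes than requested. On a FileSystem
>     // without ByteBuffer reads (e.g. S3A), each call also allocates a
>     // fresh Java byte[] sized to the remaining length.
>     tSize n = hdfsPread(fs, file, offset + total, buffer + total,
>                         static_cast<tSize>(length - total));
>     if (n < 0) return false;   // read error
>     if (n == 0) return false;  // premature EOF
>     total += n;
>   }
>   return true;
> }
> {code}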
> Impala invokes {{hdfsPread}} from {{hdfs-file-reader.cc}} in exactly this 
> kind of while loop, until all the requested bytes have been read from the 
> file. This can cause a few performance issues:
> (1) if the underlying {{FileSystem}} does not support ByteBuffer reads 
> (HDFS-2834), as is the case for S3A, then {{hdfsPread}} allocates a Java 
> array equal in size to the specified length of the buffer. Since the call to 
> {{PositionedReadable#read}} may only partially fill the buffer, Impala 
> repeats the call to {{hdfsPread}}, which triggers another large array 
> allocation. This can waste a lot of time on unnecessary array allocations.
> (2) given that Impala calls {{hdfsPread}} in a while loop, there is no point 
> in repeatedly calling {{hdfsPread}} when a single call to 
> {{hdfsPreadFully}} achieves the same thing (this doesn't affect performance 
> much, but the extra calls are unnecessary; see the sketch below)
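> With the {{hdfsPreadFully}} API that HDFS-14564 adds to libhdfs, the whole 
> loop collapses to a single call. A sketch, assuming the HDFS-14564 signature 
> (returns 0 on success, -1 on error; the helper name is hypothetical):
> {code:cpp}
> #include <hdfs.h>   // requires a libhdfs that includes HDFS-14564
> #include <cstdint>
>
> // Read exactly 'length' bytes at 'offset', or fail. Issuing one call lets
> // the underlying FileSystem size its remote request to the full range: a
> // block-reader byte range on HDFS, or a single HTTP GET range on S3A.
> static bool PreadFully(hdfsFS fs, hdfsFile file, int64_t offset,
>                        uint8_t* buffer, int64_t length) {
>   return hdfsPreadFully(fs, file, offset, buffer,
>                         static_cast<tSize>(length)) == 0;
> }
> {code}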
> Prior solutions to this problem introduced a "chunk size" for Impala reads 
> (https://gerrit.cloudera.org/#/c/63/ - "S3: DiskIoMgr related changes for 
> S3"). However, with the migration to {{hdfsPreadFully}}, the chunk size is 
> no longer necessary.
> Furthermore, preads are most effective when the data is read all at once 
> (e.g. in 8 MB chunks, as specified by {{read_size}}) rather than in smaller 
> chunks (typically 128 KB). For example, {{DFSInputStream#read(long position, 
> byte[] buffer, int offset, int length)}} opens remote block readers with a 
> byte range determined by the {{length}} passed into the {{#read}} call. 
> Similarly, {{S3AInputStream#readFully}} issues an HTTP GET request with the 
> size of the read specified by the given {{length}} (although fadvise must be 
> set to RANDOM, i.e. {{fs.s3a.experimental.input.fadvise=random}}, for this 
> to work).
> This work is dependent on exposing {{readFully}} via libhdfs first: HDFS-14564


