Hello all,
I'm trying to read a Parquet file (~50 MB) from S3 and it is taking much
longer than with Arrow 12.0.1. I've enabled threads (use_threads=true),
set the batch size to 1024*1024, and set the IOThreadPoolCapacity to 32.
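In case it helps, here is roughly how the reader is configured; this is a minimal sketch, with a placeholder bucket/key and no real error handling:

```cpp
#include <memory>

#include <arrow/filesystem/s3fs.h>
#include <arrow/io/interfaces.h>
#include <arrow/memory_pool.h>
#include <arrow/status.h>
#include <arrow/table.h>
#include <parquet/arrow/reader.h>

arrow::Status ReadParquetFromS3() {
  // Raise the I/O thread pool capacity before any reads happen.
  ARROW_RETURN_NOT_OK(arrow::io::SetIOThreadPoolCapacity(32));

  ARROW_RETURN_NOT_OK(arrow::fs::InitializeS3(arrow::fs::S3GlobalOptions{}));
  ARROW_ASSIGN_OR_RAISE(
      auto fs, arrow::fs::S3FileSystem::Make(arrow::fs::S3Options::Defaults()));

  // "my-bucket/path/file.parquet" is a placeholder, not the real key.
  ARROW_ASSIGN_OR_RAISE(auto input,
                        fs->OpenInputFile("my-bucket/path/file.parquet"));

  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(
      input, arrow::default_memory_pool(), &reader));

  // The settings described above: threaded decode, 1M-row batches.
  reader->set_use_threads(true);
  reader->set_batch_size(1024 * 1024);

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));

  return arrow::fs::FinalizeS3();
}
```
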

When I time the Parquet read from S3 with a Boost timer, CPU usage during
the file read is only 2-5%, so I suspect multithreaded reading is not happening.

Reading the same Parquet file from local disk is fine, and reading it from
S3 with Arrow 12 is also fine. Am I missing a setting related to
multithreaded Parquet reads, or some AWS setting?

This is my setup:
C++
Apache Arrow 16.1
Ubuntu Linux 22.04
GCC 13.2

Thanks,
Surya
