Thanks for the quick response. When the file sizes are small (less than 10MB), I'm not seeing a noticeable difference, but beyond that the slowdown is clear. I'll send a snippet in due course.
Surya

On Thu, Nov 28, 2024 at 6:37 PM Raúl Cumplido <[email protected]> wrote:

> Thanks for raising the issue.
>
> Could you share a snippet of the code you are using showing how you are
> reading the file?
> Is the decrease in performance also happening with different file sizes,
> or is the file size related to your issue?
>
> Thanks,
>
> Raúl
>
> On Thu, Nov 28, 2024, 13:58, Surya Kiran Gullapalli
> <[email protected]> wrote:
>
>> Hello all,
>> Trying to read a parquet file from S3 (a 50MB file), and it is taking much
>> more time than with Arrow 12.0.1. I've enabled threads (use_threads=true),
>> the batch size is set to 1024*1024, and I also set the IOThreadPoolCapacity
>> to 32.
>>
>> When I time the parquet read from S3 using a boost timer, it shows CPU
>> usage for the file read of 2-5%, so I think multithreaded reading was not
>> happening.
>>
>> Reading the same parquet file from local disk is fine, and reading the
>> same parquet file from S3 using Arrow 12 is also fine. Am I missing any
>> setting related to reading parquet with threads, or any AWS setting?
>>
>> This is the setup:
>> C++
>> Apache Arrow 16.1
>> Ubuntu Linux 22.04
>> gcc-13.2
>>
>> Thanks,
>> Surya
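[For readers following along: the settings mentioned in the thread (use_threads, batch size 1024*1024, IOThreadPoolCapacity of 32) can be wired together roughly as below. This is a minimal sketch against the Arrow 16.x C++ API, not the poster's actual code; the bucket path is a placeholder, and default S3 credentials are assumed. It is not runnable without S3 access.]

```cpp
#include <arrow/filesystem/s3fs.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/table.h>
#include <parquet/arrow/reader.h>
#include <parquet/properties.h>

#include <iostream>
#include <memory>

arrow::Status ReadParquetFromS3() {
  // Raise the global IO thread pool capacity (the thread mentions 32).
  ARROW_RETURN_NOT_OK(arrow::io::SetIOThreadPoolCapacity(32));

  // Initialize the S3 subsystem and open the remote file.
  ARROW_RETURN_NOT_OK(arrow::fs::EnsureS3Initialized());
  arrow::fs::S3Options options = arrow::fs::S3Options::Defaults();
  ARROW_ASSIGN_OR_RAISE(auto fs, arrow::fs::S3FileSystem::Make(options));
  // Placeholder path; substitute the real bucket/key.
  ARROW_ASSIGN_OR_RAISE(auto input,
                        fs->OpenInputFile("my-bucket/path/to/file.parquet"));

  // Enable threaded decoding and set the batch size from the thread.
  parquet::ArrowReaderProperties arrow_props;
  arrow_props.set_use_threads(true);
  arrow_props.set_batch_size(1024 * 1024);

  // Build a FileReader with those properties and read the whole table.
  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(input));
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.properties(arrow_props)->Build(&reader));

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  std::cout << "rows read: " << table->num_rows() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = ReadParquetFromS3();
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}
```

Note that `ArrowReaderProperties::set_use_threads` controls column-level decode parallelism, while `SetIOThreadPoolCapacity` sizes the pool used for remote IO; both were reportedly set in the slow case, which is why the 2-5% CPU observation suggests IO, not decode, as the bottleneck.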
