Thanks for the quick response. When the file sizes are small (less than 10MB), I'm not seeing a noticeable difference, but beyond that the slowdown is clear. I'll send a snippet in due course.
Surya

On Thu, Nov 28, 2024 at 6:37 PM Raúl Cumplido <[email protected]> wrote:

> Thanks for raising the issue.
>
> Could you share a snippet of the code you are using showing how you are
> reading the file?
> Is the decrease in performance also happening with different file sizes,
> or is the file size related to your issue?
>
> Thanks,
>
> Raúl
>
> On Thu, Nov 28, 2024, 13:58, Surya Kiran Gullapalli
> <[email protected]> wrote:
>
>> Hello all,
>> Trying to read a parquet file from S3 (a 50MB file), and it is taking much
>> more time than with Arrow 12.0.1. I've enabled threads (use_threads=true),
>> the batch size is set to 1024*1024, and I also set the IOThreadPoolCapacity
>> to 32.
>>
>> When I time the parquet read from S3 using a boost timer, it shows CPU
>> usage for the file read of 2-5%, so I think multithreaded reading was not
>> happening.
>>
>> Reading the same parquet file from local disk is fine, and reading the
>> same parquet file from S3 using Arrow 12 is also fine. Am I missing any
>> setting related to reading parquet with threads, or any AWS setting?
>>
>> This is the setup:
>> C++
>> Apache Arrow 16.1
>> Ubuntu Linux 22.04
>> gcc-13.2
>>
>> Thanks,
>> Surya
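[For readers following along: the settings mentioned in the thread (use_threads, batch size 1024*1024, IOThreadPoolCapacity of 32) can be wired together roughly as below. This is a minimal sketch against the Arrow 16.x C++ API, not the poster's actual code; the bucket path is a placeholder, and default S3 credentials are assumed. It is not runnable without S3 access.]

```cpp
#include <arrow/filesystem/s3fs.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/table.h>
#include <parquet/arrow/reader.h>
#include <parquet/properties.h>

#include <iostream>
#include <memory>

arrow::Status ReadParquetFromS3() {
  // Raise the global IO thread pool capacity (the thread mentions 32).
  ARROW_RETURN_NOT_OK(arrow::io::SetIOThreadPoolCapacity(32));

  // Initialize the S3 subsystem and open the remote file.
  ARROW_RETURN_NOT_OK(arrow::fs::EnsureS3Initialized());
  arrow::fs::S3Options options = arrow::fs::S3Options::Defaults();
  ARROW_ASSIGN_OR_RAISE(auto fs, arrow::fs::S3FileSystem::Make(options));
  // Placeholder path; substitute the real bucket/key.
  ARROW_ASSIGN_OR_RAISE(auto input,
                        fs->OpenInputFile("my-bucket/path/to/file.parquet"));

  // Enable threaded decoding and set the batch size from the thread.
  parquet::ArrowReaderProperties arrow_props;
  arrow_props.set_use_threads(true);
  arrow_props.set_batch_size(1024 * 1024);

  // Build a FileReader with those properties and read the whole table.
  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(input));
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.properties(arrow_props)->Build(&reader));

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  std::cout << "rows read: " << table->num_rows() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = ReadParquetFromS3();
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}
```

Note that `ArrowReaderProperties::set_use_threads` controls column-level decode parallelism, while `SetIOThreadPoolCapacity` sizes the pool used for remote IO; both were reportedly set in the slow case, which is why the 2-5% CPU observation suggests IO, not decode, as the bottleneck.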
