[Java][Parquet] Bulk Read Performance

Paulo Motta Sat, 01 Jul 2023 18:08:57 -0700

Hi,

I'm trying to read 4096 Parquet files with a total size of 6GB, following
this cookbook recipe:
https://arrow.apache.org/cookbook/java/dataset.html#query-parquet-file


I'm using 100 threads, each processing one file at a time, on a 72-core
machine with a 32GB heap. The files are pre-loaded in memory.

However, it's taking about 10 minutes to process these 4096 files, which
works out to only about 10MB/s overall, and the process seems to be
CPU-bound.
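
A minimal sketch of the setup described above, to make the numbers concrete. It uses only the JDK: `readOneFile` is a hypothetical placeholder for the per-file scan from the cookbook recipe, not an Arrow API call, and the class and method names are made up for illustration. The byte count and duration are the approximate figures from this message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

class BulkReadSketch {

    // Runs readOneFile once per path on a fixed-size pool and sums the results.
    // readOneFile is a hypothetical stand-in for the cookbook's per-file scan;
    // here it is expected to return the number of rows it read.
    public static long readAll(List<String> paths, int threads,
                               ToLongFunction<String> readOneFile) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (String path : paths) {
                Callable<Long> task = () -> readOneFile.applyAsLong(path);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that file is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Aggregate throughput in MiB/s for a given byte count and wall-clock time.
    public static double throughputMiBPerSec(long totalBytes, double seconds) {
        return totalBytes / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 4096; i++) {
            paths.add("file-" + i + ".parquet"); // illustrative names only
        }
        // Dummy reader that reports one row per file, so the sketch runs as-is.
        long rows = readAll(paths, 100, path -> 1L);
        System.out.println("rows reported: " + rows);
        // About 6 GiB in about 600 s, the approximate figures quoted above.
        System.out.printf("throughput: %.2f MiB/s%n",
                throughputMiBPerSec(6L * 1024 * 1024 * 1024, 600));
    }
}
```

The `threads` argument is the knob to vary when timing this. If the work really is CPU-bound, a pool larger than the 72 cores may add contention rather than throughput, though that is a guess until it is measured.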

Is this the expected read performance for Parquet files, or am I doing
something wrong? Any help or tips would be appreciated.

Thanks,

Paulo
