I want to confirm that my understanding of a few points in this thread is correct. There are two ways to read a Parquet file in C++: through ParquetFile/read_table, or through ParquetDataset. For the former, the parallelism is per column only, because read_table simply passes all row-group indices to DecodeRowGroups in reader.cc, and there is no row-group-level parallelism. For the latter, the parallelism is both per column and per row group (i.e., per ColumnChunk), according to RowGroupGenerator in file_parquet.cc. These two paths are also what use_legacy_dataset switches between in Python. If my understanding is correct, I think this difference deserves a clearer explanation in the docs to avoid confusion; I had to dig through the code to figure it out.
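
To make the comparison concrete, here is a minimal sketch of the two paths as I currently understand them (Arrow C++, error handling trimmed; the helper names ReadViaFileReader/ReadViaDataset are mine, and please correct me if the comments about parallelism are wrong):

```cpp
#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>
#include <arrow/dataset/api.h>
#include <arrow/filesystem/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/reader.h>

// Path 1: parquet::arrow::FileReader. ReadTable hands every row-group
// index to DecodeRowGroups at once, so use_threads only fans out
// across columns.
arrow::Result<std::shared_ptr<arrow::Table>> ReadViaFileReader(
    const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto input, arrow::io::ReadableFile::Open(path));
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(
      parquet::arrow::OpenFile(input, arrow::default_memory_pool(), &reader));
  reader->set_use_threads(true);  // column-level parallelism only (?)
  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  return table;
}

// Path 2: the Dataset API. The Parquet scan (RowGroupGenerator in
// file_parquet.cc) emits one task per row group, so decoding runs in
// parallel across row groups as well as columns.
arrow::Result<std::shared_ptr<arrow::Table>> ReadViaDataset(
    const std::string& path) {
  auto fs = std::make_shared<arrow::fs::LocalFileSystem>();
  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();
  std::vector<std::string> paths{path};
  ARROW_ASSIGN_OR_RAISE(
      auto factory, arrow::dataset::FileSystemDatasetFactory::Make(
                        fs, paths, format,
                        arrow::dataset::FileSystemFactoryOptions{}));
  ARROW_ASSIGN_OR_RAISE(auto dataset, factory->Finish());
  ARROW_ASSIGN_OR_RAISE(auto scan_builder, dataset->NewScan());
  ARROW_RETURN_NOT_OK(scan_builder->UseThreads(true));
  ARROW_ASSIGN_OR_RAISE(auto scanner, scan_builder->Finish());
  return scanner->ToTable();
}
```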
I was also wondering how pre_buffer works. Will coalescing ColumnChunk ranges hurt parallelism, or can a large coalesced range still be read in parallel? To me, coalescing and parallel reads look like a tradeoff on S3. Thanks in advance.
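
For reference, this is how I have been experimenting with pre_buffer (a rough sketch; the hole_size_limit/range_size_limit values are illustrative, not tuned, and ReadWithPreBuffer is my own helper name):

```cpp
#include <memory>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/io/caching.h>
#include <parquet/arrow/reader.h>
#include <parquet/properties.h>

// Hypothetical helper: read a whole file with pre-buffering enabled.
arrow::Result<std::shared_ptr<arrow::Table>> ReadWithPreBuffer(
    std::shared_ptr<arrow::io::RandomAccessFile> input) {
  parquet::ArrowReaderProperties props;
  props.set_use_threads(true);
  props.set_pre_buffer(true);  // coalesce and prefetch ColumnChunk ranges

  // CacheOptions is the knob I am asking about: gaps smaller than
  // hole_size_limit get merged into one read, while a merged read is
  // capped at range_size_limit, so coalescing should not produce one
  // giant serial request. Values here are illustrative, not tuned.
  arrow::io::CacheOptions cache = arrow::io::CacheOptions::Defaults();
  cache.hole_size_limit = 64 * 1024;          // 64 KiB
  cache.range_size_limit = 32 * 1024 * 1024;  // 32 MiB
  props.set_cache_options(cache);

  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(std::move(input)));
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.properties(props)->Build(&reader));

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  return table;
}
```

My possibly wrong reading of CacheOptions is that hole_size_limit decides when two ColumnChunk ranges get merged, while range_size_limit caps the size of a merged read, so the capped ranges can still be fetched concurrently. Is that the intended behavior on S3?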
