Hello, I am currently writing some distributed code where I am reading Parquet columns from the same file across multiple processes. I see that https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet5arrow10FileReaderE seems to suggest that parallelism within a process would need to read at the row group granularity and that multiple file readers working independently on the same file in a single process would not be safe.
I haven’t found anything suggesting otherwise, so I assumed that reading the same file from different processes would be allowed, but a recent crash made me question that. Is it allowed to read a single Parquet file simultaneously from separate processes?

I am currently using the low-level `ReadBatch` API. For example, when reading 1 file across 2 processes, the first process reads the first half of the elements and the second process reads the second half. Both reads happen simultaneously, but since they are in different processes, I wouldn’t expect any conflict.

So far, this code has worked as expected, and I have been able to read multiple files simultaneously across processes. Recently, however, I hit a case where reading a file in a single process produced an error that could be handled gracefully (an `Unexpected end of stream` error), whereas reading that same file across multiple processes crashed the code. I would like to be able to handle such errors rather than have the program crash.

Thanks.

Best,
Ben McDonald
