Hi Shawn, I expect this is the default because Parquet comes from the Hadoop ecosystem, where the HDFS block size is usually set to 64MB. Why would you need a different default? You can set it to whatever size fits your use case best, right?
Marnix

On Tue, Feb 22, 2022 at 1:42 PM Shawn Zeng <[email protected]> wrote:
> For clarification, I am referring to pyarrow.parquet.write_table
>
> Shawn Zeng <[email protected]> wrote on Tue, Feb 22, 2022 at 20:40:
>> Hi,
>>
>> The default row_group_size is really large, which means a table
>> smaller than 64M rows will not get the benefits of row-group-level
>> statistics. What is the reason for this? Do you plan to change the default?
>>
>> Thanks,
>> Shawn
