Hi all,

I wonder how I can set the row group size for files generated by ParquetIO.Sink
<https://beam.apache.org/releases/javadoc/2.20.0/org/apache/beam/sdk/io/parquet/ParquetIO.Sink.html>.
It doesn't seem to provide an option for that, and IIUC from the code
<https://github.com/apache/beam/blob/fffb85a35df6ae3bdb2934c077856f6b27559aa7/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L1117>,
it uses the default value in ParquetWriter.Builder
<https://github.com/apache/parquet-mr/blob/bdf935a43bd377c8052840a4328cf5b7603aa70a/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetWriter.java#L636>,
which is 128 MB. Is there any reason not to expose this parameter in ParquetIO?

Thanks

-B
