[GitHub] [hudi] nsivabalan commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions

2021-12-10 Thread GitBox
nsivabalan commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-991051340 @codope : can you chime in here please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] nsivabalan commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions

2021-12-09 Thread GitBox
nsivabalan commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-990569759 yes.

[GitHub] [hudi] nsivabalan commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions

2021-12-09 Thread GitBox
nsivabalan commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-990449113 But if you end up with a lot of small files, you can employ clustering to batch them together.
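
As a rough illustration of the clustering suggestion in this comment, the sketch below builds a set of write options that turn on inline clustering. The comment itself names no configs: the option keys are the clustering keys from the Hudi configuration reference as far as I know, so verify them against the docs for your Hudi version. The helper name and the default sizes are hypothetical.

```python
MB = 1024 * 1024


def inline_clustering_options(small_file_limit_mb=300,
                              target_file_max_mb=1024,
                              max_commits=4):
    """Return Hudi write options that enable inline clustering.

    Assumption: the keys below match the Hudi configuration reference;
    the helper name and default values are illustrative only.
    """
    return {
        # Run clustering as part of the write, every `max_commits` commits.
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": str(max_commits),
        # Files smaller than this are candidates for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit":
            str(small_file_limit_mb * MB),
        # Upper bound on the size of the files clustering produces.
        "hoodie.clustering.plan.strategy.target.file.max.bytes":
            str(target_file_max_mb * MB),
    }


# With PySpark these would typically be passed along with the usual
# Hudi write options, e.g. df.write.format("hudi").options(**opts)
opts = inline_clustering_options()
```

Hudi options are passed as strings, which is why the byte counts are converted with `str`.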

[GitHub] [hudi] nsivabalan commented on issue #4242: [SUPPORT] Split Data into Multiple Parquet files under Partitions

2021-12-09 Thread GitBox
nsivabalan commented on issue #4242: URL: https://github.com/apache/hudi/issues/4242#issuecomment-990448651 Hudi has max parquet file size configs that you can leverage. https://hudi.apache.org/docs/configurations/#hoodieparquetmaxfilesize But also, do keep in mind that, reducing this
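
A minimal sketch of the max-file-size setting this comment links to, assuming the option key is `hoodie.parquet.max.file.size` (as the linked anchor suggests) and that the value is given in bytes. The helper name and the 64 MB figure are hypothetical, chosen only for illustration.

```python
BYTES_PER_MB = 1024 * 1024


def parquet_size_options(max_file_size_mb):
    """Return a Hudi write option capping the target parquet file size.

    Assumption: the value is expressed in bytes and passed as a string,
    like other Hudi write options.
    """
    if max_file_size_mb <= 0:
        raise ValueError("max_file_size_mb must be positive")
    return {
        "hoodie.parquet.max.file.size": str(max_file_size_mb * BYTES_PER_MB),
    }


# Example: aim for parquet files of at most about 64 MB.
size_opts = parquet_size_options(64)
# With PySpark this would typically be merged into the write call, e.g.
# df.write.format("hudi").options(**size_opts)
```

A smaller cap produces more files per partition, so the value is a trade-off and not a setting to minimize.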