[GitHub] [hudi] bvaradar commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-09-13 Thread GitBox
bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-691739964 @RajasekarSribalan: Please reopen if you still have any questions. Thanks, Balaji.V

[GitHub] [hudi] bvaradar commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-08-21 Thread GitBox
bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-678220122 Sorry for the delay in responding. Here is the default storage-level config I am seeing: private static final String WRITE_STATUS_STORAGE_LEVEL =
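
A minimal sketch of how that storage-level setting can be overridden at write time, assuming the config key is "hoodie.write.status.storage.level" with a MEMORY_AND_DISK_SER default (verify against the HoodieWriteConfig of your Hudi version); the table name is a hypothetical placeholder:

  // Hedged sketch (Scala/Spark): override the write-status storage level per write.
  import org.apache.spark.sql.{DataFrame, SaveMode}

  def writeWithStorageLevel(df: DataFrame, basePath: String): Unit = {
    df.write
      .format("hudi")
      .option("hoodie.table.name", "my_table")                              // hypothetical table name
      .option("hoodie.write.status.storage.level", "MEMORY_AND_DISK_SER")   // assumed key and default
      .mode(SaveMode.Append)
      .save(basePath)
  }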

[GitHub] [hudi] bvaradar commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-08-10 Thread GitBox
bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-671690639 To understand: are you using bulk insert for the initial load and upsert for subsequent operations? For records with LOBs, it is important to tune
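
A minimal sketch of that split between an initial bulk insert and later upserts, assuming the standard Hudi datasource options; the table name, key fields, and the raised record-size estimate are illustrative assumptions, not values from the thread:

  // Hedged sketch (Scala/Spark): bulk_insert for the initial load, upsert afterwards.
  import org.apache.spark.sql.{DataFrame, SaveMode}

  val commonOpts = Map(
    "hoodie.table.name"                         -> "my_table",   // hypothetical
    "hoodie.datasource.write.recordkey.field"   -> "id",          // hypothetical key column
    "hoodie.datasource.write.precombine.field"  -> "ts",          // hypothetical ordering column
    // Records carrying LOB columns are far larger than Hudi's default per-record
    // size estimate, so this (bytes per record) likely needs to be raised:
    "hoodie.copyonwrite.record.size.estimate"   -> "4096"         // assumed key, placeholder value
  )

  def initialLoad(df: DataFrame, basePath: String): Unit =
    df.write.format("hudi")
      .options(commonOpts)
      .option("hoodie.datasource.write.operation", "bulk_insert")
      .mode(SaveMode.Overwrite)
      .save(basePath)

  def incrementalUpsert(df: DataFrame, basePath: String): Unit =
    df.write.format("hudi")
      .options(commonOpts)
      .option("hoodie.datasource.write.operation", "upsert")
      .mode(SaveMode.Append)
      .save(basePath)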

[GitHub] [hudi] bvaradar commented on issue #1939: [SUPPORT] Hudi creating parquet with huge size and not in sink with limitFileSize

2020-08-09 Thread GitBox
bvaradar commented on issue #1939: URL: https://github.com/apache/hudi/issues/1939#issuecomment-671079032 Regarding OOM errors, please check which Spark stage is causing the failure. You might need to tune parallelism for that stage. The size of the parquet files should not be the issue.
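
A minimal sketch of that parallelism tuning, assuming the failing stage in the Spark UI is the upsert/insert shuffle; the parallelism values are placeholders to tune for your data volume, and the table name is hypothetical:

  // Hedged sketch (Scala/Spark): raise Hudi's shuffle parallelism for the write path.
  import org.apache.spark.sql.{DataFrame, SaveMode}

  def upsertWithParallelism(df: DataFrame, basePath: String): Unit =
    df.write.format("hudi")
      .option("hoodie.table.name", "my_table")              // hypothetical table name
      .option("hoodie.upsert.shuffle.parallelism", "400")   // placeholder; size to your input volume
      .option("hoodie.insert.shuffle.parallelism", "400")   // placeholder
      .mode(SaveMode.Append)
      .save(basePath)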