[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-10-13 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-941946294 @FelixKJose : If you are interested in working on a fix, I have filed a tracking jira https://issues.apache.org/jira/browse/HUDI-2550 -- This is an automated message from the

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-10-12 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-941946294 @FelixKJose : If you are interested in working on a fix, I have filed a tracking jira https://issues.apache.org/jira/browse/HUDI-2550

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-10-07 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-937427922 I guess we don't have clear documentation around this. I had to dig through the code and try it myself before confirming some of the nuanced behaviors.

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-10-06 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-937427922 I guess we don't have clear documentation around this. I had to dig through the code and try it myself before confirming some of the nuanced behaviors.

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-28 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-929282472 @FelixKJose : what I meant is, your configs are fine in general. It's just that for every commit, only one small file will be packed with more inserts. The rest of the incoming records will

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-19 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-922508543 I found the root cause. It looks like in MOR, when an index is used that cannot index log files (which is the case for all out-of-the-box indexes in Hudi), we just choose the smallest

[GitHub] [hudi] nsivabalan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-18 Thread GitBox
nsivabalan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-922375568 I see you have very aggressive settings for the cleaner and archival commits to retain. Can you leave them at their defaults and try it out? ``` 'hoodie.cleaner.commits.retained': 1,