[GitHub] [hudi] jiegzhan commented on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-25 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-680303027 Thanks for your explanation, @bvaradar. Closed this ticket.

2020-08-24 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-679398095 @bvaradar, before re-clustering is available, I tested [hoodie.cleaner.commits.retained](https://hudi.apache.org/docs/configurations.html#retainCommits). I set option("hoodie.clean

2020-08-19 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676688579 @bvaradar that makes sense, thanks. After running many delete queries, I got a lot of small files in S3. Is there a way to merge these small files? Basically I am trying to clean up S3 folder

2020-08-19 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676544068 @bvaradar What is the size of the new version of the same files after running the delete query? For me, they are 423KB. Step 1: ran bulk_insert query: ``` df.write.format("o