VitoMakarevich commented on issue #10997:
URL: https://github.com/apache/hudi/issues/10997#issuecomment-2058398743
Thanks! Yeah, we are certain we run clustering for all partitions that are
big enough. It's a big effort to analyze the data for optimal clustering
settings, which is why I'm asking
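For reference, a minimal sketch of the kind of inline clustering settings being discussed, expressed as Hudi Spark datasource options. The threshold values and sort column are illustrative assumptions, not recommendations from this thread:

```python
# Illustrative Hudi inline clustering options.
# The size thresholds and sort column below are assumptions for the
# sketch, not settings suggested anywhere in this issue.
clustering_opts = {
    # Trigger clustering as part of the write pipeline.
    "hoodie.clustering.inline": "true",
    # Schedule a clustering plan every N commits.
    "hoodie.clustering.inline.max.commits": "4",
    # Target size for files produced by clustering (bytes).
    "hoodie.clustering.plan.strategy.target.file.max.bytes": str(1024 * 1024 * 1024),
    # Files smaller than this are candidates for clustering (bytes).
    "hoodie.clustering.plan.strategy.small.file.limit": str(300 * 1024 * 1024),
    # Sort rows so that updates touch fewer file groups per batch
    # ("record_key_col" is a hypothetical column name).
    "hoodie.clustering.plan.strategy.sort.columns": "record_key_col",
}

# These would typically be merged into the writer options, e.g.:
# df.write.format("hudi").options(**clustering_opts)...
```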
xushiyan commented on issue #10997:
URL: https://github.com/apache/hudi/issues/10997#issuecomment-2058238697
> we have clustering to group rows together, but it's still thousands of
files affected. 75th percentile of individual file overwrite (task in the Doing
partition and writing data sta
VitoMakarevich commented on issue #10997:
URL: https://github.com/apache/hudi/issues/10997#issuecomment-2056274777
Hello, thanks for the suggestions! As I said, I'd like to know how I can
speed up this individual part. I know it's an option to use MOR in theory, but
it's impossible for our use
xushiyan commented on issue #10997:
URL: https://github.com/apache/hudi/issues/10997#issuecomment-209283
+1 to using MOR to balance ingestion speed and merge cost through
compaction. There is also a new feature in 0.13.x:
https://hudi.apache.org/releases/release-0.13.0#simple-write-exe
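As a rough illustration of the MOR-plus-inline-compaction setup suggested here, again as Spark datasource options. The delta-commit threshold is an assumption for the sketch, not advice from this thread:

```python
# Illustrative MERGE_ON_READ writer options with inline compaction.
# The delta-commit threshold is an assumed example value.
mor_opts = {
    # Write updates to log files instead of rewriting base files per batch.
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Merge log files into base files inline once enough
    # delta commits have accumulated.
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "5",
}

# Typically applied as:
# df.write.format("hudi").options(**mor_opts).mode("append").save(base_path)
```

The trade-off being discussed: MOR defers the merge cost from every ingestion batch to periodic compaction runs, at the price of slower (merge-on-read) queries between compactions.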
ad1happy2go commented on issue #10997:
URL: https://github.com/apache/hudi/issues/10997#issuecomment-2054086162
@VitoMakarevich Just checking: if you have lots of file groups impacted in
each batch, then why not use a MERGE_ON_READ table?
In your current setup, you can only try to optimize