Hi Siva,
Regarding the ability to specify distribution and sorting: can they be
dynamic, and not just fixed at table creation time?
Hudi is really a storage system, i.e. it has a specific layout of data with
multiple tables (ro, rt) exposed.
So all of these "file" management APIs tend to fit poorly at times.
To your
I don't have much knowledge of the catalog, but is there an option of
exploring a Spark catalog-based table to create a Hudi table? I do know that
with Spark 3.2 you can add a Distribution (a.k.a. partitioning) and a Sort
order to your table. But I'm still not sure about custom transformations for
indexing, etc.
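
To make the two terms concrete, here is a rough pure-Python sketch of what a
required distribution plus a sort order mean for the rows being written. It
is not Spark or Hudi code: `distribute_and_sort`, `cluster_key`, `sort_key`
and the sample rows are all made up for illustration.

```python
from collections import defaultdict
from operator import itemgetter


def distribute_and_sort(rows, cluster_key, sort_key):
    """Group rows by cluster_key, then sort each group by sort_key.

    Hypothetical helper: it only mimics the effect of a clustered
    distribution followed by a sort within each group of rows.
    """
    groups = defaultdict(list)
    for row in rows:
        # "Distribution": rows sharing a cluster key land in the same group.
        groups[row[cluster_key]].append(row)
    # "Sort order": each group is ordered independently of the others.
    return {key: sorted(group, key=itemgetter(sort_key))
            for key, group in groups.items()}


# Made-up sample rows.
rows = [
    {"region": "us", "id": 3},
    {"region": "eu", "id": 2},
    {"region": "us", "id": 1},
]
result = distribute_and_sort(rows, "region", "id")
```

Anything beyond grouping and ordering, such as an index lookup per record,
has no place in this shape, which is the part I am unsure about.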
Also,
Folks,
As you may know, we still use the V1 API, given that it offers the
flexibility to further transform the dataframe after one calls
`df.write.format()`, to implement a fully featured write pipeline with
precombining, indexing, and custom partitioning. The V2 API takes this away
and rather provides a very