Re: [DISCUSSION] Simplify code structure for supporting multiple Spark versions in Hudi

2023-06-02 Thread Y Ethan Guo
Hey Shawn, Rahil, thanks for raising this issue. These are good suggestions; I would recommend simplifying the code structure of Hudi Spark incrementally and gradually making the code less coupled with the Spark engine. Identify breaking changes introduced by the new Spark version and patch >

Re: [DISCUSSION] Simplify code structure for supporting multiple Spark versions in Hudi

2023-06-02 Thread Vinoth Chandar
This is a good topic, thanks for raising it. Overall, our reliance on Spark classes/APIs that are declared experimental is an issue on paper. But there are few other ways to get the right performance without relying on them. This has been the tricky issue, IMO. Thoughts? I'll review the code
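For illustration, a minimal sketch of the kind of dependency being discussed (purely illustrative, not a specific piece of Hudi code): Catalyst's InternalRow and expression machinery are internal, unstable Spark APIs, but working at that level avoids per-row conversion to the public Row type, which is typically where the performance argument comes from.

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
    import org.apache.spark.sql.types._
    import org.apache.spark.unsafe.types.UTF8String

    // Everything below lives under org.apache.spark.sql.catalyst, which Spark
    // does not treat as a stable public API.
    val schema = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))
    val projection = UnsafeProjection.create(schema)
    // Produces Spark's internal binary row format directly, skipping Row conversion.
    val unsafeRow = projection(InternalRow(1L, UTF8String.fromString("hudi")))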

Re: [DISCUSSION] Simplify code structure for supporting multiple Spark versions in Hudi

2023-06-01 Thread Rahil C
Thanks Shawn for writing this; I would like to add to the Spark discussion. Currently I think our integration with Spark is too tight, which brings up serious issues when upgrading. I will describe one example (though there are many more): one area is that we extend Spark's
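As a rough sketch of the direction such a discussion usually points to (class and package names here are hypothetical, not Hudi's actual API), one option is to keep version-sensitive calls behind a small adapter trait in the common module and let each version-specific module ship its own implementation compiled against that Spark version:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

    // Common module: the only surface the rest of the code is allowed to touch.
    trait SparkVersionAdapter {
      // Any operation that reaches into unstable/internal Spark APIs goes through here.
      def resolvePlan(spark: SparkSession, plan: LogicalPlan): LogicalPlan
    }

    object SparkVersionAdapter {
      // Pick the implementation shipped in the version-specific module at runtime.
      def load(spark: SparkSession): SparkVersionAdapter = {
        val className =
          if (spark.version.startsWith("3.3")) "org.example.adapter.Spark33Adapter"
          else "org.example.adapter.Spark31Adapter"
        Class.forName(className).getDeclaredConstructor()
          .newInstance().asInstanceOf[SparkVersionAdapter]
      }
    }

With something like this, upgrading to a new Spark release mostly means adding one new implementation module rather than touching code spread across the datasource.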

[DISCUSSION] Simplify code structure for supporting multiple Spark versions in Hudi

2023-05-23 Thread Shawn Chang
Hi Hudi developers, I am writing to discuss the current code structure of the existing hudi-spark-datasource and propose a more scalable approach for supporting multiple Spark versions. The current structure involves common code shared by several Spark versions, such as hudi-spark-common,
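For readers less familiar with the layout, the structure being referred to looks roughly like the sketch below (module names approximate; the exact set varies by release):

    hudi-spark-datasource/
      hudi-spark-common/    <- code shared by all supported Spark versions
      hudi-spark2/          <- Spark 2.4-specific code
      hudi-spark3-common/   <- code shared by the Spark 3.x versions
      hudi-spark3.1.x/      <- Spark 3.1-specific code
      hudi-spark3.2.x/      <- Spark 3.2-specific code
      hudi-spark3.3.x/      <- Spark 3.3-specific code
      hudi-spark/           <- version-agnostic entry points wiring the above together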