Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-27 Thread Vinoth Chandar
I left my thoughts on the RFC (https://github.com/apache/hudi/pull/4309). I just see this as another deployment model where a centralized set of microservices takes up scheduling and execution of Hudi's table services. +1 on thinking about sharding, locking and HA upfront. Thanks Vinoth On Thu, Apr 2

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-21 Thread Alexey Kudinkin
Hey, folks! I feel there's quite a bit of confusion in this thread, so let's try to clear it up: my understanding (please correct me if I'm wrong) is that Lake Manager was referred to as a service in the same sense in which we call compaction, clustering and cleaning *table services*. So,

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Yue Zhang
Thanks for all your attention. Sure, we do need to take care of high availability in the design. Also, in my opinion, this lake manager wouldn't turn Hudi into a database on the cloud. It is just an official option, something like HoodieDeltaStreamer, that helps users reduce maintenance and Hudi data

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Simon Su
I agree with what Danny said. IMO, there are two points that should be considered: 1. If Lake Manager is designed as a service, we should consider its high availability, dynamic expanding/shrinking, and state consistency. 2. How many resources will Lake Manager use to execute those actions of

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Y Ethan Guo
From my point of view, this Lake Manager should be more like a centralized management layer on top of Hudi tables that schedules different table services and handles data governance. The scheduling / managing part should be lightweight. The execution should still happen in the cluster. It should not be a single n

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Danny Chan
I have a different concern here: the Lake Manager seems like a single-node service, and there is a risk that it becomes a bottleneck when handling too many table services. And for every single-node service we should consider how to achieve high availability. What is the final state of the Hudi

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Y Ethan Guo
+1 This is a great idea! The proposed lake manager and centralized management layer are essential to ease the burden of carrying out data governance and optimizing the storage layout, making them independent of ingestion and streaming. I see that this provides a better abstraction for any potentia

Re: [DISSCUSS][NEW FEATURE] Hudi Lake Manager

2022-04-18 Thread Shiyan Xu
Great idea, Zhang Yue! I see more potential collaborations in the work for the table management service in RFC 43: https://github.com/apache/hudi/pull/4309 On Mon, Apr 18, 2022 at 2:15 PM Yue Zhang wrote: > Hi all, > I would like to discuss and contribute a new feature named Hudi Lak