Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-17 Thread Yuxin Tan
Hi, Venkatakrishnan, The current proposal is designed to initialize a single remote tier, such as Celeborn, if the remote tier factory is configured. However, this is a temporary approach because the tier interfaces are not yet stable. Our aim is to eventually support multiple storage tiers: memory

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-17 Thread Venkatakrishnan Sowrirajan
Yuxin, One question: is the current proposal limited to a single tiered storage implementation, e.g. Celeborn? Is it possible to have multiple tiered storages, such as a separate cloud storage alongside Celeborn? On Thu, Jun 6, 2024, 9:46 AM Jeyhun Karimov wrote: > Hi Yuxin, > > +1

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-06 Thread Jeyhun Karimov
Hi Yuxin, +1 for this proposal. This change will greatly alleviate the pressure on local storage resources when local storage is limited, particularly in cloud-native environments. Regards, Jeyhun On Thu, Jun 6, 2024 at 1:20 PM Yuxin Tan wrote: > Hi all, > >

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-06 Thread Yuxin Tan
Hi all, Thanks for all the feedback and suggestions so far. If there are no further comments, we will open the voting thread tomorrow. Best, Yuxin On Thu, Jun 6, 2024 at 15:40, Yuxin Tan wrote: > Thanks Zhu for the suggestion. > I have updated the description of the option. > > Best, > Yuxin > > > Zhu Zhu

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-06 Thread Yuxin Tan
Thanks Zhu for the suggestion. I have updated the description of the option. Best, Yuxin On Thu, Jun 6, 2024 at 14:59, Zhu Zhu wrote: > +1 > > Maybe explain in the description of > `taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class` > that it only accepts Celeborn as the remote shuffle

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-06 Thread Zhu Zhu
+1 Maybe explain in the description of `taskmanager.network.hybrid-shuffle.external-remote-tier-factory.class` that it only accepts Celeborn as the remote shuffle tier at this moment? Thanks, Zhu On Thu, Jun 6, 2024 at 13:49, Junrui Lee wrote: > Thanks Yuxin for your answer. +1 for this proposal. > > Best,

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-05 Thread Junrui Lee
Thanks Yuxin for your answer. +1 for this proposal. Best, Junrui. On Thu, Jun 6, 2024 at 13:42, Yuxin Tan wrote: > Thanks Junrui for your question. > > > I wonder if the current interface design supports the > future adaptation for batch job recovery > > I noticed that FLIP-383 supports batch job recovery by

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-05 Thread Yuxin Tan
Thanks Junrui for your question. > I wonder if the current interface design supports the future adaptation for batch job recovery I noticed that FLIP-383 supports batch job recovery by introducing some new APIs. These APIs can also be added to the Tier-related interfaces to facilitate the

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-05 Thread Junrui Lee
Thanks Yuxin for driving this proposal! I have a question about the public interface compatibility in the context of FLIP-459. Since FLIP-383, which will be released in Flink 1.20, adds support for batch job recovery from JobMaster failures, I wonder if the current interface design supports the

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-04 Thread weijie guo
Thanks Yuxin for the proposal! When we first proposed Hybrid Shuffle, I wanted to support pluggable storage tiers in the future. However, limited by the architecture of the legacy Hybrid Shuffle at that time, this idea was never realized. The new architecture abstracts the tier nicely, and now

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-04 Thread rexxiong
Thanks Yuxin for the proposal. +1, as a member of the Apache Celeborn community, I am very excited about the integration of Flink's Hybrid Shuffle with Apache Celeborn. The whole design of CIP-6 looks good to me. I am looking forward to this integration. Thanks, Jiashu Xiong Ethan Feng

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-04 Thread Ethan Feng
+1 for this proposal. Having internally reviewed the prototype of CIP-6, I believe it would improve performance and stability for Flink users using Celeborn. I look forward to seeing this feature released to the community. As I come from the Celeborn community, I hope more users can try to use Celeborn when there

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-04 Thread Yuxin Tan
Hi, Venkatakrishnan, Thanks for joining the discussion. We appreciate your interest in contributing to the work. Once the FLIP and CIP proposals have been approved, we will create some JIRA tickets in Flink and Celeborn projects. Please feel free to take a look at the tickets and select any that

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-05-31 Thread Venkatakrishnan Sowrirajan
Thanks for this FLIP. We are also interested in learning about and contributing to the hybrid shuffle integration with Celeborn for batch executions. On Tue, May 28, 2024, 7:07 PM Yuxin Tan wrote: > Hi, Xintong, > > > I think we can also publish the prototype code so the > community can better

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-05-28 Thread Yuxin Tan
Hi, Xintong, > I think we can also publish the prototype code so the community can better understand and help with it. OK, I agree. I will prepare and publish the code soon. Rui, > Kind reminder: the image of CIP-6[1] cannot be loaded. Thanks for the reminder. I've

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-05-28 Thread Rui Fan
Thanks Yuxin for driving this proposal! Kind reminder: the image of CIP-6[1] cannot be loaded. [1] https://cwiki.apache.org/confluence/display/CELEBORN/CIP-6+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn Best, Rui On Wed, May 29, 2024 at 9:03 AM Xintong Song wrote: > +1 for

Re: [DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-05-28 Thread Xintong Song
+1 for this proposal. We have been prototyping this feature internally at Alibaba for a couple of months. Yuxin, I think we can also publish the prototype code so the community can better understand and help with it. Best, Xintong On Tue, May 28, 2024 at 8:34 PM Yuxin Tan wrote: > Hi all,

[DISCUSS] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-05-28 Thread Yuxin Tan
Hi all, I would like to start a discussion on FLIP-459 Support Flink hybrid shuffle integration with Apache Celeborn[1]. Flink hybrid shuffle supports transitions between memory, disk, and remote storage to improve performance and job stability. Concurrently, Apache Celeborn provides a stable,