Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-15 Thread Mason Chen
Hi Gordon, Thanks for your feedback as always. Why not just Map? I think it makes sense to relocate the bootstrapServer field in the Kafka properties (ClusterMetadata abstraction), since that is intuitive based on how it is defined in the Kafka clients library. It also makes the uniqueness of a

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-14 Thread Tzu-Li (Gordon) Tai
Hi Mason, Thanks for addressing my comments. I agree that option 3 seems more reasonable. > Reorganize the metadata in a Map in `KafkaStream` where the String is the proposed `KafkaClusterIdentifier.name` field. Why not just Map? Regarding naming, I like DynamicKafkaSource as that's what I

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-12 Thread Mason Chen
> > My main worry for doing this as a later iteration is that this would > probably be a breaking change for the public interface. If that can be > avoided and planned ahead, I'm fine with moving forward with how it is > right now. Make sense. Considering the public interfaces, I think we still

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-11 Thread Thomas Weise
Hi Mason, Thanks for the iterations on the FLIP, I think this is in a very good shape now. Small correction for the MultiClusterKafkaSourceEnumerator section: "This reader is responsible for discovering and assigning splits from 1+ cluster" Regarding the user facing name of the connector: I

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-09 Thread Tzu-Li (Gordon) Tai
> Regarding (2), definitely. This is something we planned to add later on but so far keeping things common has been working well. My main worry for doing this as a later iteration is that this would probably be a breaking change for the public interface. If that can be avoided and planned ahead,

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-07 Thread Mason Chen
Hi Gordon, Thanks for taking a look! Regarding (1), there is a need from the readers to send this event at startup because the reader state may reflect outdated metadata. Thus, the reader should not start without fresh metadata. With fresh metadata, the reader can filter splits from state--this

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-07 Thread Tzu-Li (Gordon) Tai
Hi Mason, Thanks for updating the FLIP. In principle, I believe this would be a useful addition. Some comments so far: 1. In this sequence diagram [1], why is there a need for a GetMetadataUpdateEvent from the MultiClusterSourceReader going to the MultiClusterSourceEnumerator? Shouldn't the

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-07 Thread Mason Chen
Hi Jing, Thanks for the prompt feedback! I had some confusion with how to resize images in confluence--anyways, I have made the font bigger, added white background, and also made the diagrams themselves bigger. Regarding the exactly once semantics, that's definitely good to point out in the doc.

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-06 Thread Jing Ge
Hi Mason, It is a very practical feature that many users are keen to use. Thanks to the previous discussion, the FLIP now looks informative. Thanks for your proposal. One small suggestion is that the attached images are quite small to read if we don't click and enlarge them. Besides that, It is

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2023-06-05 Thread Mason Chen
Hi all, I'm working on FLIP-246 again, for the Multi Cluster Kafka Source contribution. The document has been updated with some more context about how it can solve the Kafka topic removal scenario and a sequence diagram to illustrate how the components interact. Looking forward to any feedback!

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-10-13 Thread Mason Chen
Hi Ryan, Thanks for the additional context! Yes, the offset initializer would need to take a cluster as a parameter and the MultiClusterKafkaSourceSplit can be exposed in an initializer. Best, Mason On Thu, Oct 6, 2022 at 11:00 AM Ryan van Huuksloot < ryan.vanhuuksl...@shopify.com> wrote: > Hi

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-10-06 Thread Ryan van Huuksloot
Hi Mason, Thanks for the clarification! In regards to the addition to the OffsetInitializer of this API - this would be an awesome addition and I think this entire FLIP would be a great addition to the Flink. To provide more context as to why we need particular offsets, we use Hybrid Source to

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-10-03 Thread Mason Chen
Hi Ryan, Just copying your message over to the email chain. Hi Mason, > First off, thanks for putting this FLIP together! Sorry for the delay. > Full disclosure Mason and I chatted a little bit at Flink Forward 2022 but > I have tried to capture the questions I had for him then. > I'll start the

RE: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-09-04 Thread Ryan van Huuksloot
Hi Mason, First off, thanks for putting this FLIP together! Sorry for the delay. Full disclosure Mason and I chatted a little bit at Flink Forward 2022 but I have tried to capture the questions I had for him then. I'll start the conversation with a few questions: 1. The concept of streamIds is

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-08-29 Thread Mason Chen
Hi Max, Thanks for taking a look! > I'm wondering whether we can share some of the code of the existing > KafkaSource. > That is the intention--let me call it out more explicitly. Regarding your questions: 1. Indeed, the KafkaMetadataService has the describe stream method to get a particular

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-08-17 Thread Maximilian Michels
Hey Mason, I just had a look at the FLIP. If I understand correctly, you are proposing a very sophisticated way to read from multiple Kafka clusters / topics. I'm wondering whether we can share some of the code of the existing KafkaSource. I suppose you don't want to modify KafkaSource

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-08-11 Thread Mason Chen
5. At startup, GetMetadataUpdateEvent is also used to allow the MultiClusterKafkaSourceReader to get the latest metadata from the enumerator to filter out invalid splits This is how the reader can solve "removing" splits/topics in the startup case. Sorry for the late response, really appreciate

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-08-11 Thread Mason Chen
Hi Qingsheng, Thanks for the feedback--these are great points to raise. 1. This is something I missed that is now added. More generally, it can locate multiple topics in multiple clusters (1 topic on 1 cluster is the simplest case). 2. The KafkaMetadataService doesn't interact with the

Re: [DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-07-29 Thread Qingsheng Ren
Hi Mason, Thank you for starting this FLIP! From my first glance this FLIP looks like a collection of many new interfaces, but I can’t stitch them together. It’ll be great to have some brief descriptions about how the source works internally. Here are some questions in my mind and please

[DISCUSS] FLIP-246: Multi Cluster Kafka Source

2022-07-20 Thread Mason Chen
Hi all, We would like to start a discussion thread on FLIP-246: Multi Cluster Kafka Source [1] where we propose to provide a source connector for dynamically reading from Kafka multiple clusters, which will not require Flink job restart. This can greatly improve the Kafka migration experience for