Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Jingsong Li
I see. I understand that what you mean is avoiding the loss of historical data. Logically, another option is to never clean up, so compaction does not have to be turned on. I am OK with the implementation; I just feel that it shouldn't be a logical limitation. Best, Jingsong On Fri, Oct 23, 2020 at 4:09 PM Kurt Young

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Kurt Young
To be precise, it means the Kafka topic should set the configuration "cleanup.policy" to "compact", not "delete". Best, Kurt On Fri, Oct 23, 2020 at 4:01 PM Jingsong Li wrote: > I just notice there is a limitation in the FLIP: > > > Generally speaking, the underlying topic of the upsert-kafka s
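
A minimal sketch of the difference between the two cleanup policies, assuming a toy in-memory log rather than a real Kafka topic. The functions `retain_recent` and `compact` are hypothetical names for illustration; real Kafka applies "delete" by time or size limits and keeps tombstones for a configurable period before removing them.

```python
from typing import List, Optional, Tuple

# (key, value); a value of None stands for a Kafka tombstone (delete marker).
Record = Tuple[str, Optional[str]]


def retain_recent(log: List[Record], max_records: int) -> List[Record]:
    """Toy stand-in for cleanup.policy=delete: old records go, whatever their key."""
    return log[-max_records:]


def compact(log: List[Record]) -> List[Record]:
    """Toy stand-in for cleanup.policy=compact: keep the latest record per key."""
    latest = {}
    for key, value in log:
        latest.pop(key, None)  # re-insert so order follows the most recent write
        latest[key] = value
    # Simplification: keys whose latest record is a tombstone are dropped at once.
    return [(k, v) for k, v in latest.items() if v is not None]


log = [("u3", "Carol"), ("u1", "Alice"), ("u2", "Bob"), ("u1", "Alicia"), ("u2", None)]
deleted_view = retain_recent(log, 2)   # "u3" is lost although it was never deleted
compacted_view = compact(log)          # "u3" survives; "u1" keeps its latest value
print(deleted_view)
print(compacted_view)
```

With the "delete"-style retention, the only record for `u3` disappears once it is old enough, so a reader that starts later never sees that key. With compaction, every live key keeps its latest value.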

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Jingsong Li
I just noticed there is a limitation in the FLIP: > Generally speaking, the underlying topic of the upsert-kafka source must be compacted. Besides, the underlying topic must have all the data with the same key in the same partition, otherwise, the result will be wrong. According to my understandin
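
A minimal sketch of the "same key in the same partition" requirement quoted above, assuming a simple hash partitioner. Kafka only guarantees ordering within a partition, so updates for one key are only ordered if they all land in one partition. The helper names are hypothetical, and `zlib.crc32` is a stand-in hash; Kafka's default partitioner uses a different hash function.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition (stand-in hash, not Kafka's)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partitions_per_key(records, num_partitions):
    """Collect the set of partitions that each key's records would be written to."""
    seen = defaultdict(set)
    for key, _value in records:
        seen[key].add(partition_for(key, num_partitions))
    return dict(seen)


records = [("u1", "Alice"), ("u2", "Bob"), ("u1", "Alicia")]
placement = partitions_per_key(records, num_partitions=4)

# With key-based partitioning every key maps to exactly one partition,
# so the order of updates for that key is preserved.
print(all(len(parts) == 1 for parts in placement.values()))
```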

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Timo Walther
+1 for voting Regards, Timo On 23.10.20 09:07, Jark Wu wrote: Thanks Shengkai! +1 to start voting. Best, Jark On Fri, 23 Oct 2020 at 15:02, Shengkai Fang wrote: Add one more message, I have already updated the FLIP[1]. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+In

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Jark Wu
Thanks Shengkai! +1 to start voting. Best, Jark On Fri, 23 Oct 2020 at 15:02, Shengkai Fang wrote: > Add one more message, I have already updated the FLIP[1]. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector > > Shengkai Fang 于2020

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-23 Thread Shengkai Fang
One more note: I have already updated the FLIP[1]. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector On Fri, Oct 23, 2020 at 2:55 PM Shengkai Fang wrote: > Hi, all. > It seems we have reached a consensus on the FLIP. If no one has other > objections

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Shengkai Fang
Hi, all. It seems we have reached a consensus on the FLIP. If no one has other objections, I would like to start the vote for FLIP-149. Best, Shengkai On Fri, Oct 23, 2020 at 2:25 PM Jingsong Li wrote: > Thanks for explanation, > > I am OK for `upsert`. Yes, Its concept has been accepted by many systems. > >

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Jingsong Li
Thanks for the explanation, I am OK with `upsert`. Yes, its concept has been accepted by many systems. Best, Jingsong On Fri, Oct 23, 2020 at 12:38 PM Jark Wu wrote: > Hi Timo, > > I have some concerns about `kafka-cdc`, > 1) cdc is an abbreviation of Change Data Capture which is commonly used for

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Jark Wu
Hi Timo, I have some concerns about `kafka-cdc`: 1) CDC is an abbreviation of Change Data Capture, which is commonly used for databases, not for message queues. 2) Usually, CDC produces the full content of the changelog, including UPDATE_BEFORE; however, "upsert kafka" doesn't. 3) `kafka-cdc` sounds like a n

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Jingsong Li
The `kafka-cdc` looks good to me. We can even give options to indicate whether to turn on compact, because compact is just an optimization? - ktable: it makes me think of KSQL. - kafka-compacted: it is not just compacted; more than that, it still has the ability of CDC. - upsert-kafka: upsert is back,

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Timo Walther
Hi Jark, I would be fine with `connector=upsert-kafka`. Another idea would be to align the name to other available Flink connectors [1]: `connector=kafka-cdc`. Regards, Timo [1] https://github.com/ververica/flink-cdc-connectors On 22.10.20 17:17, Jark Wu wrote: Another name is "connector=u

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Jark Wu
Another name is "connector=upsert-kafka"; I think this can solve Timo's concern about the word "compacted". Materialize also uses the "ENVELOPE UPSERT" [1] keyword to identify such Kafka sources. I think "upsert" is well-known terminology widely used in many systems and matches the behavior of how we

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Kurt Young
Good validation messages can't fix a broken user experience, especially since such an update mode option would implicitly make half of the current Kafka options invalid or meaningless. Best, Kurt On Thu, Oct 22, 2020 at 10:31 PM Jark Wu wrote: > Hi Timo, Seth, > > The default value "insertin

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Jark Wu
Hi Timo, Seth, The default value "inserting" of "mode" might not be suitable, because "debezium-json" emits changelog messages which include updates. On Thu, 22 Oct 2020 at 22:10, Seth Wiesman wrote: > +1 for supporting upsert results into Kafka. > > I have no comments on the implementation det

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Seth Wiesman
+1 for supporting upsert results into Kafka. I have no comments on the implementation details. As far as configuration goes, I tend to favor Timo's option where we add a "mode" property to the existing Kafka table with default value "inserting". If the mode is set to "updating" then the validatio

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Timo Walther
Hi Jark, "calling it "kafka-compacted" can even remind users to enable log compaction" But sometimes users like to store a lineage of changes in their topics, independent of any ktable/kstream interpretation. I will let the majority decide on this topic so as not to block this effort further. But we mig

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Jark Wu
Hi Timo, Thanks for your opinions. 1) Implementation We will have a stateful operator to generate INSERT and UPDATE_BEFORE. This operator is keyby-ed (primary key as the shuffle key) after the source operator. The implementation of this operator is very similar to the existing `DeduplicateKeepLa
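
A minimal sketch of what such a keyed stateful operator could look like, assuming a plain Python dict as the per-key state. This is not Flink code: `normalize` is a hypothetical name, records are `(key, value)` pairs, and a value of `None` stands for a tombstone. The row kinds follow the INSERT / UPDATE_BEFORE / UPDATE_AFTER / DELETE naming used for Flink changelogs.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

Upsert = Tuple[str, Optional[str]]   # (primary key, new value or None for delete)
Change = Tuple[str, str, str]        # (row kind, primary key, row image)


def normalize(upserts: Iterable[Upsert]) -> Iterator[Change]:
    """Turn an upsert stream into a full changelog using the last value per key."""
    state: Dict[str, str] = {}  # last seen value per primary key
    for key, value in upserts:
        if value is None:
            # Tombstone: emit DELETE with the previous image, if the key is known.
            if key in state:
                yield ("DELETE", key, state.pop(key))
        elif key not in state:
            state[key] = value
            yield ("INSERT", key, value)
        else:
            # Known key: retract the previous image, then emit the new one.
            yield ("UPDATE_BEFORE", key, state[key])
            state[key] = value
            yield ("UPDATE_AFTER", key, value)


changes = list(normalize([("u1", "Alice"), ("u1", "Alicia"), ("u1", None)]))
for change in changes:
    print(change)
```

Because the state is looked up by primary key, all records for a key must reach the same operator instance, which is why the operator sits behind a shuffle on the primary key.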

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-22 Thread Timo Walther
Hi Shengkai, Hi Jark, thanks for this great proposal. It is time to finally connect the changelog processor with a compacted Kafka topic. "The operator will produce INSERT rows, or additionally generate UPDATE_BEFORE rows for the previous image, or produce DELETE rows with all columns filled

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-21 Thread Jark Wu
Hi, IMO, if we are going to mix them in one connector, 1) either users need to set some options to a specific value explicitly, e.g. "scan.startup.mode=earliest", "sink.partitioner=hash", etc. This makes the connector awkward to use. Users may have to fix options one by one according to the excep

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-21 Thread Konstantin Knauf
Hi Kurt, Hi Shengkai, thanks for answering my questions and the additional clarifications. I don't have a strong opinion on whether to extend the "kafka" connector or to introduce a new connector. So, from my perspective feel free to go with a separate connector. If we do introduce a new connector

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-20 Thread Kurt Young
Hi all, I want to describe the discussion process that led us to this conclusion; this might make some of the design choices easier to understand and keep everyone on the same page. Back to the motivation, what functionality do we want to provide in the first place? We got a lot of feedba

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-20 Thread Shengkai Fang
Hi devs, As many people are still confused about the differences in option behaviours between the Kafka connector and the KTable connector, Jark and I have listed the differences in the doc[1]. Best, Shengkai [1] https://docs.google.com/document/d/13oAWAwQez0lZLsyfV21BfTEze1fc2cz4AZKiNOyBNPk/edit Shengkai Fan

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Shengkai Fang
Hi Konstantin, Thanks for your reply. > It uses the "kafka" connector and does not specify a primary key. The dimensional table `users` is a ktable connector and we can specify the pk on the KTable. > Will it possible to use a "ktable" as a dimensional table in FLIP-132 Yes. We can specify the w

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Konstantin Knauf
Hi Shengkai, Thank you for driving this effort. I believe this is a very important feature for many users who use Kafka and Flink SQL together. A few questions and thoughts: * Is your example "Use KTable as a reference/dimension table" correct? It uses the "kafka" connector and does not specify a pr

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Jark Wu
Hi Danny, First of all, we didn't introduce any concepts from KSQL (e.g. Stream vs Table notion). This new connector will produce a changelog stream, so it's still a dynamic table and doesn't conflict with Flink core concepts. The "ktable" is just a connector name, we can also call it "compacted-

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Jark Wu
Hi Jingsong, As the FLIP describes, "KTable connector produces a changelog stream, where each data record represents an update or delete event.". Therefore, a ktable source is an unbounded stream source. Selecting a ktable source is similar to selecting a kafka source with debezium-json format tha

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Danny Chan
The concept seems to conflict with the Flink abstraction “dynamic table”. In Flink we see both “stream” and “table” as a dynamic table, so I think we should first make clear how to express stream- and table-specific features on one “dynamic table”. It is more natural for KSQL because KSQL takes stream

Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Jingsong Li
Thanks Shengkai for your proposal. +1 for this feature. > Future Work: Support bounded KTable source I don't think it should be future work; I think it is one of the important concepts of this FLIP. We need to understand it now. Intuitively, a ktable in my opinion is a bounded table rather th

[DISCUSS] FLIP-149: Introduce the KTable Connector

2020-10-19 Thread Shengkai Fang
Hi, devs. Jark and I want to start a new FLIP to introduce the KTable connector. KTable is short for "Kafka Table"; it also has the same semantics as the KTable notion in Kafka Streams. FLIP-149: https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+KTable+Connector