Re: Does the Kafka source perform retractions on Key?

2021-03-01 Thread Arvid Heise
Hi Rex, yes you can go directly into Flink since 1.11.0 [1], but afaik only through Table API/SQL currently (which you seem to be using anyways most of the time). I'd recommend using 1.11.1+ (some bugfixes) or even 1.12.0+ (many new useful features [2]). You can also check the main doc [3]. If

Re: Does the Kafka source perform retractions on Key?

2021-03-01 Thread Rex Fenley
Thanks Arvid, I think my confusion lies in misinterpreting the meaning of CDC. We basically don't want CDC, we just use it to get data into a compacted Kafka topic where we hold the current state of the world to consume from multiple consumers. You have described pretty thoroughly where we want

Re: Does the Kafka source perform retractions on Key?

2021-03-01 Thread Arvid Heise
> We are rereading the topics, at any time we might want a completely different materialized view for a different web service for some new application feature. Other jobs / new jobs need to read all the up-to-date rows from the databases. > I still don't see how this is the case if everything just

Re: Does the Kafka source perform retractions on Key?

2021-02-27 Thread Rex Fenley
Hi Arvid, >If you are not rereading the topics, why do you compact them? We are rereading the topics, at any time we might want a completely different materialized view for a different web service for some new application feature. Other jobs / new jobs need to read all the up-to-date rows from

Re: Does the Kafka source perform retractions on Key?

2021-02-27 Thread Arvid Heise
Hi Rex, Your initial question was about the impact of compaction on your CDC application logic. I have been (unsuccessfully) trying to tell you that you do not need compaction and it's counterproductive. If you are not rereading the topics, why do you compact them? It's lost compute time and I/O

Re: Does the Kafka source perform retractions on Key?

2021-02-27 Thread Rex Fenley
Hi Arvid, I really appreciate the thorough response but I don't think this contradicts our use case. In servicing web applications we're doing nothing more than taking data from giant databases we use, and performing joins and denormalizing aggs strictly for performance reasons (joining across a

Re: Does the Kafka source perform retractions on Key?

2021-02-27 Thread Arvid Heise
Hi Rex, imho log compaction and CDC for historic processes are incompatible on conceptual level. Let's take this example: topic: party membership +(1, Dem, 2000) -(1, Dem, 2009) +(1, Gop, 2009) Where 1 is the id of a real person. Now, let's consider you want to count memberships retroactively

Re: Does the Kafka source perform retractions on Key?

2021-02-26 Thread Rex Fenley
Digging around, it looks like Upsert Kafka which requires a Primary Key will actually do what I want and uses compaction, but it doesn't look compatible with Debezium format? Is this on the roadmap? In the meantime, we're considering consuming from Debezium Kafka (still compacted) and then

Re: Does the Kafka source perform retractions on Key?

2021-02-26 Thread Rex Fenley
Does this also imply that it's not safe to compact the initial topic where data is coming from Debezium? I'd think that Flink's Kafka source would emit retractions on any existing data with a primary key as new data with the same pk arrived (in our case all data has primary keys). I guess that

Re: Does the Kafka source perform retractions on Key?

2021-02-26 Thread Arvid Heise
Just to clarify, intermediate topics should in most cases not be compacted for exactly the reasons if your application depends on all intermediate data. For the final topic, it makes sense. If you also consume intermediate topics for web application, one solution is to split it into two topics

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Rex Fenley
All of our Flink jobs are (currently) used for web applications at the end of the day. We see a lot of latency spikes from record amplification and we were at first hoping we could pass intermediate results through Kafka and compact them to lower the record amplification, but then it hit me that

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Arvid Heise
Jan's response is correct, but I'd like to emphasize the impact on a Flink application. If the compaction happens before the data arrives in Flink, the intermediate updates are lost and just the final result appears. Also if you restart your Flink application and reprocess older data, it will

Re: Does the Kafka source perform retractions on Key?

2021-02-24 Thread Jan Lukavský
Hi Rex, If I understand correctly, you are concerned about behavior of Kafka source in the case of compacted topic, right? If that is the case, then this is not directly related to Flink, Flink will expose the behavior defined by Kafka. You can read about it for instance here [1]. TL;TD -

Re: Does the Kafka source perform retractions on Key?

2021-02-23 Thread Rex Fenley
Apologies, forgot to finish. If the Kafka source performs its own retractions of old data on key (user_id) for every append it receives, it should resolve this discrepancy I think. Again, is this true? Anything else I'm missing? Thanks! On Tue, Feb 23, 2021 at 6:12 PM Rex Fenley wrote: > Hi,

Does the Kafka source perform retractions on Key?

2021-02-23 Thread Rex Fenley
Hi, I'm concerned about the impacts of Kafka's compactions when sending data between running flink jobs. For example, one job produces retract stream records in sequence of (false, (user_id: 1, state: "california") -- retract (true, (user_id: 1, state: "ohio")) -- append Which is consumed by