Hey Viktor

Maybe this PR[1] would help your case.

Best,
Leonard
[1] https://github.com/apache/flink-cdc/pull/4170 
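
For reference, the hash-based routing scheme described in the quoted question below can be sketched as follows. This is a minimal illustration only: the event shape, partition count, and hash function are assumptions for the sketch, not Flink CDC or Kafka APIs.

```python
import hashlib

# Illustrative partition count; a real Kafka topic's count would be used.
NUM_PARTITIONS = 4

def partition_for(txn_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the transaction ID, so every change event belonging
    # to one transaction is routed to the same partition (and hence one
    # consumer thread), as proposed in the question.
    digest = hashlib.md5(txn_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Two successive updates to the SAME row, each in its own transaction:
events = [
    {"table": "orders", "pk": 42, "txn_id": "txn-1001"},
    {"table": "orders", "pk": 42, "txn_id": "txn-1002"},
]

routed = [partition_for(e["txn_id"]) for e in events]
# Events within one transaction stay together, but two transactions that
# touch the same row may hash to different partitions, so the two updates
# can be consumed out of order -- the hazard the question points out.
```

Note the trade-off: keying by primary key instead of transaction ID would preserve per-row ordering, but would scatter a multi-table transaction across partitions, defeating the original goal of one consumer per transaction.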

> On Feb 5, 2026, at 10:42, viktor <[email protected]> wrote:
> 
> Hi there,
> 
> I have been learning Flink-related technologies recently and am currently 
> using the combination of Flink CDC + Kafka + Flink. The pipeline works as 
> follows: Flink CDC collects change data from multiple database tables and 
> forwards it to Kafka, and Flink consumes the data for subsequent calculations.
> Now I have encountered a problem: the collected multi-table data is strongly 
> associated, for example main tables and sub-tables, and their changes usually 
> occur within a single transaction. My initial idea was to route data to Kafka 
> partitions by the transaction ID of the CDC messages, ensuring that one 
> consumer thread processes all the data from one transaction. This handles the 
> ordinary scenarios.
> However, there is another case: a single table's data may change multiple 
> times in a short period, with each change belonging to a different 
> transaction. The changes are then consumed by different consumers, which can 
> cause ordering issues. The only solution I can think of so far is to send 
> all data through a single partition, but that is too inefficient.
> Therefore, I would like to ask: Is there a flaw in my thinking? Are there any 
> better solutions? And at which stage of the pipeline should the solution be 
> implemented?
> 
> Thanks,
> Viktor
