Hey Viktor,

Maybe this PR[1] would help your case.
Best,
Leonard

[1] https://github.com/apache/flink-cdc/pull/4170

> On Feb 5, 2026, at 10:42, viktor <[email protected]> wrote:
>
> Hi there,
>
> I have been learning Flink-related technologies recently and am currently
> using the combination of Flink CDC + Kafka + Flink. The pipeline is as
> follows: Flink CDC collects change data from multiple database tables and
> forwards it to Kafka, and Flink consumes the data for subsequent
> computation.
> I have now run into a problem: the collected multi-table data is strongly
> associated, e.g. main tables and sub-tables, whose changes usually occur
> within a single transaction. My initial idea was to use Kafka partitions
> and route data by the transaction id of the CDC messages, ensuring that a
> single consumer thread processes all the data from one transaction. This
> handles the ordinary scenarios.
> However, there is another case: data in a single table may change multiple
> times in a short period, with each change belonging to a different
> transaction. The data is then consumed by multiple consumers, leading to a
> certain probability of ordering issues. Currently, the only solution I can
> think of is to use a single partition to process all the data, but this is
> too inefficient.
> Therefore, I would like to ask: Is there a flaw in my thinking? Are there
> any better solutions? And at which stage should the solution be
> implemented?
>
> Thanks,
> Viktor
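One commonly used alternative to keying by transaction id is to key each change event by its row identity (table name plus primary key), so that every change to the same row lands in the same Kafka partition and per-row order is preserved even when the changes belong to different transactions. Below is a minimal, self-contained sketch of that routing idea; the event fields (`table`, `pk`, `txn_id`) and the hash scheme are illustrative assumptions, not a real Flink CDC schema, and a real producer would simply pass the row key as the Kafka record key and let the broker-side default partitioner do the hashing.

```python
import hashlib

# Illustrative sketch: route CDC change events by (table, primary key)
# instead of transaction id, so all changes to one row stay in one
# partition. Field names here are hypothetical, not a real CDC payload.

NUM_PARTITIONS = 8

def partition_for(event: dict, num_partitions: int = NUM_PARTITIONS) -> int:
    """Derive a stable partition from the row identity, not the transaction."""
    key = f"{event['table']}:{event['pk']}"
    # A real Kafka producer hashes the record key (murmur2) to pick a
    # partition; Python's built-in hash() is not stable across runs, so
    # a deterministic digest stands in for it here.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Two updates to the same row in two different transactions still map
# to the same partition, so their relative order is preserved:
e1 = {"table": "orders", "pk": 42, "txn_id": "tx-100", "op": "INSERT"}
e2 = {"table": "orders", "pk": 42, "txn_id": "tx-101", "op": "UPDATE"}
assert partition_for(e1) == partition_for(e2)
```

The trade-off is the mirror image of keying by transaction id: per-row ordering holds across transactions, but related rows from a main table and its sub-tables may land in different partitions, so a consumer can no longer see one transaction as a unit. A common middle ground is to key sub-table events by the parent (main-table) key they reference, which keeps each logical entity's changes in one partition while still spreading load across partitions.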
