Hi there,

I have been learning Flink-related technologies recently and am currently
using the combination of Flink CDC + Kafka + Flink. The pipeline is as
follows: Flink CDC captures change data from multiple database tables and
forwards it to Kafka, and Flink consumes the data for downstream computation.
Now I have run into a problem: the captured multi-table data is strongly
related, e.g. parent tables and their child tables, and their changes usually
occur within a single transaction. My initial idea was to use Kafka partitions
and route records by the transaction id of the CDC messages, so that all data
from one transaction is processed by a single consumer thread. This covers the
ordinary case (see the sketch below).
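
Roughly what I had in mind, as a minimal sketch (the class name is mine, and I
assume the transaction id has already been extracted from the CDC envelope and
set as the record key upstream):

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.utils.Utils;

    // Route every record of a transaction to the same partition by
    // hashing the CDC transaction id that was used as the record key.
    public class TransactionIdPartitioner implements Partitioner {

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            // keyBytes is assumed to be the non-null transaction id.
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }

        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public void close() {}
    }

The producer would then be configured with
partitioner.class=TransactionIdPartitioner.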
However, there is another case: a row in a single table may change multiple
times within a short period, with each change belonging to a different
transaction. Routing by transaction id then spreads those changes across
partitions, so they are consumed by different consumers and may, with some
probability, be processed out of order (illustrated below). The only
workaround I can think of is to push all data through a single partition, but
that is far too inefficient.
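
For example (the transaction ids and partition count are made up; this just
shows that hashing the key gives no ordering guarantee across transactions):

    import org.apache.kafka.common.utils.Utils;

    public class ReorderExample {
        public static void main(String[] args) {
            int partitions = 4;
            // Two consecutive updates to the same row, each committed in
            // its own transaction:
            byte[] txnA = "txn-1001".getBytes();
            byte[] txnB = "txn-1002".getBytes();
            int pA = Utils.toPositive(Utils.murmur2(txnA)) % partitions;
            int pB = Utils.toPositive(Utils.murmur2(txnB)) % partitions;
            // Whenever pA != pB, the two updates land on different
            // partitions and can be processed out of order.
            System.out.println("update 1 -> " + pA + ", update 2 -> " + pB);
        }
    }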
Therefore, I would like to ask: is there a flaw in my reasoning? Are there
better solutions? And at which stage of the pipeline (CDC, Kafka, or Flink)
should such a solution be implemented?


Thanks,
Viktor
