Hi,

I have master/reference data that needs to come in through a 
FlinkKafkaConsumer, be broadcast to all nodes, and subsequently be joined with 
the actual stream to enrich its content.

The Kafka consumer gets CDC-type records from database changes. All this works 
well.
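For context, the enrichment logic I have in mind looks roughly like this. This is a plain-Python sketch of the pattern, not Flink API; `apply_cdc` and `enrich` are illustrative names I made up, and the record shape is an assumption:

```python
def apply_cdc(reference, record):
    """Apply one CDC-type record to the broadcast reference map.
    Inserts and updates are treated as upserts; deletes remove the key."""
    if record["op"] == "delete":
        reference.pop(record["key"], None)
    else:
        reference[record["key"]] = record["value"]
    return reference

def enrich(reference, event):
    """Join a stream event with the current reference data by key."""
    return {**event, "ref": reference.get(event["key"])}

# Initial load question in miniature: the full table snapshot would need to
# arrive (as upserts) before the CDC deltas for the state to be complete.
snapshot = [{"op": "upsert", "key": "c1", "value": "Customer One"}]
deltas = [{"op": "upsert", "key": "c1", "value": "Customer 1 (renamed)"}]

reference = {}
for rec in snapshot + deltas:
    apply_cdc(reference, rec)

enriched = enrich(reference, {"key": "c1", "amount": 42})
# enriched["ref"] is "Customer 1 (renamed)"
```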


My question is: how do I initialise the pipeline with the first set of records 
already in the database, i.e. those that are not CDC records?

When I deploy for the first time, I would need all the DB records to be sent to 
the FlinkKafkaConsumer before any CDC updates happen.

Is there a hook that allows this one-time initial load of records in the Kafka 
topic to be broadcast?



Also, about the broadcast state: since we are persisting state in the RocksDB 
backend, I am assuming the state backend would hold the latest records, and 
that even if a task manager crashes and restarts, the correct Kafka consumer 
group topic offsets will have been recorded, so that on restart we do not 
“startFromEarliest”. Is that right?

Will the state always maintain the updates to the records as well as the Kafka 
topic offsets?


Thanks,
Sandeep 
