Hi, I have master/reference data that needs to come in through a FlinkKafkaConsumer, be broadcast to all nodes, and then be joined with the actual stream to enrich its content.
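For context, here is a plain-Python sketch of the enrichment semantics I am after. This is not my actual Flink job (no Flink APIs here); the class and names like `Enricher` and `reference` are purely illustrative of what the broadcast-side state and the per-element join should do:

```python
class Enricher:
    """Holds broadcast reference data and enriches event records.

    Illustrative stand-in for a Flink BroadcastProcessFunction:
    process_broadcast plays the role of processBroadcastElement,
    process_element the role of processElement.
    """

    def __init__(self):
        # Stands in for the broadcast MapState: key -> reference record.
        self.reference = {}

    def process_broadcast(self, cdc_record):
        """Apply a CDC record (insert/update/delete) to the reference map."""
        op, key, value = cdc_record
        if op == "delete":
            self.reference.pop(key, None)
        else:  # "insert" or "update"
            self.reference[key] = value

    def process_element(self, event):
        """Join an event against the current reference data."""
        key = event["ref_key"]
        return {**event, "ref": self.reference.get(key)}


enricher = Enricher()
# Initial load of a record, followed by a CDC update to it:
enricher.process_broadcast(("insert", 1, {"name": "ACME"}))
enricher.process_broadcast(("update", 1, {"name": "ACME Corp"}))
print(enricher.process_element({"ref_key": 1, "amount": 10}))
# -> {'ref_key': 1, 'amount': 10, 'ref': {'name': 'ACME Corp'}}
```

The open question below is how to guarantee that the "insert" records for the initial DB snapshot arrive before any "update"/"delete" CDC records on first deployment.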
The Kafka consumer receives CDC-type records produced from database changes, and all of this works well. My question is: how do I initialise the pipeline with the first set of records in the database, i.e. those that are not CDC events? When I deploy for the first time, I need all existing DB records to be sent to the FlinkKafkaConsumer before any CDC updates arrive. Is there a hook that allows an initial, first-time load of the records in the Kafka topic to be broadcast?

Also, about the broadcast state: since we persist state in the RocksDB backend, I am assuming that the state backend holds the latest records, and that even if a task manager crashes and restarts, the correct Kafka consumer group topic offsets will have been recorded, so that next time we do not "startFromEarliest". Is that right? Will the state always maintain both the updates to the records and the Kafka topic offsets?

Thanks,
Sandeep