Hello,

We are attempting to port our Flink applications from one cloud provider to another.
These Flink applications consume data from Kafka topics and write to various destinations (Kafka or databases). The applications are stateful. Some of the state holds aggregations: for example, we sometimes store hours (or days) worth of data to aggregate over time. Other applications cache information for data enrichment: for example, we keep data in Flink state for days so that we can join it with newly arrived data.

The input topics hold a large volume of data, and it would be expensive to reprocess them from the beginning. We therefore want to retain the application state when we move to the other cloud provider, so that we keep the aggregations and caches and do not have to start from the beginning of the input topics.

We are replicating the Kafka topics using MirrorMaker 2. This is our procedure:

- Replicate the input topics of each Flink application from the source cloud to the destination cloud.
- Take a savepoint of the Flink application on the source cloud provider.
- Start the Flink application on the destination cloud provider using the savepoint from the source cloud provider.

However, this does not work as we want, because the offsets in the replicated topics on the new cloud provider differ from those in the original topics (because of the MirrorMaker implementation). The offsets of the new topics do not match the ones stored in the Flink savepoint, so Flink cannot map them to the offsets of the new topics during startup.

Has anyone tried to move between clouds while retaining the Flink state?

Thanks,
Hemanga