Hello,
 We are attempting to port our Flink applications from one cloud provider
to another.

 These Flink applications consume data from Kafka topics and write to
various destinations (Kafka or databases). The applications are stateful.
Some of the state holds aggregations: at times we keep hours (or days)
worth of data so that we can aggregate over time. Other applications keep
cached information for data enrichment: we hold data in Flink state for
days so that we can join it with newly arrived data. The input topics carry
a large volume of data, so reprocessing them from the beginning would be
expensive.

 We therefore want to carry the application state over when we move to a
different cloud provider, so that we keep the aggregations and caches and
do not have to start from the beginning of the input topics.

 We are replicating the Kafka topics using MirrorMaker 2. This is our
procedure:

   - Replicate the input topics of each Flink application from the source
   cloud to the destination cloud.
   - Take a savepoint of the Flink application on the source cloud provider.
   - Start the Flink application on the destination cloud provider using
   the savepoint from the source cloud provider.


However, this does not work as we want, because the offsets in the
replicated topics on the new cloud provider differ from the offsets in the
original topics (a consequence of how MirrorMaker replicates records). The
offsets stored in the Flink savepoint therefore do not match the offsets of
the new topics, and Flink cannot map them to the right positions during
startup.
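
To make the mismatch concrete, here is a toy Python model of it. This is
not MirrorMaker or Flink code: the `mirror` helper, the record names, and
all the numbers are hypothetical, and it only assumes that the source
partition has already lost its oldest records to retention while the
replicated partition assigns fresh offsets starting from 0.

```python
# Toy model only: a partition is a mapping from offset to record.
# All names and numbers here are made up for illustration.

def mirror(source_partition):
    """Copy records in order; the target assigns fresh offsets from 0."""
    records_in_order = [source_partition[o] for o in sorted(source_partition)]
    return dict(enumerate(records_in_order))

# Source partition whose first 5000 records were already removed by retention.
source = {offset: f"event-{offset}" for offset in range(5000, 5300)}
target = mirror(source)

# Offset that a savepoint taken on the source cluster would carry.
savepoint_offset = 5200

# On the source, the saved offset points at the expected record.
print(source[savepoint_offset])

# On the target, the same numeric offset does not exist.
print(target.get(savepoint_offset))

# The same record lives at a different offset on the target.
target_offset = next(
    o for o, record in target.items() if record == source[savepoint_offset]
)
print(target_offset)
```

In this sketch the saved offset would have to be translated to the
target-side offset before the job could resume at the same record.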

Has anyone tried to move clouds while retaining the Flink state?

Thanks,
Hemanga
