Hey Hemanga,

That's quite annoying of MirrorMaker to change the offsets on you. One
solution would be to use the State Processor API[1] to read the savepoint
and rewrite the stored Kafka offsets to the new ones. Does MirrorMaker give
you the new offsets ahead of time? There might also be more specific tricks
people could suggest if you're able to share which clouds or cloud services
you're migrating to and from.
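
To make the "update the offsets" step concrete, here is a rough sketch of
the translation logic only. It assumes you can obtain, per topic-partition,
a sorted list of (upstream offset, downstream offset) pairs, for example
from the offset syncs that MirrorMaker 2 records as far as I know. The names
`translate_offset`, `translate_savepoint_offsets`, and the sample data are
hypothetical, and reading or writing the savepoint itself (via the State
Processor API) is not shown.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical types: a partition is (topic, partition_id); sync points are
# (upstream_offset, downstream_offset) pairs sorted by upstream offset.
Partition = Tuple[str, int]
SyncPoints = List[Tuple[int, int]]


def translate_offset(syncs: SyncPoints, upstream_offset: int) -> int:
    """Map an offset on the source cluster to a resume offset on the target.

    Conservative choice (an assumption of this sketch): use the downstream
    offset of the latest sync point at or before upstream_offset. This may
    re-read some records, but should not skip any.
    """
    upstream_keys = [upstream for upstream, _ in syncs]
    idx = bisect_right(upstream_keys, upstream_offset) - 1
    if idx < 0:
        raise ValueError(f"no sync point at or before offset {upstream_offset}")
    return syncs[idx][1]


def translate_savepoint_offsets(
    savepoint_offsets: Dict[Partition, int],
    syncs_by_partition: Dict[Partition, SyncPoints],
) -> Dict[Partition, int]:
    """Translate every partition offset found in the savepoint."""
    return {
        partition: translate_offset(syncs_by_partition[partition], offset)
        for partition, offset in savepoint_offsets.items()
    }


if __name__ == "__main__":
    # Made-up example data, not taken from a real cluster.
    syncs = {("orders", 0): [(0, 0), (1000, 940), (2000, 1875)]}
    savepoint = {("orders", 0): 1500}
    print(translate_savepoint_offsets(savepoint, syncs))
```

Because the sketch rounds down to the nearest sync point, the job would
re-read the records between that point and the savepoint offset. For
aggregations held in state, that could double-count unless the job
deduplicates, so how fine-grained the available sync points are matters.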

Best,
Austin

[1]: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/libs/state_processor_api/

On Tue, May 3, 2022 at 5:11 PM Hemanga Borah <borah.hema...@gmail.com>
wrote:

> Any ideas, guys?
>
> On Mon, May 2, 2022 at 6:11 PM Hemanga Borah <borah.hema...@gmail.com>
> wrote:
>
>> Hello,
>>  We are attempting to port our Flink applications from one cloud provider
>> to another.
>>
>>  These Flink applications consume data from Kafka topics and output to
>> various destinations (Kafka or databases). The applications are stateful.
>> Some of the state holds aggregations: at times we store hours (or days)
>> worth of data to aggregate over time. Other applications cache
>> information for data enrichment: we keep data in Flink state for days so
>> that we can join it with newly arrived data. The input topics carry a
>> large volume of data, and it would be expensive to reprocess them from
>> the beginning.
>>
>>  As such, we want to retain the state of the application when we move to
>> a different cloud provider so that we can retain the aggregations and
>> cache, and do not have to start from the beginning of the input topics.
>>
>>  We are replicating the Kafka topics using MirrorMaker 2. This is our
>> procedure:
>>
>>    - Replicate the input topics of each Flink application from source
>>    cloud to destination cloud.
>>    - Take a savepoint of the Flink application on the source cloud
>>    provider.
>>    - Start the Flink application on the destination cloud provider using
>>    the savepoint from the source cloud provider.
>>
>>
>> However, this does not work as we want because the offsets in the
>> replicated topics on the new cloud provider differ from the originals
>> (because of the MirrorMaker implementation). The offsets of the new
>> topics do not match the ones stored in the Flink savepoint, so Flink
>> cannot map to the offsets of the new topics during startup.
>>
>> Has anyone tried to move clouds while retaining the Flink state?
>>
>> Thanks,
>> Hemanga
>>
>
