Hi all, I was wondering what approaches people usually take to reprocessing data with Flink - specifically the case where you want to upgrade a Flink job and have it reprocess historical data before continuing to process a live stream.
I'm wondering if we can do something similar to the 'simple rewind' or 'parallel rewind' approaches which Samza uses to solve this problem, discussed here: https://samza.apache.org/learn/documentation/0.10/jobs/reprocessing.html

Having used Flink over the past couple of months, the main issue I've had involves Flink's internal state. From my experience it seems easy to break the state when upgrading a job or when changing the parallelism of operators, and there's no easy way to view/access an internal key-value state from outside Flink.

For an example of what I mean, consider a Flink job which consumes a stream of 'updates' to items and maintains a key-value store of items within Flink's internal state (e.g. in RocksDB). The job also writes the updated items to a Kafka topic: http://oi64.tinypic.com/34q5opf.jpg

My worry is that the state in RocksDB could be lost, or could become incompatible with an updated version of the job. If that happens, we need to be able to rebuild Flink's internal key-value store in RocksDB. So I'd like to be able to do something like this (which I believe is the Samza solution): http://oi67.tinypic.com/219ri95.jpg

Has anyone done something like this already with Flink? If so, are there any examples of how to do this replay & switchover (rebuild state by consuming from a historical log, then switch over to processing the live stream)?

Thanks for any insights,

Josh
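To make the replay & switchover idea concrete, here's a minimal sketch of the pattern in plain Python - just a simulation of the logic, not Flink or Samza API code, and all names (`rebuild_state`, `process_live`, the item/field shapes) are made up for illustration:

```python
# Simulate "replay & switchover": first rebuild a key-value store by
# replaying a historical log of item updates, then switch over and
# apply live updates against the rebuilt store, emitting each updated
# item (as the real job would write it to a Kafka topic).

def rebuild_state(historical_log):
    """Replay historical 'update' events to reconstruct the key-value store."""
    store = {}
    for item_id, fields in historical_log:
        item = store.setdefault(item_id, {})
        item.update(fields)  # merge this update into the stored item
    return store

def process_live(store, live_stream):
    """Switchover phase: apply live updates to the rebuilt store and
    emit a snapshot of each updated item."""
    for item_id, fields in live_stream:
        item = store.setdefault(item_id, {})
        item.update(fields)
        yield item_id, dict(item)

# Historical log replayed first, then the live stream takes over.
historical = [("a", {"price": 1}), ("b", {"qty": 5}), ("a", {"qty": 2})]
live = [("a", {"price": 3})]

store = rebuild_state(historical)
updated = list(process_live(store, live))
# item "a" now reflects both the replayed history and the live update
```

In the real setup the interesting part is of course the switchover itself: knowing the offset where the historical log ends so the job can start consuming the live stream without missing or duplicating updates.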