In batch mode, how does StatefulSinkWriter store state so that a failover or job restart does not re-read the data from the beginning?

jinzhuguang Fri, 02 Feb 2024 00:48:11 -0800

Flink 1.16.0

While reading the FileSink code, I noticed that it relies on the snapshotState method of StatefulSinkWriter to store its current state at each checkpoint.


interface StatefulSinkWriter<InputT, WriterStateT> extends SinkWriter<InputT> {
    /**
     * @return The writer's state.
     * @throws IOException if fail to snapshot writer's state.
     */
    List<WriterStateT> snapshotState(long checkpointId) throws IOException;
}
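To make the contract concrete, here is a minimal sketch of what a writer does in snapshotState. It is not FileSink's implementation: SinkWriter and StatefulSinkWriter below are simplified stand-ins declared locally so the example compiles without Flink, and CountingWriter and SnapshotDemo are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for Flink's SinkWriter (assumption: only write() is modeled here).
interface SinkWriter<InputT> {
    void write(InputT element) throws IOException;
}

// Same shape as the interface quoted above, declared locally for this sketch.
interface StatefulSinkWriter<InputT, WriterStateT> extends SinkWriter<InputT> {
    List<WriterStateT> snapshotState(long checkpointId) throws IOException;
}

// Hypothetical writer whose only state is the number of records written so far.
class CountingWriter implements StatefulSinkWriter<String, Long> {
    private long recordsWritten;

    // A restored writer starts from the count saved in an earlier snapshot.
    CountingWriter(long restoredCount) {
        this.recordsWritten = restoredCount;
    }

    @Override
    public void write(String element) {
        recordsWritten++;
    }

    @Override
    public List<Long> snapshotState(long checkpointId) {
        // Return the state that should be persisted for this checkpoint.
        return Collections.singletonList(recordsWritten);
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter(0L);
        writer.write("a");
        writer.write("b");
        List<Long> snapshot = writer.snapshotState(1L);

        // Simulate a restart: rebuild the writer from the snapshot and continue.
        CountingWriter restored = new CountingWriter(snapshot.get(0));
        restored.write("c");
        System.out.println(restored.snapshotState(2L));
    }
}
```

The sketch only shows the snapshot-and-restore round trip; who calls snapshotState, and when, is exactly what the question below is about.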

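However, as far as I know, Flink does not enable checkpointing in batch mode, so how can I make sure that a batch job's state is saved in time?

If I am running an ETL job over a large volume of data and a retry is triggered because some tasks fail, does the whole job really have to start over from the beginning?

Any guidance would be much appreciated.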
