Till Rohrmann created FLINK-5960:
------------------------------------

             Summary: Make CheckpointCoordinator less blocking
                 Key: FLINK-5960
                 URL: https://issues.apache.org/jira/browse/FLINK-5960
             Project: Flink
          Issue Type: Improvement
          Components: State Backends, Checkpointing
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Till Rohrmann

Currently the {{CheckpointCoordinator}} performs its operations under a global lock. This includes writing checkpoint data out to state storage. If that write blocks, the whole checkpoint coordinator stands still.

I think we should rework the {{CheckpointCoordinator}} so that it makes fewer assumptions about external systems and tolerates write failures and timeouts. Furthermore, we should limit the scope of locking and avoid executing potentially blocking operations under the lock. This will improve the runtime behaviour of the {{CheckpointCoordinator}}.
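
To illustrate the direction, here is a minimal sketch of the pattern, not Flink's actual code. The class {{NarrowLockCoordinator}}, the {{StateStorage}} interface, and all method names are hypothetical and exist only for this example. The potentially blocking write runs on a separate I/O executor with a bounded wait, and the lock is taken only for the short bookkeeping update. It assumes Java 9+ for {{CompletableFuture.orTimeout}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; this is NOT Flink's CheckpointCoordinator. */
public class NarrowLockCoordinator {

    /** Stand-in for external state storage; a write may block or fail. */
    public interface StateStorage {
        void write(long checkpointId, byte[] data) throws Exception;
    }

    private final Object lock = new Object();
    // Blocking I/O is confined to this executor instead of the caller's thread.
    private final ExecutorService ioExecutor = Executors.newSingleThreadExecutor();

    private long completedCheckpoints; // guarded by lock
    private long failedCheckpoints;    // guarded by lock

    /**
     * Writes the checkpoint data without holding the lock and waits at most
     * timeoutMillis for the result. The returned future yields true if the
     * write finished in time, false if it failed or timed out.
     */
    public CompletableFuture<Boolean> completeCheckpoint(
            long checkpointId, byte[] data, StateStorage storage, long timeoutMillis) {

        // Step 1: run the potentially blocking write outside the lock.
        CompletableFuture<Void> write = CompletableFuture.runAsync(() -> {
            try {
                storage.write(checkpointId, data);
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, ioExecutor);

        // Step 2: bound the wait and treat failures and timeouts as a
        // declined checkpoint. Note that a timeout does not stop the
        // underlying write task; it only stops us from waiting for it.
        return write
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .handle((ignored, error) -> {
                    boolean success = (error == null);
                    // Step 3: hold the lock only for the cheap state update.
                    synchronized (lock) {
                        if (success) {
                            completedCheckpoints++;
                        } else {
                            failedCheckpoints++;
                        }
                    }
                    return success;
                });
    }

    public long getCompletedCheckpoints() {
        synchronized (lock) {
            return completedCheckpoints;
        }
    }

    public long getFailedCheckpoints() {
        synchronized (lock) {
            return failedCheckpoints;
        }
    }

    /** Interrupts any write that is still running and stops the executor. */
    public void shutdown() {
        ioExecutor.shutdownNow();
    }

    public static void main(String[] args) {
        NarrowLockCoordinator coordinator = new NarrowLockCoordinator();
        // A fast write, then a write that blocks longer than the timeout.
        boolean fast = coordinator
                .completeCheckpoint(1L, new byte[0], (id, d) -> { }, 1000L).join();
        boolean slow = coordinator
                .completeCheckpoint(2L, new byte[0], (id, d) -> Thread.sleep(5000L), 100L).join();
        System.out.println("fast=" + fast + ", slow=" + slow);
        coordinator.shutdown();
    }
}
```

In this sketch a stalled storage write only delays later writes queued on the same I/O executor; callers that need the counters can still acquire the lock immediately. A real rework would also have to decide how to clean up data from a write that completes after its timeout, which is left out here.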