Congxian Qiu(klion26) created FLINK-18748:
---------------------------------------------

             Summary: Savepoint would be queued unexpectedly
                 Key: FLINK-18748
                 URL: https://issues.apache.org/jira/browse/FLINK-18748
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.11.1, 1.11.0
            Reporter: Congxian Qiu(klion26)


After FLINK-17342, when a checkpoint/savepoint is triggered, we check whether the request can be executed in {{CheckpointRequestDecider#chooseRequestToExecute}}. The logic is as follows:
{code:java}
Preconditions.checkState(Thread.holdsLock(lock));
// 1. nothing to choose while a trigger is in progress or the queue is empty
if (isTriggering || queuedRequests.isEmpty()) {
    return Optional.empty();
}

// 2. too many ongoing checkpoints/savepoints: only a forced request may proceed
if (pendingCheckpointsSizeSupplier.get() >= maxConcurrentCheckpointAttempts) {
    return Optional.of(queuedRequests.first())
        .filter(CheckpointTriggerRequest::isForce)
        .map(unused -> queuedRequests.pollFirst());
}

// 3. check the timestamp of the last completed checkpoint
long nextTriggerDelayMillis = nextTriggerDelayMillis(lastCompletionMs);
if (nextTriggerDelayMillis > 0) {
    return onTooEarly(nextTriggerDelayMillis);
}

return Optional.of(queuedRequests.pollFirst());
{code}
But if {{pendingCheckpointsSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}} and the request is a savepoint, the savepoint still has to wait in step 3 until the delay computed from the last completed checkpoint has elapsed.

I think we should trigger the savepoint immediately if {{pendingCheckpointsSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}}.
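
The proposed behaviour could look roughly like the sketch below. This is a simplified, standalone illustration and not Flink's actual code: {{DeciderSketch}}, {{Request}}, the {{savepoint}} flag, the {{minPauseMillis}} field, and the explicit {{nowMs}} parameter are all hypothetical names introduced here. It also assumes that the delay in step 3 is a minimum pause measured from the last completed checkpoint, and it uses a plain FIFO queue in place of the sorted request set. The original method schedules a retry through {{onTooEarly}}; the sketch simply returns an empty result in that case.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Hypothetical, simplified stand-in for CheckpointRequestDecider (not Flink's real class).
final class DeciderSketch {

    /** Minimal request model: only the two flags this sketch needs. */
    static final class Request {
        final boolean savepoint; // assumed flag marking the request as a savepoint
        final boolean force;

        Request(boolean savepoint, boolean force) {
            this.savepoint = savepoint;
            this.force = force;
        }
    }

    // Plain FIFO queue; the original code uses a sorted set of requests.
    private final Deque<Request> queuedRequests = new ArrayDeque<>();
    private final int maxConcurrentCheckpointAttempts;
    private final long minPauseMillis; // assumed meaning of the step-3 delay

    DeciderSketch(int maxConcurrentCheckpointAttempts, long minPauseMillis) {
        this.maxConcurrentCheckpointAttempts = maxConcurrentCheckpointAttempts;
        this.minPauseMillis = minPauseMillis;
    }

    void enqueue(Request request) {
        queuedRequests.addLast(request);
    }

    Optional<Request> chooseRequestToExecute(
            boolean isTriggering, int pendingCheckpoints, long lastCompletionMs, long nowMs) {
        // 1. nothing to choose while a trigger is in progress or the queue is empty
        if (isTriggering || queuedRequests.isEmpty()) {
            return Optional.empty();
        }
        Request head = queuedRequests.peekFirst();

        // 2. too many ongoing checkpoints/savepoints: only a forced request may proceed
        if (pendingCheckpoints >= maxConcurrentCheckpointAttempts) {
            return head.force ? Optional.of(queuedRequests.pollFirst()) : Optional.empty();
        }

        // 3. proposed change: only periodic checkpoints wait for the delay; savepoints skip it
        if (!head.savepoint) {
            long nextTriggerDelayMillis = lastCompletionMs + minPauseMillis - nowMs;
            if (nextTriggerDelayMillis > 0) {
                return Optional.empty(); // the original calls onTooEarly(...) here
            }
        }
        return Optional.of(queuedRequests.pollFirst());
    }
}
```

With a minimum pause of 1000 ms and a checkpoint that completed 100 ms ago, a queued savepoint would be returned right away under this sketch, while a periodic checkpoint in the same position would still be held back.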