Clara Xiong created FLINK-29819:
-----------------------------------

             Summary: Record an error event when savepoint fails within grace period
                 Key: FLINK-29819
                 URL: https://issues.apache.org/jira/browse/FLINK-29819
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Clara Xiong


As of now, SavepointObserver retries if a savepoint fails within the grace 
period, until the savepoint either succeeds or fails after the grace period. 
The grace period applies to each retry separately. If the underlying cause of 
a quick failure is not transient, such as a misconfigured path or a persistent 
storage failure, the retries continue indefinitely without recording any error 
event.

We should first add logic to record an error event for each failed attempt. We 
can consider capping the number of retries if this becomes a pain point for 
users.
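The proposed behavior can be sketched as a small, hypothetical Python model. It is not the operator's Java code: `SavepointState`, `on_savepoint_failure`, the `SavepointError` event reason, and the optional `max_retries` cap are illustrative names only. It also assumes each retry is re-triggered immediately at the failure time, so the grace period restarts for every attempt. The sketch records an error event on every failure, including those that are retried; with `max_retries=None` it keeps retrying quick failures forever, mirroring the current behavior described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Event:
    # Hypothetical stand-in for an emitted error event.
    reason: str
    message: str


@dataclass
class SavepointState:
    # Hypothetical bookkeeping for one savepoint; not the operator's real status model.
    trigger_time: datetime
    retries: int = 0
    events: List[Event] = field(default_factory=list)


def on_savepoint_failure(
    state: SavepointState,
    failure_time: datetime,
    error: str,
    grace_period: timedelta,
    max_retries: Optional[int] = None,
) -> str:
    """Return "RETRY" or "FAILED", recording an error event for every failed attempt."""
    within_grace = failure_time - state.trigger_time <= grace_period
    # Proposed change: always record an event, even when the attempt will be retried.
    state.events.append(
        Event("SavepointError", f"Savepoint attempt {state.retries + 1} failed: {error}")
    )
    if within_grace and (max_retries is None or state.retries < max_retries):
        state.retries += 1
        # Assumption: the retry is triggered right away, so its grace period starts now.
        state.trigger_time = failure_time
        return "RETRY"
    return "FAILED"


if __name__ == "__main__":
    t0 = datetime(2022, 11, 1, 12, 0, 0)
    state = SavepointState(trigger_time=t0)
    grace = timedelta(minutes=5)
    results = []
    t = t0
    # Three quick failures, e.g. caused by a misconfigured savepoint path.
    for _ in range(3):
        t += timedelta(seconds=10)
        results.append(on_savepoint_failure(state, t, "invalid path", grace, max_retries=2))
    print(results)  # ['RETRY', 'RETRY', 'FAILED']
    print(len(state.events))  # 3
```

Recording the event before deciding whether to retry keeps every failed attempt visible to users, independently of whether a retry cap is eventually added.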

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
