Hi Flink Community,

My name is Tony Chen, and I am a software engineer at Robinhood. I have
some questions on restarting a Flink Application from a savepoint or
checkpoint.

We currently store our checkpoints and savepoints in S3, and we would like
to use the Apache Flink Kubernetes Operator to manage our Flink
applications. I know that there is a field called "initialSavepointPath"
(doc:
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/job-management/#manual-recovery)
that I can set in my Kubernetes manifest so that whenever I want my Flink
application to start from a particular savepoint, it will start from
the savepoint directory in this field. However, if I delete this
FlinkDeployment resource altogether after newer savepoints have been
triggered, and then redeploy it, it looks like I have to manually update
"initialSavepointPath" to point to a newer savepoint directory so that the
Flink application starts from that newer savepoint.

Is there a way for us to redeploy FlinkDeployment resources so that the
latest checkpoint or savepoint is used, without having to update the
"initialSavepointPath" field? I noticed in my testing that whenever I
deleted the FlinkDeployment resource and redeployed it, the job would
start either from the savepoint in initialSavepointPath or, if
initialSavepointPath was not set, from checkpoint 1.

For example, let's say I restarted a Flink application from savepoint 10
with initialSavepointPath set to s3://savepoints/savepoint-10, and later
savepoint 20 was completed and stored at s3://savepoints/savepoint-20. Is
there a way for me to delete this FlinkDeployment and redeploy it so that
it resumes from savepoint 20, without updating initialSavepointPath?
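
To make the scenario concrete, here is a rough sketch of the kind of
manifest I am describing. Only initialSavepointPath and the S3 savepoint
paths come from my actual setup; the resource name, image, Flink version,
resource sizes, checkpoint bucket, and jar are placeholders for
illustration.

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: example-app                  # placeholder name
spec:
  image: flink:1.17                  # placeholder image
  flinkVersion: v1_17                # placeholder version
  serviceAccount: flink              # placeholder service account
  flinkConfiguration:
    state.savepoints.dir: s3://savepoints
    state.checkpoints.dir: s3://checkpoints   # placeholder bucket
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar  # placeholder jar
    parallelism: 2
    upgradeMode: savepoint
    state: running
    # Still points at savepoint 10 even though savepoint 20 now exists at
    # s3://savepoints/savepoint-20; this is the field I would like to
    # avoid editing by hand before each redeploy.
    initialSavepointPath: s3://savepoints/savepoint-10
```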

Thanks,
Tony

P.S. I'm also reading through the Apache Flink Kubernetes Operator source
code to understand how the operator starts a Flink job. Some relevant
code:

   - https://github.com/apache/flink-kubernetes-operator/blob/0c341ebe13645f4e9802cfd780c5b50f59e29363/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/AbstractFlinkService.java#L500
   - https://github.com/apache/flink-kubernetes-operator/blob/0c341ebe13645f4e9802cfd780c5b50f59e29363/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/observer/SavepointObserver.java#L204


-- 

<http://www.robinhood.com/>

Tony Chen

Software Engineer

Menlo Park, CA

