I'm deploying a standalone Flink cluster on top of Kubernetes and using MinIO
as an S3 backend. I am mainly following the instructions on the Flink website.
I use the following command to run my job in Flink:
$ flink run -d -m <IP>:<port> -j job.jar
I have also added the following to flink-configmap.yaml:
state.backend: filesystem
state.checkpoints.dir: s3://state/checkpoints
state.savepoints.dir: s3://state/savepoints
s3.path-style-access: true
s3.endpoint: http://minio-service:9000
s3.access-key: *******
s3.secret-key: *******
Everything seems to work well: the job is submitted correctly and the
checkpoints are written to MinIO. But when I try to cancel the job or
stop it with a savepoint, I get the following exception:
org.apache.flink.util.FlinkException: Could not stop with a savepoint job "5ae191ca2b239ec7771e4c7a9a336537".
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
    at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
    at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
    at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
    ... 6 more
This is the command I use to stop the job with a savepoint:
$ flink stop -p <JobID>
My Flink version is 1.11.2 (flink-1.11.2-bin-scala_2.11).
What could be the reason for the exception? Any suggestions?