Yes, that would work. But it might still be interesting to understand why you ran into the timeout. Was it just a large state that took longer than expected, or was it a network issue? That's mainly so you can understand the underlying issue better. In any case, I'm glad the savepoint creation was successful in the end.
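
For reference, a minimal sketch of raising the client-side timeout. It assumes a Flink release in which the CLI reads this value from the `client.timeout` option in `conf/flink-conf.yaml` (older releases used the key `akka.client.timeout`). The 600 s value is only an illustration, not a recommendation:

```yaml
# conf/flink-conf.yaml on the host where ./bin/flink is run (assumed location).
# Assumption: this Flink version names the option client.timeout.
# It controls how long the CLI waits for the savepoint response before
# giving up with a TimeoutException; the savepoint itself may still complete.
client.timeout: 600 s
```

As happened in this thread, a client-side timeout does not necessarily mean the savepoint failed, so it is worth checking the savepoint directory before retrying.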
Best,
Matthias

On Fri, May 28, 2021 at 2:35 PM Robert Cullen <cinquate...@gmail.com> wrote:

> Hi Matthias, You are correct. After a few minutes I took another look at
> my savepoint folder and the data was there. I think increasing the timeout
> may resolve the problem?
>
> On Fri, May 28, 2021 at 8:21 AM Matthias Pohl <matth...@ververica.com>
> wrote:
>
>> Hi Robert,
>> it would be interesting to see the corresponding taskmanager/jobmanager
>> logs. That would help in finding out why the savepoint creation failed.
>> Just to verify: The savepoint data wasn't written to S3 even after the
>> timeout happened, was it?
>>
>> Best,
>> Matthias
>>
>> On Thu, May 27, 2021 at 7:50 PM Robert Cullen <cinquate...@gmail.com>
>> wrote:
>>
>>> I triggered a savepoint from a currently running job. Although the
>>> directory structure gets created in the MINIO S3 store, the command
>>> ultimately fails without writing the data.
>>>
>>> root@flink-client:/opt/flink# ./bin/flink list --target kubernetes-session -Dkubernetes.cluster-id=flink-jobmanager -Dkubernetes.namespace=cmdaa
>>> 2021-05-27 17:37:00,409 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://flink-jobmanager-rest.cmdaa:8081
>>> Waiting for response...
>>> ------------------ Running/Restarting Jobs -------------------
>>> 27.05.2021 16:50:00 : 72f614340dc1a7416d0613362d1ef83b : Streaming Log Count (RUNNING)
>>> --------------------------------------------------------------
>>> No scheduled jobs.
>>> root@flink-client:/opt/flink# ./bin/flink savepoint 72f614340dc1a7416d0613362d1ef83b --target kubernetes-session -Dkubernetes.cluster-id=flink-jobmanager -Dkubernetes.namespace=cmdaa
>>> 2021-05-27 17:37:58,776 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://flink-jobmanager-rest.cmdaa:8081
>>> Triggering savepoint for job 72f614340dc1a7416d0613362d1ef83b.
>>> Waiting for response...
>>>
>>> ------------------------------------------------------------
>>> The program finished with the following exception:
>>>
>>> org.apache.flink.util.FlinkException: Triggering a savepoint for the job 72f614340dc1a7416d0613362d1ef83b failed.
>>> at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:777)
>>> at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:754)
>>> at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
>>> at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:751)
>>> at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1072)
>>> at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>> at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>> Caused by: java.util.concurrent.TimeoutException
>>> at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>>> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>>> at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:771)
>>> ... 7 more
>>> root@flink-client:/opt/flink#
>>>
>>> --
>>> Robert Cullen
>>> 240-475-4490
>>>
>>
>
> --
> Robert Cullen
> 240-475-4490
>