Hi Ivan,
thanks a lot for your message. Can you post the JobManager log here as
well? It might contain additional information on the reason for the timeout.
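
While collecting the log, it may also help to check whether the stop-with-savepoint operation completed on the cluster side, independently of how long the CLI waits for it. The following is only a minimal sketch under stated assumptions: it assumes the JobManager REST endpoint is reachable at http://localhost:8081 (adjust this to your Kubernetes service), and it relies on the REST paths `/jobs/:jobid/stop` and `/jobs/:jobid/savepoints/:triggerid` as described in the Flink REST API documentation. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: JobManager REST endpoint; replace with your service address.
BASE_URL = "http://localhost:8081"


def summarize_status(response):
    """Turn a savepoint-status response body (as a dict) into a short string."""
    state = response.get("status", {}).get("id", "UNKNOWN")
    operation = response.get("operation") or {}
    if state != "COMPLETED":
        return state
    # A completed operation carries either a failure cause or a location.
    if "failure-cause" in operation:
        return "FAILED"
    return "COMPLETED at " + str(operation.get("location"))


def _call(method, path, body=None):
    """Send a JSON request to the REST endpoint and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))


def trigger_stop(job_id, target_directory=None):
    """Request stop-with-savepoint and return the trigger id for polling."""
    body = {"drain": False}
    if target_directory is not None:
        body["targetDirectory"] = target_directory
    return _call("POST", "/jobs/" + job_id + "/stop", body)["request-id"]


def savepoint_status(job_id, trigger_id):
    """Fetch and summarize the state of a previously triggered operation."""
    path = "/jobs/" + job_id + "/savepoints/" + trigger_id
    return summarize_status(_call("GET", path))
```

Calling `trigger_stop("88d9b46f59d131428e2a18c9c7b3aa3f")` and then polling `savepoint_status` with the returned id would show whether the operation finishes or stays in progress on the cluster while the CLI call times out.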

On Fri, Jul 24, 2020 at 4:03 AM Ivan Yang <ivanygy...@gmail.com> wrote:

> Hello everyone,
>
> We recently upgraded Flink from 1.9.1 to 1.11.0 and found one strange
> behavior: when we stop a job with a savepoint, we get the following timeout
> error. I checked the Flink web console: the savepoint is created in S3 within
> 1 second. The job is fairly simple, so 1 second for savepoint generation is
> expected. We use a Kubernetes deployment. I clocked it, and it takes about 60
> seconds before the command returns this error. Afterwards, the job hangs (it
> still shows as running, but is actually not doing anything), and I need to
> run another command to cancel it. Does anyone have an idea what's going on
> here? BTW, "flink stop" worked for us in 1.9.1 before.
>
>
>
> flink@flink-jobmanager-fcf5d84c5-sz4wk:~$ flink stop 88d9b46f59d131428e2a18c9c7b3aa3f
> Suspending job "88d9b46f59d131428e2a18c9c7b3aa3f" with a savepoint.
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.util.FlinkException: Could not stop with a savepoint job "88d9b46f59d131428e2a18c9c7b3aa3f".
> at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:495)
> at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:864)
> at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:487)
> at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:931)
> at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:992)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:992)
> Caused by: java.util.concurrent.TimeoutException
> at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
> at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:493)
> ... 9 more
> flink@flink-jobmanager-fcf5d84c5-sz4wk:~$
>
>
> Thanks in advance,
> Ivan
>
