Stop-with-savepoint is an async operation, it will trigger savepoint at first, 
and trigger this operation in a cache. 

These error logs indicate that the operation is in the cache which can no 
longer be found. Could you provide more jobmanager.log to find when this 
operation is evicted from cache.




> 2022年4月20日 上午12:21,Harsh Shah <harsh.a.s...@shopify.com> 写道:
> 
> Hello Community,
> 
> We are trying to adopt flink-stop instead of orchestrating the stop manually. 
> While using the command, it failed with error "Operation not found under key: 
> org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a".
>  the job eventually stopped but I wanted to see how I could execute the 
> command successfully ! Any help or direction to debug is appreciated. 
> 
> Information:
> * Flink version: 1.13.1 
> * Flink on kubernetes with HA enabled. Standalone cluster.
> 
> command: "/bin/flink stop $JOBID"
> Stack trace of the command:
> WARNING: Illegal reflective access by 
> org.apache.hadoop.security.authentication.util.KerberosUtil 
> (file:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar) to method 
> sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of 
> org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal 
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Suspending job "00000000000000000000000000000000" with a savepoint.
> 
> ------------------------------------------------------------
>  The program finished with the following exception:
> 
> org.apache.flink.util.FlinkException: Could not stop with a savepoint job 
> "00000000000000000000000000000000".
> at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:581)
> at 
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
> at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:569)
> at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1069)
> at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
> at java.base/java.security.AccessController.doPrivileged(Native Method)
> at java.base/javax.security.auth.Subject.doAs(Unknown Source)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.flink.runtime.rest.util.RestClientException: 
> [org.apache.flink.runtime.rest.NotFoundException: Operation not found under 
> key: 
> org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a
> at 
> org.apache.flink.runtime.rest.handler.async.AbstractAsynchronousOperationHandlers$StatusHandler.handleRequest(AbstractAsynchronousOperationHandlers.java:182)
> at 
> org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler.handleRequest(SavepointHandlers.java:219)
> at 
> org.apache.flink.runtime.rest.handler.AbstractRestHandler.respondToRequest(AbstractRestHandler.java:83)
> at 
> org.apache.flink.runtime.rest.handler.AbstractHandler.respondAsLeader(AbstractHandler.java:195)
> at 
> org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.lambda$channelRead0$0(LeaderRetrievalHandler.java:83)
> at java.base/java.util.Optional.ifPresent(Unknown Source)
> at org.apache.flink.util.OptionalConsumer.ifPresent(OptionalConsumer.java:45)
> at 
> org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:80)
> at 
> org.apache.flink.runtime.rest.handler.LeaderRetrievalHandler.channelRead0(LeaderRetrievalHandler.java:49)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> 
> 
> Jobs manager logs:
> {"instant":{"epochSecond":1650384228,"nanoOfSecond":866181000},"thread":"flink-akka.actor.default-dispatcher-2","level":"INFO","loggerName":"org.apache.flink.runtime.jobmaster.JobMaster","message":"Triggering
>  stop-with-savepoint for job 
> 00000000000000000000000000000000.","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":28,"threadPriority":5}
> {"instant":{"epochSecond":1650384229,"nanoOfSecond":30779000},"thread":"flink-rest-server-netty-worker-thread-1","level":"ERROR","loggerName":"org.apache.flink.runtime.rest.handler.job.savepoints.SavepointHandlers$SavepointStatusHandler","message":"Exception
>  occurred in REST handler: Operation not found under key: 
> org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@9fbfc15a","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":95,"threadPriority":5}
> ... # rest of things are success

Reply via email to