/flink/checkpoints is an external persistent store (a NAS directory
mounted into the job manager).








------------------ Original Message ------------------
From: "Vijay Bhaskar" <bhaskar.eba...@gmail.com>
Sent: Thursday, Nov 28, 2019, 2:29 PM
To: <xcz200...@qq.com>
Cc: "user" <user@flink.apache.org>
Subject: Re: JobGraphs not cleaned up in HA mode



The following conditions are mandatory for running in HA mode:

a) You should have a persistent, common external store that the jobmanager
and the task managers both write their state to.
b) You should have a persistent external store in which the JobGraph
referenced by ZooKeeper is kept.
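
As a rough illustration of the two conditions above, here is a small Python sketch that checks a flat flink-conf.yaml for the relevant settings. The configuration keys are the standard Flink ones; the host names and paths in EXAMPLE_CONF and the helper names parse_flat_conf and check_ha_conf are hypothetical and not taken from this thread. It only flags likely problems, it cannot prove that a store is actually shared or persistent.

```python
from urllib.parse import urlparse

# Hypothetical example values: hosts and paths are placeholders,
# not the settings of the cluster discussed in this thread.
EXAMPLE_CONF = """
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
high-availability.storageDir: hdfs:///flink/ha
state.checkpoints.dir: hdfs:///flink/checkpoints
"""


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, skipping blank lines and comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def check_ha_conf(conf):
    """Return a list of likely problems; an empty list means the basic checks passed."""
    problems = []
    if conf.get("high-availability", "").lower() != "zookeeper":
        problems.append("high-availability is not set to zookeeper")
    if not conf.get("high-availability.zookeeper.quorum"):
        problems.append("high-availability.zookeeper.quorum is missing")
    # Condition (b): JobGraph storage. Condition (a): checkpoint storage.
    for key in ("high-availability.storageDir", "state.checkpoints.dir"):
        value = conf.get(key)
        if not value:
            problems.append(key + " is missing")
        elif urlparse(value).scheme in ("", "file"):
            # A local path is only safe if the same mount is visible
            # to every jobmanager and task manager.
            problems.append(key + " = " + value + " is a local path; "
                            "verify it is a mount shared by all pods")
    return problems


if __name__ == "__main__":
    for problem in check_ha_conf(parse_flat_conf(EXAMPLE_CONF)):
        print("WARNING:", problem)
```

A local path such as a NAS mount is reported as a warning rather than an error, because it can work as long as the same directory is mounted into every jobmanager and task manager and survives a redeploy.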


ZooKeeper is referring to the path
/flink/checkpoints/submittedJobGraph480ddf9572ed to get the job graph, but
the jobmanager is unable to find it.
It seems /flink/checkpoints is not the external persistent store.




Regards
Bhaskar


On Thu, Nov 28, 2019 at 10:43 AM seuzxc <xcz200...@qq.com> wrote:

 Hi, I have the same problem with Flink 1.9.1. Is there any solution to fix it?
 When k8s redeploys the jobmanager, the error looks like this (it seems ZooKeeper
 does not remove the submitted job info, but the jobmanager removes the file):
 
 
 Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted JobGraph from state handle under /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
         at org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
         at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
         at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
         at org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
         at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
         at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
         ... 9 more
 Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 file=/flink/checkpoints/submittedJobGraph480ddf9572ed
         at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
 
 
 
 --
 Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
