------------------ Original Message ------------------
From: "user-zh" <hinobl...@gmail.com>;
Sent: Thursday, March 18, 2021, 11:47 AM
To: "user-zh" <user-zh@flink.apache.org>;

Subject: Re: Flink 1.12.0 ??????????Checkpoint????????



????????????????????????????????????????????????????????

Frost Wong <frostw...@hotmail.com> wrote on Thursday, March 18, 2021 at 10:38 AM:

> Hi ??????
>
> ??????Flink on yarn????????????????????????????????????????????????
>
> 2021-03-18 08:52:37,019 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 661818 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5 (562357 bytes in 4699 ms).
> 2021-03-18 08:52:37,637 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 661819 (type=CHECKPOINT) @ 1616028757520 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5.
> 2021-03-18 08:52:42,956 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Completed checkpoint 661819 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5 (2233389 bytes in 4939 ms).
> 2021-03-18 08:52:43,528 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 661820 (type=CHECKPOINT) @ 1616028763457 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5.
> 2021-03-18 09:12:43,528 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Checkpoint 661820 of job 4fa72fc414f53e5ee062f9fbd5a2f4d5 expired before completing.
> 2021-03-18 09:12:43,615 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Trying to recover from a global failure.
> org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
>     at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:90) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:65) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1760) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1733) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:93) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1870) ~[flink-dist_2.12-1.12.0.jar:1.12.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_231]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_231]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_231]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_231]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_231]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_231]
>     at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_231]
> 2021-03-18 09:12:43,618 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job csmonitor_comment_strategy (4fa72fc414f53e5ee062f9fbd5a2f4d5) switched from state RUNNING to RESTARTING.
> 2021-03-18 09:12:43,619 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map (43/256) (18dec1f23b95f741f5266594621971d5) switched from RUNNING to CANCELING.
> 2021-03-18 09:12:43,622 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map (44/256) (3f2ec60b2f3042ceea6e1d660c78d3d7) switched from RUNNING to CANCELING.
> 2021-03-18 09:12:43,622 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map (45/256) (66d411c2266ab025b69196dfec30d888) switched from RUNNING to CANCELING.
> ????????????????????????Unaligned Checkpoint??rocksdb??????????????????????????????????????????????????Checkpoint??metrics??????????????????????????????????parallelism????????????????
>
> ??????????
>
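
As a rough cross-check of the log quoted above: checkpoint 661820 was triggered at 08:52:43,528 and expired at 09:12:43,528, exactly 20 minutes later. That would be consistent with a 20-minute checkpoint timeout (`execution.checkpointing.timeout`), though the job's actual configuration is not shown in this thread, so treat that as an assumption. Below is a minimal Python sketch that pairs "Triggering checkpoint" and "expired before completing" entries and reports the gap. The function name `expired_checkpoints` and the regular expressions are illustrative only and are not part of Flink; the two sample lines are copied from the log with the padding collapsed.

```python
import re
from datetime import datetime

# Two entries copied from the quoted JobManager log (whitespace padding collapsed).
LOG = """\
2021-03-18 08:52:43,528 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 661820 (type=CHECKPOINT) @ 1616028763457 for job 4fa72fc414f53e5ee062f9fbd5a2f4d5.
2021-03-18 09:12:43,528 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Checkpoint 661820 of job 4fa72fc414f53e5ee062f9fbd5a2f4d5 expired before completing.
"""

# Timestamp at the start of each log entry, e.g. "2021-03-18 08:52:43,528".
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
TRIGGERED = re.compile(TS + r".*Triggering checkpoint (\d+) ")
EXPIRED = re.compile(TS + r".*Checkpoint (\d+) of job \S+ expired before completing")


def parse_ts(text):
    """Parse a log timestamp; %f reads the three millisecond digits."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def expired_checkpoints(log_text):
    """Map checkpoint id -> seconds between its trigger and its expiry."""
    triggered, result = {}, {}
    for line in log_text.splitlines():
        match = TRIGGERED.search(line)
        if match:
            triggered[match.group(2)] = parse_ts(match.group(1))
            continue
        match = EXPIRED.search(line)
        if match and match.group(2) in triggered:
            started = triggered[match.group(2)]
            result[match.group(2)] = (parse_ts(match.group(1)) - started).total_seconds()
    return result


if __name__ == "__main__":
    for cp_id, seconds in expired_checkpoints(LOG).items():
        print(f"checkpoint {cp_id} expired after {seconds / 60:.1f} minutes")
```

Run against a full JobManager log (with each entry on one line), this lists every checkpoint that expired and how long it was pending, which helps tell a timeout from some other failure cause.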
