zhangjing created FLINK-4356:
--------------------------------

Summary: JobMaster HA
Key: FLINK-4356
URL: https://issues.apache.org/jira/browse/FLINK-4356
Project: Flink
Issue Type: Sub-task
Reporter: zhangjing
1. For standalone mode:
   - The LocalDispatcher watches the JobMaster.
   - When the LocalDispatcher detects a JobMaster failure, it recovers the JobGraph and libraries from persistent storage and spawns a new JobManager.
   - The new JobMaster competes for leadership and saves its address to ZooKeeper storage.
   - The new JobMaster registers at the ResourceManager.
   - The new JobMaster recovers the execution of its job (execution graph) from the latest completed checkpoint.

2. For YARN mode:
   - The YarnApplicationMasterRunner creates a ProcessReaper for the JobMaster.
   - The ProcessReaper monitors the JobMaster and kills the JVM when the JobMaster terminates.
   - YARN creates a new AppMaster containing a new JobManager; the JobGraph and libraries are retrieved as startup artifacts.
   - The new JobMaster competes for leadership and saves its address to ZooKeeper storage.
   - The new JobMaster registers at the ResourceManager.
   - The new JobMaster recovers the execution of its job (execution graph) from the latest completed checkpoint.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
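The two failover sequences in this issue can be modeled as a small, self-contained Python sketch. It is only an illustration under stated assumptions: every name in it (ZooKeeperStub, ResourceManagerStub, JobMaster as a dataclass, start_job_master, LocalDispatcher.spawn/watch, yarn_restart) is hypothetical and is not a Flink API. ZooKeeper, the ResourceManager and persistent storage are replaced by in-memory objects, checkpoints are plain integers kept in completion order, and the ProcessReaper itself is not modeled; yarn_restart simply starts from the point where YARN has already launched the new AppMaster.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ZooKeeperStub:
    """Hypothetical stand-in for the ZooKeeper leader node of one job."""
    leader_address: Optional[str] = None

    def try_acquire_leadership(self, address: str) -> bool:
        # The first contender wins and its address is stored as the leader.
        if self.leader_address is None:
            self.leader_address = address
            return True
        return False

    def release(self) -> None:
        # Models the failed leader's ZooKeeper session expiring.
        self.leader_address = None


@dataclass
class ResourceManagerStub:
    registered: List[str] = field(default_factory=list)

    def register_job_master(self, address: str) -> None:
        self.registered.append(address)


@dataclass
class JobMaster:
    address: str
    job_graph: str
    libraries: List[str]
    restored_checkpoint: Optional[int] = None
    alive: bool = True


def start_job_master(address, job_graph, libraries, zk, rm, checkpoints):
    """Steps shared by both modes once the JobGraph and libraries are available."""
    jm = JobMaster(address, job_graph, libraries)
    # Compete for leadership; the winner's address is saved in ZooKeeper.
    if not zk.try_acquire_leadership(address):
        raise RuntimeError("another JobMaster still holds leadership")
    # Register at the ResourceManager.
    rm.register_job_master(address)
    # Restore from the latest completed checkpoint (last entry in the list).
    jm.restored_checkpoint = checkpoints[-1] if checkpoints else None
    return jm


class LocalDispatcher:
    """Standalone mode: watches the JobMaster and spawns a new one on failure."""

    def __init__(self, zk, rm, job_store, checkpoint_store):
        self.zk, self.rm = zk, rm
        self.job_store = job_store                # job_id -> (job_graph, libraries)
        self.checkpoint_store = checkpoint_store  # job_id -> completed checkpoint ids
        self.spawned = 0

    def spawn(self, job_id: str) -> JobMaster:
        # Recover the JobGraph and libraries from persistent storage.
        job_graph, libraries = self.job_store[job_id]
        self.spawned += 1
        return start_job_master(f"jobmaster-{job_id}-{self.spawned}", job_graph,
                                libraries, self.zk, self.rm,
                                self.checkpoint_store.get(job_id, []))

    def watch(self, job_id: str, jm: JobMaster) -> JobMaster:
        if jm.alive:
            return jm
        self.zk.release()  # failure detected: the old leader is gone
        return self.spawn(job_id)


def yarn_restart(attempt: int, startup_artifacts: Tuple[str, List[str]],
                 zk, rm, checkpoints) -> JobMaster:
    """YARN mode: YARN has launched a new AppMaster whose JobGraph and
    libraries arrive as startup artifacts (the ProcessReaper is not modeled)."""
    job_graph, libraries = startup_artifacts
    zk.release()  # the killed JVM's ZooKeeper session has expired
    return start_job_master(f"appmaster-attempt-{attempt}", job_graph, libraries,
                            zk, rm, checkpoints)


# Usage: one standalone failover.
zk, rm = ZooKeeperStub(), ResourceManagerStub()
checkpoints = {"job-1": []}
dispatcher = LocalDispatcher(zk, rm, {"job-1": ("graph-v1", ["user.jar"])}, checkpoints)
first = dispatcher.spawn("job-1")
checkpoints["job-1"].extend([1, 2])  # two checkpoints complete while it runs
first.alive = False                  # simulate a JobMaster crash
second = dispatcher.watch("job-1", first)
print(second.address, second.restored_checkpoint, zk.leader_address)
# prints: jobmaster-job-1-2 2 jobmaster-job-1-2
```

In this sketch both modes converge on start_job_master; they differ only in where the JobGraph and libraries come from (persistent storage versus YARN startup artifacts) and in what detects the failure (the LocalDispatcher versus the ProcessReaper together with YARN's AppMaster restart).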