[ https://issues.apache.org/jira/browse/APEXCORE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vlad Rozov resolved APEXCORE-426.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.6.0

> Support work preserving AM recovery
> -----------------------------------
>
>                 Key: APEXCORE-426
>                 URL: https://issues.apache.org/jira/browse/APEXCORE-426
>             Project: Apache Apex Core
>          Issue Type: Improvement
>            Reporter: Thomas Weise
>            Assignee: Sandesh
>              Labels: apex-hadoop-version
>             Fix For: 3.6.0
>
>
> On app master failure, the streaming containers should continue running.
> As of Hadoop 2.2, YARN automatically terminates all containers and the
> replacement app master relaunches them. Once we move to a newer minimum
> Hadoop version, we should leverage work-preserving restart.
> The mechanism in Apex containers to locate the new master process is already
> in place.
>
> Test Cases:
> 1. Kill the app master: only the app master's container ID should change;
> all other container IDs should remain the same.
> 2. Kill the app master and a few other containers; make sure that the killed
> containers are recovered.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)