Beckham007 created YARN-2169:
--------------------------------

             Summary: NMSimulator of sls should catch more Exception
                 Key: YARN-2169
                 URL: https://issues.apache.org/jira/browse/YARN-2169
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.4.0
            Reporter: Beckham007

In the middleStep() method of NMSimulator, sending a heartbeat may throw an InterruptedException or another exception when the load is heavy. If these exceptions are not handled, the NMSimulator task is not added to the executor queue again, so the NM is lost.

In my situation, the pool size is 4000, with 2000 NMs and 1500 AMs. Some NMs are lost.
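
The failure mode can be illustrated with a minimal, self-contained sketch. It is not the actual SLS code: the class HeartbeatRequeueSketch, the NodeTask type, and the simulate() helper are hypothetical stand-ins, and a plain in-memory queue replaces the real executor queue. It only shows why a task whose heartbeat step throws is never re-queued unless the exception is caught.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class HeartbeatRequeueSketch {

    /** Hypothetical stand-in for a simulated NM task; not the real NMSimulator. */
    static class NodeTask {
        final String name;
        final int failOnBeat; // heartbeat number that throws; 0 means never
        int beats = 0;

        NodeTask(String name, int failOnBeat) {
            this.name = name;
            this.failOnBeat = failOnBeat;
        }

        /** Stand-in for middleStep(): sends one heartbeat and may fail under load. */
        void middleStep() throws Exception {
            beats++;
            if (beats == failOnBeat) {
                throw new InterruptedException(name + ": heartbeat interrupted");
            }
        }
    }

    /** Runs the given number of scheduling rounds; returns how many tasks remain queued. */
    static int simulate(boolean catchExceptions, int rounds) {
        Queue<NodeTask> queue = new ArrayDeque<>();
        queue.add(new NodeTask("nm-1", 0)); // never fails
        queue.add(new NodeTask("nm-2", 2)); // fails on its second heartbeat

        for (int r = 0; r < rounds; r++) {
            int pending = queue.size();
            for (int i = 0; i < pending; i++) {
                NodeTask task = queue.poll();
                try {
                    task.middleStep();
                } catch (Exception e) {
                    if (!catchExceptions) {
                        // Models the reported bug: the failure skips the re-queue
                        // below, so this node never heartbeats again.
                        continue;
                    }
                    // Models the proposed fix: note the failure and keep the task alive.
                    System.err.println("heartbeat failed, will retry: " + e.getMessage());
                }
                queue.add(task); // re-queue for the next round
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("NMs alive without handling: " + simulate(false, 3));
        System.out.println("NMs alive with handling:    " + simulate(true, 3));
    }
}
```

In this sketch, the run without handling ends with one of the two nodes still queued, while the run with handling keeps both. A real fix would also need to decide whether to log, retry, or restore the thread's interrupt status; those choices are left open here.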