[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784717#comment-13784717 ]
Xuan Gong commented on YARN-867:
--------------------------------

bq. Probably have 1 try catch instead of multiple.
Fixed. There is now a single try/catch block.

bq. Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably hasn't failed.
Done.

bq. TestAuxService needs an addition for the new code
Added a new test case in TestAuxService.

bq. TestContainer - new test can be made simpler by not mocking AuxServiceHandler and instead sending the failed event directly like it's done for other tests there.
Fixed.

bq. In AuxService.handle(APPLICATION_INIT) and other places like that, where the service does not exist then we should fail too.
Done.

bq. Probably we can ignore the error here since the container has already failed.
I think we still need this transition. The container can go from the NEW state to ContainerState.LOCALIZATION_FAILED, and the AuxService is triggered to handle APPLICATION_INIT at that time. If an exception is thrown, we send a ContainerExitEvent with ContainerEventType.CONTAINER_EXITED_WITH_FAILURE to the container. It is quite possible that the container will process this event while it is in the LOCALIZATION_FAILED state, so we should handle it.

> Isolation of failures in aux services
> --------------------------------------
>
>                 Key: YARN-867
>                 URL: https://issues.apache.org/jira/browse/YARN-867
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Xuan Gong
>            Priority: Critical
>         Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch,
> YARN-867.4.patch, YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a
> service. For example, sending data to the ShuffleService such that it results
> in any non-IOException will cause the NM's async dispatcher to exit as the
> service's INIT APP event is not handled properly.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
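
For readers following the thread, the isolation approach discussed in the comment above (one try/catch around the aux service call, with a missing service treated as a failure, and the error reported to the container instead of escaping into the dispatcher) might look roughly like the sketch below. This is not the code from the attached patches. AuxServiceIsolationSketch, AuxService, ContainerFailure, and handleApplicationInit are hypothetical stand-ins written for illustration, not the real NodeManager classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are the real YARN classes.
class AuxServiceIsolationSketch {

    /** Stand-in for an auxiliary service such as a shuffle handler. */
    interface AuxService {
        void initializeApplication(String appId, byte[] data);
    }

    /** Stand-in for the failure event that would be sent to the container. */
    static final class ContainerFailure {
        final String containerId;
        final String diagnostics;

        ContainerFailure(String containerId, String diagnostics) {
            this.containerId = containerId;
            this.diagnostics = diagnostics;
        }
    }

    private final Map<String, AuxService> services = new HashMap<>();
    private final List<ContainerFailure> failures = new ArrayList<>();

    void register(String name, AuxService service) {
        services.put(name, service);
    }

    List<ContainerFailure> failures() {
        return failures;
    }

    /** Handles an application-init request without letting exceptions escape. */
    void handleApplicationInit(String serviceName, String appId,
                               String containerId, byte[] data) {
        try {
            AuxService service = services.get(serviceName);
            if (service == null) {
                // A missing service is treated as a failure as well.
                throw new IllegalStateException("Unknown aux service: " + serviceName);
            }
            service.initializeApplication(appId, data);
        } catch (Exception e) {
            // Record a per-container failure instead of rethrowing, so the
            // caller (the dispatcher thread in the real system) keeps running.
            failures.add(new ContainerFailure(containerId, String.valueOf(e.getMessage())));
        }
    }

    public static void main(String[] args) {
        AuxServiceIsolationSketch handler = new AuxServiceIsolationSketch();
        handler.register("bad_service", (appId, data) -> {
            throw new IllegalArgumentException("bad data");
        });
        handler.handleApplicationInit("bad_service", "app_1", "container_1", new byte[0]);
        handler.handleApplicationInit("missing_service", "app_1", "container_2", new byte[0]);
        for (ContainerFailure f : handler.failures()) {
            System.out.println(f.containerId + ": " + f.diagnostics);
        }
    }
}
```

In this sketch both the throwing service and the unregistered service produce a recorded failure for their own container only, and the handler returns normally in each case.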