[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784717#comment-13784717
 ] 

Xuan Gong commented on YARN-867:
--------------------------------

bq.Probably have 1 try catch instead of multiple.

Fixed. The code now uses a single try-catch block.

bq.Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably 
hasn't failed.

Done

bq.TestAuxService needs an addition for the new code

Added a new test case in TestAuxService

bq.TestContainer - new test can be made simpler by not mocking 
AuxServiceHandler and instead sending the failed event directly, as is done 
for other tests there.

Fixed

bq.In AuxService.handle(APPLICATION_INIT) and other places like that, if the 
service does not exist we should fail too.

Done
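
For illustration only, and not the code in the attached patches: a minimal 
sketch of the two points above, a single try-catch around the whole handler 
and a missing service treated as a failure. Every name here 
(AuxDispatchSketch, Outcome, register, handleApplicationInit) is a 
hypothetical stand-in; only AUXSERVICE_ERROR is taken from this thread, and 
it is modeled as a plain enum constant rather than the real event type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; none of these are the real NodeManager classes.
class AuxDispatchSketch {
    interface AuxService {
        void initializeApplication(String appId, byte[] payload);
    }

    enum Outcome { OK, AUXSERVICE_ERROR }

    private final Map<String, AuxService> services = new HashMap<>();
    // Stands in for "send an error event to the container".
    final List<String> reportedErrors = new ArrayList<>();

    void register(String name, AuxService service) {
        services.put(name, service);
    }

    Outcome handleApplicationInit(String serviceName, String appId, byte[] payload) {
        // One try-catch covers both the lookup and the service call, so no
        // exception can escape into the thread that dispatches events.
        try {
            AuxService service = services.get(serviceName);
            if (service == null) {
                // A missing service is treated as a failure, not silently skipped.
                throw new IllegalStateException("No aux service named " + serviceName);
            }
            service.initializeApplication(appId, payload);
            return Outcome.OK;
        } catch (Exception e) {
            reportedErrors.add(appId + ": " + e.getMessage());
            return Outcome.AUXSERVICE_ERROR;
        }
    }

    public static void main(String[] args) {
        AuxDispatchSketch dispatcher = new AuxDispatchSketch();
        dispatcher.register("shuffle", (appId, payload) -> {
            throw new IllegalArgumentException("bad data");
        });
        System.out.println(dispatcher.handleApplicationInit("shuffle", "app_1", new byte[0]));
        System.out.println(dispatcher.handleApplicationInit("missing", "app_2", new byte[0]));
    }
}
```

In this sketch both calls return AUXSERVICE_ERROR and the caller keeps 
running; the error is recorded per application instead of propagating.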

bq.Probably we can ignore the error here since the container has already failed.

I think we still need this transition. The container can move from the NEW 
state to ContainerState.LOCALIZATION_FAILED, and the AuxService is triggered 
to handle APPLICATION_INIT at that point. If that throws any exception, we 
send a ContainerExitEvent with 
ContainerEventType.CONTAINER_EXITED_WITH_FAILURE to the container. It is 
quite possible that the container will process this event while it is already 
in the LOCALIZATION_FAILED state, so we should handle it.
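
To make the argument concrete, here is a small table-driven state machine 
sketch. It is not the NodeManager's actual state machine: the class 
ContainerTransitionSketch, its methods, and the event LOCALIZATION_ERROR are 
hypothetical, and only the names NEW, LOCALIZATION_FAILED and 
CONTAINER_EXITED_WITH_FAILURE come from the discussion above. The assumption 
being illustrated is that an event arriving in a state with no registered 
transition is rejected as invalid.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; the real container state machine has many more states.
class ContainerTransitionSketch {
    enum State { NEW, LOCALIZATION_FAILED }
    enum Event { LOCALIZATION_ERROR, CONTAINER_EXITED_WITH_FAILURE }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    ContainerTransitionSketch addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
        return this;
    }

    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            // No transition registered for this (state, event) pair.
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        ContainerTransitionSketch container = new ContainerTransitionSketch()
            .addTransition(State.NEW, Event.LOCALIZATION_ERROR, State.LOCALIZATION_FAILED)
            // The transition argued for above: stay in LOCALIZATION_FAILED
            // when the late failure event from the aux service arrives.
            .addTransition(State.LOCALIZATION_FAILED, Event.CONTAINER_EXITED_WITH_FAILURE,
                           State.LOCALIZATION_FAILED);
        container.handle(Event.LOCALIZATION_ERROR);
        System.out.println(container.handle(Event.CONTAINER_EXITED_WITH_FAILURE));
    }
}
```

Without the second addTransition call, the last handle call in this sketch 
throws IllegalStateException, which is the situation the extra transition is 
meant to avoid.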

> Isolation of failures in aux services 
> --------------------------------------
>
>                 Key: YARN-867
>                 URL: https://issues.apache.org/jira/browse/YARN-867
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Xuan Gong
>            Priority: Critical
>         Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
> YARN-867.4.patch, YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)