[ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784745#comment-13784745
 ] 

Bikas Saha commented on YARN-867:
---------------------------------

Why is this check needed?
{code}
+  private void handleAuxServiceFail(AuxServicesEvent event, Throwable th) {
+    if (event.getType() instanceof AuxServicesEventType) {
+      Container container = event.getContainer();
{code}

If the container has already failed, why do we need to change its state again?
{code}
+    .addTransition(ContainerState.LOCALIZATION_FAILED, ContainerState.EXITED_WITH_FAILURE,
+        ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+        new ExitedWithFailureTransition(false))
{code}
{code}
+    .addTransition(ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL,
+        ContainerState.EXITED_WITH_FAILURE,
+        ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+        new ExitedWithFailureTransition(false))
{code}

Why is CONTAINER_EXITED_WITH_FAILURE not being handled while the container 
state is localized/running?

Why are extra events being ignored in addition to 
ContainerEventType.CONTAINER_EXITED_WITH_FAILURE?
{code}
+        ContainerState.EXITED_WITH_FAILURE,
+        EnumSet.of(
+            ContainerEventType.KILL_CONTAINER,
+            ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+            ContainerEventType.RESOURCE_LOCALIZED,
+            ContainerEventType.RESOURCE_FAILED,
+            ContainerEventType.CONTAINER_LAUNCHED,
+            ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
+            ContainerEventType.CONTAINER_KILLED_ON_REQUEST))
{code}
{code}
+    .addTransition(ContainerState.DONE, ContainerState.DONE,
+        EnumSet.of(
+            ContainerEventType.RESOURCE_LOCALIZED,
+            ContainerEventType.CONTAINER_LAUNCHED,
+            ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+            ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP,
+            ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
+            ContainerEventType.CONTAINER_KILLED_ON_REQUEST))
{code}

Can you please check whether ExitedWithFailureTransition(true) needs to be 
called in the places where the patch is adding 
ExitedWithFailureTransition(false)? Is cleanup required?

Do the new tests fail without the changes?

> Isolation of failures in aux services 
> --------------------------------------
>
>                 Key: YARN-867
>                 URL: https://issues.apache.org/jira/browse/YARN-867
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Xuan Gong
>            Priority: Critical
>         Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
> YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 
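
To make the failure mode in the description concrete, here is a minimal sketch of isolating handler failures inside a dispatch loop. It is not the YARN-867 patch and does not use NodeManager classes: IsolatedDispatchSketch, AuxHandler and dispatch are hypothetical names invented for illustration. The idea is only that an exception thrown by one service's handler is caught and recorded rather than allowed to escape and stop the dispatcher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the YARN code base.
public class IsolatedDispatchSketch {

    /** Stand-in for an auxiliary service's event handler (assumed shape). */
    public interface AuxHandler {
        void handle(String event) throws Exception;
    }

    /**
     * Delivers the event to every handler. A failure in one handler is
     * collected and returned instead of being propagated to the caller,
     * so the remaining handlers still run.
     */
    public static List<Throwable> dispatch(String event, List<AuxHandler> handlers) {
        List<Throwable> failures = new ArrayList<>();
        for (AuxHandler handler : handlers) {
            try {
                handler.handle(event);
            } catch (Throwable th) {
                // Catching Throwable also traps Errors; a real implementation
                // may want to rethrow fatal ones such as OutOfMemoryError.
                failures.add(th);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        AuxHandler bad = e -> { throw new IllegalStateException("bad data"); };
        AuxHandler good = e -> seen.add(e);

        List<Throwable> failures = dispatch("INIT_APP", Arrays.asList(bad, good));
        System.out.println("failures=" + failures.size() + " delivered=" + seen);
    }
}
```

In the real NM the caller would presumably turn each recorded failure into a container-level event (the patch's handleAuxServiceFail appears to play that role) so that only the offending container is failed.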



--
This message was sent by Atlassian JIRA
(v6.1#6144)
