[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bilwa S T updated YARN-10341:
-----------------------------
    Description: 
If 10 workers are running and some containers get killed, after a while only
9 workers are running. This happens because the CONTAINER_COMPLETED event is
not processed on the AM side.
The issue is in the code below:
{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn(
          "Container {} Completed. No component instance exists. "
              + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      // Bug: this exits the method, so the remaining statuses are skipped.
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}
If no component instance exists for a container, the method returns
immediately and does not iterate over the remaining containers. This happens
when restart_policy is "ON_FAILURE".
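
The behavior described above can be avoided by skipping only the unknown
container instead of leaving the method. The following is a minimal sketch of
that idea, assuming that replacing the early return with continue is the
intended behavior; the attached patches are not reproduced here, so this is not
necessarily the committed fix. The class CompletedContainersSketch, its
string-based container IDs, and the dispatched list are hypothetical stand-ins
for the AM's liveInstances map and dispatcher, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical stand-in for the AM completion callback.
// None of these types are the real Hadoop classes.
class CompletedContainersSketch {

  // Hypothetical registry: container ID -> component instance name.
  private final Map<String, String> liveInstances = new HashMap<>();

  // Records the events that would be handed to the dispatcher.
  private final List<String> dispatched = new ArrayList<>();

  void addLiveInstance(String containerId, String instanceName) {
    liveInstances.put(containerId, instanceName);
  }

  List<String> getDispatched() {
    return dispatched;
  }

  void onContainersCompleted(List<String> completedContainerIds) {
    for (String containerId : completedContainerIds) {
      String instance = liveInstances.get(containerId);
      if (instance == null) {
        System.err.println("Container " + containerId
            + " completed. No component instance exists.");
        // Skip only this container; keep processing the rest of the list.
        continue;
      }
      dispatched.add("CONTAINER_COMPLETED:" + instance);
    }
  }

  public static void main(String[] args) {
    CompletedContainersSketch am = new CompletedContainersSketch();
    am.addLiveInstance("container_2", "worker-1");
    // container_1 has no live instance; container_2 must still be handled.
    am.onContainersCompleted(Arrays.asList("container_1", "container_2"));
    System.out.println(am.getDispatched()); // prints [CONTAINER_COMPLETED:worker-1]
  }
}
```

With an early return in place of continue, the same input would dispatch
nothing, because the unknown container_1 comes first in the list.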

  was:
If 10 workers are running and some containers get killed, after a while only
9 workers are running. This happens because the CONTAINER_COMPLETED event is
not processed on the AM side.
The issue is in the code below:

{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(status.getContainerId());
    if (instance == null) {
      LOG.warn(
          "Container {} Completed. No component instance exists. "
              + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      return;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}

If no component instance exists for a container, the method returns
immediately and does not iterate over the remaining containers.



> Yarn Service Container Completed event doesn't get processed 
> -------------------------------------------------------------
>
>                 Key: YARN-10341
>                 URL: https://issues.apache.org/jira/browse/YARN-10341
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>            Priority: Critical
>             Fix For: 3.4.0, 3.3.1
>
>         Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If 10 workers are running and some containers get killed, after a while
> only 9 workers are running. This happens because the CONTAINER_COMPLETED
> event is not processed on the AM side.
> The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
>     ContainerId containerId = status.getContainerId();
>     ComponentInstance instance = liveInstances.get(status.getContainerId());
>     if (instance == null) {
>       LOG.warn(
>           "Container {} Completed. No component instance exists. "
>               + "exitStatus={}. diagnostics={} ",
>           containerId, status.getExitStatus(), status.getDiagnostics());
>       return;
>     }
>     ComponentEvent event =
>         new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
>             .setStatus(status).setInstance(instance)
>             .setContainerId(containerId);
>     dispatcher.getEventHandler().handle(event);
>   }
> }
> {code}
> If no component instance exists for a container, the method returns
> immediately and does not iterate over the remaining containers. This happens
> when restart_policy is "ON_FAILURE".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

