[ 
https://issues.apache.org/jira/browse/YARN-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8362:
--------------------------------
    Attachment: YARN-8362.001.patch

> Number of remaining retries are updated twice after a container failure in NM 
> ------------------------------------------------------------------------------
>
>                 Key: YARN-8362
>                 URL: https://issues.apache.org/jira/browse/YARN-8362
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8362.001.patch
>
>
> With YARN-5015, {{shouldRetry(int errorCode)}} in {{ContainerImpl}} also 
> updates fields in the retry context: remaining retries and restart times.
> This method is also called directly from outside the {{ContainerImpl}} class, 
> in {{ContainerLaunch.setContainerCompletedStatus}}. This causes the following 
> problems:
>  # remainingRetries is updated more than once after a failure. If 
> {{maxRetries = 1}}, a retry is not triggered because of the multiple 
> calls to {{shouldRetry(int errorCode)}}.
>  # Writes to {{retryContext}} should be protected and performed only while 
> the write lock is held.
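
The two problems above can be illustrated with a minimal, self-contained sketch. It is not the YARN code and not the attached patch: the class {{RetrySketch}}, its nested {{RetryContext}}, and the methods {{shouldRetryWithSideEffect}}, {{shouldRetry}}, {{consumeRetry}} and {{remaining}} are hypothetical names chosen for illustration. The assumption is that a check which also decrements the counter loses one retry per extra caller, and that one possible remedy is to keep the check read-only and do the write separately under a write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names come from YARN.
class RetrySketch {

    /** Simplified stand-in for a retry context holding a retry budget. */
    static final class RetryContext {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private int remainingRetries;

        RetryContext(int maxRetries) {
            this.remainingRetries = maxRetries;
        }

        /** Problematic shape: the check also mutates, so every call burns a retry. */
        boolean shouldRetryWithSideEffect() {
            boolean retry = remainingRetries > 0;
            if (retry) {
                remainingRetries--;
            }
            return retry;
        }

        /** Read-only check: safe to call from several places. */
        boolean shouldRetry() {
            lock.readLock().lock();
            try {
                return remainingRetries > 0;
            } finally {
                lock.readLock().unlock();
            }
        }

        /** The single place that writes, guarded by the write lock. */
        void consumeRetry() {
            lock.writeLock().lock();
            try {
                if (remainingRetries > 0) {
                    remainingRetries--;
                }
            } finally {
                lock.writeLock().unlock();
            }
        }

        int remaining() {
            lock.readLock().lock();
            try {
                return remainingRetries;
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        // maxRetries = 1, two callers after one failure.
        RetryContext buggy = new RetryContext(1);
        boolean externalCall = buggy.shouldRetryWithSideEffect();
        boolean transitionCall = buggy.shouldRetryWithSideEffect();
        System.out.println("side effect in check: " + externalCall + ", " + transitionCall);

        RetryContext fixed = new RetryContext(1);
        boolean externalCheck = fixed.shouldRetry();
        boolean transitionCheck = fixed.shouldRetry();
        if (transitionCheck) {
            fixed.consumeRetry(); // budget is spent exactly once per failure
        }
        System.out.println("read-only check: " + externalCheck + ", " + transitionCheck
                + ", remaining=" + fixed.remaining());
    }
}
```

In this sketch the second caller of the side-effecting check sees an exhausted budget even though no retry has happened yet, which mirrors the {{maxRetries = 1}} case described above. Whether the actual fix separates the check from the update in this way is not stated in this message.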



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

