[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

Wangda Tan (JIRA) Wed, 07 Mar 2018 15:24:48 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390428#comment-16390428
 ]


Wangda Tan commented on YARN-5015:
----------------------------------

Thanks [~csingh], see my comments below:

1) Instead of adding getRestartTimes/getRemainingRetries to 
{{ContainerRetryContext}}, I suggest to have a separate class like 
NMContainerRetryContext which includes:
- ContainerRetryContext
- getRestartTimes/getRemainingRetries

Since we should not add runtime information to protocol/api classes.

2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container: Why it is 
in server-common? 

3) {{shouldRetry}}:
- It's better to return true at the begining of the method when 
{{getMaxRetries() == ContainerRetryContext.RETRY_FOREVER}}, which can avoid 
lots of checks in the following functions like calculatePendingRetries.

4) {{calculatePendingRetries}}
{code}
      return retryContext.getRemainingRetries() == -1 ?
          retryContext.getMaxRetries() :
          retryContext.getRemainingRetries();
{code} 
Why check {{retryContext.getRemainingRetries() == -1}}? Should this be 
getMaxRetries() == -1? 

5) {{updateRetryContext}}:
{code}
    retryContext.setRemainingRetries(pendingRetries -1);
{code} 

6) In ContainerImpl: 
{code}
          int n = container.containerRetryContext.getMaxRetries()
              - container.containerRetryContext.getRemainingRetries();
          container.addDiagnostics("Diagnostic message from attempt "
              + n + " : ", "\n");
{code} 
Under the context of SlidingWindowRetry, this n may keep changing. To avoid 
introducing more logics, I suggest to remove {{n}} from the diagnostics. 


> Support sliding window retry capability for container restart 
> --------------------------------------------------------------
>
>                 Key: YARN-5015
>                 URL: https://issues.apache.org/jira/browse/YARN-5015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch
>
>
> We support sliding window retry policy for AM restarts (Introduced in 
> YARN-611). Similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class for 
> SlidingWindowRetryPolicy ( suggested by [~vvasudev] in the comments) and 
> integrate it to container restart. 
> In a subsequent jira, we can modify the AM code to use 
> SlidingWindowRetryPolicy which will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-5015) Support sliding window retry capability for container restart

Reply via email to