[ https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390428#comment-16390428 ]
Wangda Tan commented on YARN-5015: ---------------------------------- Thanks [~csingh], see my comments below: 1) Instead of adding getRestartTimes/getRemainingRetries to {{ContainerRetryContext}}, I suggest to have a separate class like NMContainerRetryContext which includes: - ContainerRetryContext - getRestartTimes/getRemainingRetries Since we should not add runtime information to protocol/api classes. 2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to org.apache.hadoop.yarn.server.nodemanager.containermanager.container: Why it is in server-common? 3) {{shouldRetry}}: - It's better to return true at the begining of the method when {{getMaxRetries() == ContainerRetryContext.RETRY_FOREVER}}, which can avoid lots of checks in the following functions like calculatePendingRetries. 4) {{calculatePendingRetries}} {code} return retryContext.getRemainingRetries() == -1 ? retryContext.getMaxRetries() : retryContext.getRemainingRetries(); {code} Why check {{retryContext.getRemainingRetries() == -1}}? Should this be getMaxRetries() == -1? 5) {{updateRetryContext}}: {code} retryContext.setRemainingRetries(pendingRetries -1); {code} 6) In ContainerImpl: {code} int n = container.containerRetryContext.getMaxRetries() - container.containerRetryContext.getRemainingRetries(); container.addDiagnostics("Diagnostic message from attempt " + n + " : ", "\n"); {code} Under the context of SlidingWindowRetry, this n may keep changing. To avoid introducing more logics, I suggest to remove {{n}} from the diagnostics. > Support sliding window retry capability for container restart > -------------------------------------------------------------- > > Key: YARN-5015 > URL: https://issues.apache.org/jira/browse/YARN-5015 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager > Reporter: Varun Vasudev > Assignee: Chandni Singh > Priority: Major > Labels: oct16-medium > Attachments: YARN-5015.01.patch, YARN-5015.02.patch, > YARN-5015.03.patch > > > We support sliding window retry policy for AM restarts (Introduced in > YARN-611). Similar sliding window retry policy is needed for container > restarts. > With this change, we can introduce a common class for > SlidingWindowRetryPolicy ( suggested by [~vvasudev] in the comments) and > integrate it to container restart. > In a subsequent jira, we can modify the AM code to use > SlidingWindowRetryPolicy which will unify the AM and container restart code. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org