[ https://issues.apache.org/jira/browse/YARN-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sunil G updated YARN-4113: -------------------------- Attachment: 0001-YARN-4113.patch As HADOOP-12386 is committed, changing {{RetryProxy}} and {{ServerProy}} to use {{retryForeverWithFixedSleep}} policy instead of RETRY_FOREVER. > RM should respect retry-interval when uses RetryPolicies.RETRY_FOREVER > ---------------------------------------------------------------------- > > Key: YARN-4113 > URL: https://issues.apache.org/jira/browse/YARN-4113 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Wangda Tan > Assignee: Sunil G > Priority: Critical > Attachments: 0001-YARN-4113.patch > > > Found one issue in RMProxy how to initialize RetryPolicy: In > RMProxy#createRetryPolicy. When rmConnectWaitMS is set to -1 (wait forever), > it uses RetryPolicies.RETRY_FOREVER which doesn't respect > {{yarn.resourcemanager.connect.retry-interval.ms}} setting. > RetryPolicies.RETRY_FOREVER uses 0 as the interval, when I run the test > without properly setup localhost name: > {{TestYarnClient#testShouldNotRetryForeverForNonNetworkExceptions}}, it wrote > 14G DEBUG exception message to system before it dies. This will be very bad > if we do the same thing in a production cluster. > We should fix two places: > - Make RETRY_FOREVER can take retry-interval as constructor parameter. > - Respect retry-interval when we uses RETRY_FOREVER policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)