[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944384#comment-14944384
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
-----------------------------------------------

[~adhoot], thanks for the clarification.
So the initial retries can use backoff times of 1, 2, 4 and 8 seconds, each 
still less than 10, which gives the client a chance to retry after a 
short-lived NM restart (under 10 seconds).
After that, we can keep waiting with a 10-second backoff on each retry to 
accommodate a longer failure.

Thus, the retry intervals would be 1, 2, 4, 8, 10, 10 and so on until the 
number of retries is exhausted.
My only concern is that if the failure lasts longer than the total wait time 
across all retries, there won't be another chance to retry.
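
To make the schedule and the concern about total wait time concrete, here is a 
rough sketch of the capped exponential backoff described above. The class 
BackoffSketch, its method names and the retry count of 6 are hypothetical and 
chosen only for illustration; this is not the existing NM client retry code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual NM client retry policy.
class BackoffSketch {
    // Assumed values from the discussion: 1 s initial delay, 10 s cap.
    static final long INITIAL_DELAY_SECONDS = 1;
    static final long MAX_DELAY_SECONDS = 10;

    // Delay before retry number `attempt` (0-based): doubles each time,
    // then stays at the cap.
    static long delaySeconds(int attempt) {
        // Clamp the shift so a large attempt count cannot overflow.
        int shift = Math.min(attempt, 30);
        return Math.min(INITIAL_DELAY_SECONDS << shift, MAX_DELAY_SECONDS);
    }

    // Full list of delays for a given number of retries.
    static List<Long> schedule(int maxRetries) {
        List<Long> delays = new ArrayList<>();
        for (int i = 0; i < maxRetries; i++) {
            delays.add(delaySeconds(i));
        }
        return delays;
    }

    // Longest outage the client can ride out before giving up.
    static long totalWaitSeconds(int maxRetries) {
        long total = 0;
        for (long d : schedule(maxRetries)) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(schedule(6));         // [1, 2, 4, 8, 10, 10]
        System.out.println(totalWaitSeconds(6)); // 35
    }
}
```

With 6 retries the client covers at most 35 seconds of downtime, so an outage 
longer than that would still exhaust the retries.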

I'll write up a patch to exhibit this.
Thank you.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> -------------------------------------------------------------------------------
>
>                 Key: YARN-4185
>                 URL: https://issues.apache.org/jira/browse/YARN-4185
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Anubhav Dhoot
>            Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec, which can be 
> unnecessarily high, especially when NMs can complete a rolling restart 
> within a sec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
