I don't think option 2, where you restart from 1, makes sense. It's also not a
goal to minimize the total wait time. The goal should be to minimize the time to
recover from short intermittent failures while also waiting long enough for
long failures before giving up.
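
A minimal sketch of a schedule that fits that goal: start small, double on
each retry, and stop growing at a fixed maximum. This is illustrative Python,
not the NM client code. The function name and parameters are hypothetical; the
1 sec starting delay comes from the quoted proposal and the 10 sec cap from
today's fixed interval.

```python
def capped_backoff_delays(retries, base=1, cap=10):
    """Wait in seconds before each retry: doubles from `base`, never exceeds `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


delays = capped_backoff_delays(10)
print(delays)       # [1, 2, 4, 8, 10, 10, 10, 10, 10, 10]
print(sum(delays))  # 75
```

An NM that comes back within a second is retried after 1 sec instead of 10,
while a long outage still gets most of the waiting budget that the fixed
10 sec interval provides.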
On Oct 3, 2015 6:43 PM, "Neelesh Srinivas Salian (JIRA)" <j...@apache.org>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942528#comment-14942528
> ]
>
> Neelesh Srinivas Salian commented on YARN-4185:
> -----------------------------------------------
>
> Thoughts:
> 1) Using the exponentialBackoffRetry policy will have a progression of
> wait time starting at 1sec per retry assuming it takes a second for the NM
> to come up.
> Hence exponentially, the backoff time increases 2,4,8,16...till 512 as we
> approach 10 retries.
>
> 2) In the current strategy, the wait time is 10 seconds which causes an NM
> that restarted in 1 second to wait for a retry.
>
> 3) By the 3rd retry, the cumulative wait is 7 seconds (1+2+4) under the
> exponential strategy, versus 30 seconds (10+10+10) under the current static
> retry.
>
> 4) If you keep retrying, by the 6th retry attempt the static retry has
> waited 60 seconds in total versus 2^6 - 1 = 63 seconds for the exponential
> strategy.
>
> Logic for the Design:
> 1) In the event of retries being default to 10,
>    a. I propose after the 3rd attempt, we continue to keep the wait time
> as 4 seconds and continue the same.
>    Thus the total time comes up to 1,2,4,4,4,4,4,4,4,4 = 35 seconds.
>    b. Versus collectively spending 100 seconds on waiting time in the
> static retry strategy.
>
> 2) Alternatively, the logic could be:
>    a. Have the 1st 3 attempts of retry. If further needed, fall back to
> the 1sec start of the same logic.
>       So, it looks like this.. (1,2,4)  (1,2,4)  (1,2,4) (1) for 10
> retries.
>    b. Thus we get the 10 retries done in collectively 22 seconds versus
> 100 seconds.
>
> Requesting feedback.
> Thank you.
>
> > Retry interval delay for NM client can be improved from the fixed static
> retry
> >
> -------------------------------------------------------------------------------
> >
> >                 Key: YARN-4185
> >                 URL: https://issues.apache.org/jira/browse/YARN-4185
> >             Project: Hadoop YARN
> >          Issue Type: Bug
> >            Reporter: Anubhav Dhoot
> >            Assignee: Neelesh Srinivas Salian
> >
> > Instead of having a fixed retry interval that starts off very high and
> stays there, we are better off using an exponential backoff that has the
> same fixed max limit. Today the retry interval is fixed at 10 sec, which can
> be unnecessarily high, especially when NMs can complete a rolling restart
> within a sec.
>
>
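
The cumulative waits discussed in the quoted comment can be checked with a
short sketch. This is illustrative Python only; the variable names are
hypothetical, and the schedules assume 10 retries, a 1 sec starting delay and
no randomization.

```python
retries = 10

# Current behaviour: fixed 10 sec before every retry.
static = [10] * retries

# Uncapped exponential backoff: 1, 2, 4, ... 512.
exponential = [2 ** i for i in range(retries)]

# Option 1 from the quoted comment: double up to 4 sec, then stay at 4 sec.
plateau = [min(4, 2 ** i) for i in range(retries)]

# Option 2 from the quoted comment: restart from 1 sec after every 3rd retry.
restart = [2 ** (i % 3) for i in range(retries)]

for name, delays in [("static", static), ("exponential", exponential),
                     ("plateau", plateau), ("restart", restart)]:
    print(name, delays, "total:", sum(delays))
```

The totals come out to 100, 1023, 35 and 22 seconds respectively. After six
retries the static schedule has waited 60 seconds and the uncapped exponential
one 63 seconds.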
