[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4180: - Fix Version/s: 2.8.0 > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Fix For: 2.8.0, 2.7.2, 2.6.4, 3.0.0-alpha1 > > Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, > YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4180: --- Fix Version/s: 2.6.4 > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Fix For: 2.7.2, 2.6.4 > > Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, > YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4180: - Target Version/s: 2.7.2, 2.6.4 (was: 2.7.2) > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Fix For: 2.7.2 > > Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, > YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4180: Attachment: YARN-4180-branch-2.7.2.txt Minor conflicts in backporting changes to branch 2.7 > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4180-branch-2.7.2.txt, YARN-4180.001.patch, > YARN-4180.002.patch, YARN-4180.002.patch, YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4180: --- Attachment: YARN-4180.002.patch Re-uploading the same patch to see if Jenkins kicks in. By the way, I ran the test locally and it passes. +1, even if Jenkins doesn't kick in. > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4180.001.patch, YARN-4180.002.patch, > YARN-4180.002.patch, YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4180: Attachment: YARN-4180.002.patch Try triggering jenkins again > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4180.001.patch, YARN-4180.002.patch, > YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4180: Attachment: YARN-4180.002.patch Addressed feedback > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4180.001.patch, YARN-4180.002.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4180: Attachment: YARN-4180.001.patch reuse the same retry proxy used by AM client for RM client. Also opened YARN-4185 to improve this retry mechanism > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > Attachments: YARN-4180.001.patch > > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4180) AMLauncher does not retry on failures when talking to NM
[ https://issues.apache.org/jira/browse/YARN-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4180: --- Affects Version/s: 2.7.1 Target Version/s: 2.7.2 Priority: Critical (was: Major) > AMLauncher does not retry on failures when talking to NM > - > > Key: YARN-4180 > URL: https://issues.apache.org/jira/browse/YARN-4180 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Critical > > We see issues with RM trying to launch a container while a NM is restarting > and we get exceptions like NMNotReadyException. While YARN-3842 added retry > for other clients of NM (AMs mainly) its not used by AMLauncher in RM causing > there intermittent errors to cause job failures. This can manifest during > rolling restart of NMs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)