[jira] [Created] (YARN-10825) Yarn Service containers not getting killed after NM shutdown
Sushanta Sen created YARN-10825:
-----------------------------------

             Summary: Yarn Service containers not getting killed after NM shutdown
                 Key: YARN-10825
                 URL: https://issues.apache.org/jira/browse/YARN-10825
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.1.1
            Reporter: Sushanta Sen

When yarn.nodemanager.recovery.supervised is enabled and the NM is shut down, new containers are launched after the RM sends the node-lost event to the AM, but the existing containers on the lost node are never killed. The issue occurs only for YARN Service; normal jobs behave correctly.
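For reference, supervised NM recovery is governed by a small set of yarn-site.xml properties; below is a minimal sketch of the configuration under which this report applies (the recovery directory path is an illustrative assumption, not taken from the report):

{noformat}
<!-- yarn-site.xml: enable NM work-preserving recovery in supervised mode,
     so running containers are expected to survive an NM shutdown/restart -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.supervised</name>
  <value>true</value>
</property>
<property>
  <!-- local directory for the NM recovery state store; illustrative path -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/hadoop/yarn/nm-recovery</value>
</property>
{noformat}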
[jira] [Updated] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
[ https://issues.apache.org/jira/browse/YARN-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10684:
--------------------------------
    Description:

Preconditions:
# A secure Hadoop 3.1.1 three-node cluster is installed.
# Set the below parameter in the RM yarn-site.xml:
  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
# Set this in the NM(s) yarn-site.xml:
  yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:
Job Command:
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar -shell_command sleep -shell_args 20 -num_containers 20 -container_type OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: The Distributed Shell YARN job failed almost every time with the below diagnostics message:
[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to make room for Guaranteed Container.]

Expected Result: The DS job should succeed with the argument "promote_opportunistic_after_start".

(was: the same description with -num_containers 10 and the internal jar name hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar)

> YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried
> adding flag -promote_opportunistic_after_start
> ---------------------------------------------------------------------------
>
>                 Key: YARN-10684
>                 URL: https://issues.apache.org/jira/browse/YARN-10684
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-scheduling
>    Affects Versions: 3.1.1
>            Reporter: Sushanta Sen
>            Priority: Major
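The two preconditions above correspond to the following yarn-site.xml entries; a minimal sketch using the values given in the report:

{noformat}
<!-- RM yarn-site.xml: allow the RM to allocate OPPORTUNISTIC containers -->
<property>
  <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
  <value>true</value>
</property>

<!-- NM yarn-site.xml: queue at most 30 opportunistic containers per node -->
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>30</value>
</property>
{noformat}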
[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for
[ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10670:
--------------------------------
    Description:

Preconditions:
# A secure Hadoop 3.1.1 three-node cluster is installed.
# Set the below parameter in the RM yarn-site.xml:
  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
# Set this in the NM(s) yarn-site.xml:
  yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:
Job Command:
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1*.jar -shell_command sleep -shell_args 20 -num_containers 20 -container_type OPPORTUNISTIC

Actual Result: The Distributed Shell YARN job failed with the below diagnostics message:
{noformat}
Attempt recovered after RM restart
Application Failure: desired = 20, completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}

Expected Result: The Distributed Shell YARN job should not fail.

(was: the same description with the jar name hadoop-yarn-applications-distributedshell-3.1.1-*.jar)

> YARN: Opportunistic Container : : In distributed shell job if containers are
> killed then application is failed. But in this case as containers are killed
> to make room for guaranteed containers which is not correct to fail an
> application
>
>                 Key: YARN-10670
>                 URL: https://issues.apache.org/jira/browse/YARN-10670
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-shell
>    Affects Versions: 3.1.1
>            Reporter: Sushanta Sen
>            Assignee: Bilwa S T
>            Priority: Major
[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for
[ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10670:
--------------------------------
    Description: updated to redact the internal jar name (hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar became hadoop-yarn-applications-distributedshell-3.1.1-*.jar); the description is otherwise identical to the one in the update above.
[jira] [Updated] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
[ https://issues.apache.org/jira/browse/YARN-10684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10684:
--------------------------------
    Description: updated to state that the job fails "almost all times" rather than simply "Failed"; the description is otherwise identical to the one in the creation email below.
[jira] [Created] (YARN-10684) YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
Sushanta Sen created YARN-10684:
-----------------------------------

             Summary: YARN: Opportunistic Container :: Distributed YARN Job has Failed when tried adding flag -promote_opportunistic_after_start
                 Key: YARN-10684
                 URL: https://issues.apache.org/jira/browse/YARN-10684
             Project: Hadoop YARN
          Issue Type: Bug
          Components: distributed-scheduling
    Affects Versions: 3.1.1
            Reporter: Sushanta Sen

Preconditions:
# A secure Hadoop 3.1.1 three-node cluster is installed.
# Set the below parameter in the RM yarn-site.xml:
  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
# Set this in the NM(s) yarn-site.xml:
  yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:
Job Command:
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar -shell_command sleep -shell_args 20 -num_containers 10 -container_type OPPORTUNISTIC -promote_opportunistic_after_start

Actual Result: The Distributed Shell YARN job failed with the below diagnostics message:
[ Failed Reason : Application Failure: desired = 10, completed = 10, allocated = 10, failed = 2, diagnostics = [2021-02-10 00:00:27.640]Container Killed to make room for Guaranteed Container.]

Expected Result: The DS job should succeed with the argument "promote_opportunistic_after_start".
[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for
[ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10670:
--------------------------------
    Description: updated to clarify that the two precondition parameters belong in the RM and NM yarn-site.xml files; the description is otherwise identical to the one in the update above.
[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for
[ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10670:
--------------------------------
    Description: updated to fix a typo in the first precondition; the description is otherwise unchanged.
[jira] [Updated] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for
[ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10670:
--------------------------------
    Description: updated to add the missing expected result ("Expected Result: Distributed Shell Yarn Job should not fail."); the description is otherwise identical to the one in the creation email below.
[jira] [Created] (YARN-10670) YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for
Sushanta Sen created YARN-10670:
-----------------------------------

             Summary: YARN: Opportunistic Container : : In distributed shell job if containers are killed then application is failed. But in this case as containers are killed to make room for guaranteed containers which is not correct to fail an application
                 Key: YARN-10670
                 URL: https://issues.apache.org/jira/browse/YARN-10670
             Project: Hadoop YARN
          Issue Type: Bug
          Components: distributed-shell
    Affects Versions: 3.1.1
            Reporter: Sushanta Sen

Preconditions:
# A secure Hadoop 3.1.1 three-node cluster is installed.
# Set the below parameter in the RM yarn-site.xml:
  yarn.resourcemanager.opportunistic-container-allocation.enabled = true
# Set this in the NM(s) yarn-site.xml:
  yarn.nodemanager.opportunistic-containers-max-queue-length = 30

Test Steps:
Job Command:
yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar -shell_command sleep -shell_args 20 -num_containers 20 -container_type OPPORTUNISTIC

Actual Result: The Distributed Shell YARN job failed with the below diagnostics message:
{noformat}
Attempt recovered after RM restart
Application Failure: desired = 20, completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
[jira] [Updated] (YARN-10669) Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN on RM switch and TS restart
[ https://issues.apache.org/jira/browse/YARN-10669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10669:
--------------------------------
    Affects Version/s: 3.1.1

> Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN on RM switch and TS
> restart
> --------------------------------------------------------------------------
>
>                 Key: YARN-10669
>                 URL: https://issues.apache.org/jira/browse/YARN-10669
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineservice
>    Affects Versions: 3.1.1
>         Environment: 3 Nodes Hadoop Secure cluster with 3.1.1 version
>            Reporter: Sushanta Sen
>            Priority: Major

(The quoted description and RM stack trace are identical to those in the creation email below.)
[jira] [Created] (YARN-10669) Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN on RM switch and TS restart
Sushanta Sen created YARN-10669:
-----------------------------------

             Summary: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN on RM switch and TS restart
                 Key: YARN-10669
                 URL: https://issues.apache.org/jira/browse/YARN-10669
             Project: Hadoop YARN
          Issue Type: Bug
          Components: timelineservice
         Environment: 3 Nodes Hadoop Secure cluster with 3.1.1 version
            Reporter: Sushanta Sen

The job is submitted to YARN using a delegation token rather than the user's keytab, and yarn.timeline-service.enabled = true, so addTimelineDelegationToken is executed. The job submits successfully, but it fails after an RM switchover combined with a TimelineServer (TS) restart, because renewal of the TIMELINE_DELEGATION_TOKEN fails. Only the combination of an RM switch and a TS restart reproduces the issue.

RM log snippet below:
{noformat}
2020-12-02 17:37:21,268 | WARN | DelegationTokenRenewer #3402 | Unable to add the application to the delegation token renewer. | DelegationTokenRenewer.java:949
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 192.168.0.2:8190, Ident: (TIMELINE_DELEGATION_TOKEN owner=bnn, renewer=mapred, realUser=executor, issueDate=1606880472758, maxDate=1607485272758, sequenceNumber=11581, masterKeyId=13)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:508)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$1100(DelegationTokenRenewer.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:945)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:922)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: HTTP status [403], message [org.apache.hadoop.security.token.SecretManager$InvalidToken: Unable to find master key for keyId=13 from cache. Failed to renew an unexpired token (TIMELINE_DELEGATION_TOKEN owner=bnn, renewer=mapred, realUser=executor, issueDate=1606880472758, maxDate=1607485272758, sequenceNumber=11581, masterKeyId=13) with sequenceNumber=11581]
	at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:174)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:323)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:239)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:426)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:247)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:227)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientRetryOpForOperateDelegationToken.run(TimelineConnector.java:431)
	at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:334)
	at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.operateDelegationToken(TimelineConnector.java:218)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:250)
	at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
	at org.apache.hadoop.security.token.Token.renew(Token.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:634)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:631)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:630)
	at
{noformat}
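The scenario above combines an HA RM with the timeline service; a minimal sketch of the two switches involved, assuming standard yarn-site.xml settings (the report does not list its exact configuration):

{noformat}
<!-- yarn-site.xml: with the timeline service enabled, job submission
     runs addTimelineDelegationToken and acquires a TIMELINE_DELEGATION_TOKEN -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<!-- RM HA must be on for an RM switchover to occur -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
{noformat}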
[jira] [Updated] (YARN-10666) In ProcfsBasedProcessTree reading smaps file show Permission denied
[ https://issues.apache.org/jira/browse/YARN-10666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10666:
--------------------------------
    Description:

When the job-submitter user is different from the NM's user, the NM fails to read the /proc/<pid>/smaps file, because the smaps file is owned by the job-submitter user and is not readable by the NM's user.

(was: the same description followed by two leftover {noformat} placeholder blocks)

> In ProcfsBasedProcessTree reading smaps file show Permission denied
> -------------------------------------------------------------------
>
>                 Key: YARN-10666
>                 URL: https://issues.apache.org/jira/browse/YARN-10666
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Sushanta Sen
>            Priority: Major
[jira] [Updated] (YARN-10666) In ProcfsBasedProcessTree reading smaps file show Permission denied
[ https://issues.apache.org/jira/browse/YARN-10666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10666:
--------------------------------
    Description: updated to remove a stray image reference (!image-2021-03-03-12-33-51-034.png!) left below the description; the description itself is unchanged, with two leftover {noformat} placeholder blocks still below it.
[jira] [Updated] (YARN-10666) In ProcfsBasedProcessTree reading smaps file show Permission denied
[ https://issues.apache.org/jira/browse/YARN-10666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10666:
--------------------------------
    Description: updated to add a screenshot reference (!image-2021-03-03-12-33-51-034.png!) below the description; the description itself is unchanged from the creation email below.
[jira] [Created] (YARN-10666) In ProcfsBasedProcessTree reading smaps file show Permission denied
Sushanta Sen created YARN-10666:
-----------------------------------

             Summary: In ProcfsBasedProcessTree reading smaps file show Permission denied
                 Key: YARN-10666
                 URL: https://issues.apache.org/jira/browse/YARN-10666
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Sushanta Sen

When the job-submitter user is different from the NM's user, the NM fails to read the /proc/<pid>/smaps file, because the smaps file is owned by the job-submitter user and is not readable by the NM's user.
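The NM reads smaps only when smaps-based RSS accounting is turned on; below is a minimal sketch of the setting that, to the best of my reading of the NM container monitor, sends ProcfsBasedProcessTree down this code path:

{noformat}
<!-- yarn-site.xml: compute container RSS from /proc/<pid>/smaps
     instead of /proc/<pid>/stat -->
<property>
  <name>yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled</name>
  <value>true</value>
</property>
{noformat}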
[jira] [Updated] (YARN-10634) The config parameter "mapreduce.job.num-opportunistic-maps-percent" is confusing when requesting Opportunistic containers in YARN job
[ https://issues.apache.org/jira/browse/YARN-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10634:
--------------------------------
    Description:

Execute the below job passing the config -Dmapreduce.job.num-opportunistic-maps-percent, which actually represents the number of containers to be launched as OPPORTUNISTIC, not a percentage of the total mappers requested. I think the configuration name should be modified accordingly; the same value also gets printed in the AM logs as a percentage, which needs to be corrected as well.

Job Command: hadoop jar HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1-hw-ei-310001-SNAPSHOT.jar pi -Dmapreduce.job.num-opportunistic-maps-percent="20" 20 99

In the AM logs this message is displayed; it should be 20, not 20%:
"2021-02-10 20:23:23,023 | INFO | main | 20% of the mappers will be scheduled using OPPORTUNISTIC containers | RMContainerAllocator.java:257"

Job Command: hadoop jar HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1-hw-ei-310001-SNAPSHOT.jar pi -Dmapreduce.job.num-opportunistic-maps-percent="100" 20 99

In the AM logs this message is displayed; it should be 100, not 100%:
"2021-02-10 20:28:16,016 | INFO | main | 100% of the mappers will be scheduled using OPPORTUNISTIC containers | RMContainerAllocator.java:257"

(was: the same description without the note that the AM log message also needs to be corrected)

> The config parameter "mapreduce.job.num-opportunistic-maps-percent" is
> confusing when requesting Opportunistic containers in YARN job
> ----------------------------------------------------------------------
>
>                 Key: YARN-10634
>                 URL: https://issues.apache.org/jira/browse/YARN-10634
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications
>            Reporter: Sushanta Sen
>            Assignee: Bilwa S T
>            Priority: Minor
[jira] [Updated] (YARN-10634) The config parameter "mapreduce.job.num-opportunistic-maps-percent" is confusing when requesting Opportunistic containers in YARN job
[ https://issues.apache.org/jira/browse/YARN-10634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10634:
--------------------------------
    Summary: The config parameter "mapreduce.job.num-opportunistic-maps-percent" is confusing when requesting Opportunistic containers in YARN job
    (was: The config parameter name is confusing when requesting Opportunistic containers in YARN job)

(The quoted description is identical to the one in the creation email below.)
[jira] [Created] (YARN-10634) The config parameter name is confusing when requesting Opportunistic containers in YARN job
Sushanta Sen created YARN-10634:
-----------------------------------

             Summary: The config parameter name is confusing when requesting Opportunistic containers in YARN job
                 Key: YARN-10634
                 URL: https://issues.apache.org/jira/browse/YARN-10634
             Project: Hadoop YARN
          Issue Type: Bug
          Components: applications
            Reporter: Sushanta Sen

Execute the below job passing the config -Dmapreduce.job.num-opportunistic-maps-percent, which actually represents the number of containers to be launched as OPPORTUNISTIC, not a percentage of the total mappers requested. I think the configuration name should be modified accordingly; the same value also gets printed in the AM logs.

Job Command: hadoop jar HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1-hw-ei-310001-SNAPSHOT.jar pi -Dmapreduce.job.num-opportunistic-maps-percent="20" 20 99

In the AM logs this message is displayed; it should be 20, not 20%:
"2021-02-10 20:23:23,023 | INFO | main | 20% of the mappers will be scheduled using OPPORTUNISTIC containers | RMContainerAllocator.java:257"

Job Command: hadoop jar HDFS/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1-hw-ei-310001-SNAPSHOT.jar pi -Dmapreduce.job.num-opportunistic-maps-percent="100" 20 99

In the AM logs this message is displayed; it should be 100, not 100%:
"2021-02-10 20:28:16,016 | INFO | main | 100% of the mappers will be scheduled using OPPORTUNISTIC containers | RMContainerAllocator.java:257"
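To make the mismatch concrete, here is a worked reading of the two runs above, assuming the reporter's interpretation that the value is an absolute container count rather than a percentage:

{noformat}
# pi job requests 20 mappers in both runs
-Dmapreduce.job.num-opportunistic-maps-percent=20
    -> per the report: 20 containers, i.e. all 20 mappers (100%),
       run as OPPORTUNISTIC, yet the AM logs "20% of the mappers"
-Dmapreduce.job.num-opportunistic-maps-percent=100
    -> per the report: 100 is a count, not 100%; with only 20 mappers
       requested, every mapper again runs as OPPORTUNISTIC
{noformat}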
[jira] [Created] (YARN-10136) In Secure Federation, Router About UI page failed to display the subclusters information appropriately.
Sushanta Sen created YARN-10136:
-----------------------------------

             Summary: In Secure Federation, Router About UI page failed to display the subclusters information appropriately.
                 Key: YARN-10136
                 URL: https://issues.apache.org/jira/browse/YARN-10136
             Project: Hadoop YARN
          Issue Type: Bug
          Components: federation, yarn
            Reporter: Sushanta Sen

In a secure federation setup, the Router's About UI page fails to display the subcluster information appropriately.
[jira] [Created] (YARN-10133) For yarn Federation, there is no routeradmin command to manage
Sushanta Sen created YARN-10133:
-----------------------------------

             Summary: For yarn Federation, there is no routeradmin command to manage
                 Key: YARN-10133
                 URL: https://issues.apache.org/jira/browse/YARN-10133
             Project: Hadoop YARN
          Issue Type: Bug
          Components: federation, yarn
            Reporter: Sushanta Sen

In a YARN federated cluster, there is no routeradmin command for managing the Router, unlike HDFS, which has the 'dfsrouteradmin' command.
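For comparison, HDFS Router-based federation ships the kind of admin CLI the report asks a YARN analog of; a sketch of typical dfsrouteradmin usage (the mount-table paths and nameservice name are illustrative assumptions):

{noformat}
# HDFS federation admin commands (no YARN Router equivalent exists)
hdfs dfsrouteradmin -add /data ns1 /data    # add a mount-table entry
hdfs dfsrouteradmin -ls                     # list mount-table entries
hdfs dfsrouteradmin -rm /data               # remove a mount-table entry
hdfs dfsrouteradmin -safemode enter         # put the Router into safe mode
{noformat}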
[jira] [Created] (YARN-10132) For Federation, yarn applicationattempt fail command throws an exception
Sushanta Sen created YARN-10132: --- Summary: For Federation, yarn applicationattempt fail command throws an exception Key: YARN-10132 URL: https://issues.apache.org/jira/browse/YARN-10132 Project: Hadoop YARN Issue Type: Bug Reporter: Sushanta Sen The yarn applicationattempt -fail command fails with the exception “org.apache.commons.lang.NotImplementedException: Code is not implemented”. {noformat} ./yarn applicationattempt -fail appattempt_1581497870689_0001_01 Failing attempt appattempt_1581497870689_0001_01 of application application_1581497870689_0001 2020-02-12 20:48:48,530 INFO impl.YarnClientImpl: Failing application attempt appattempt_1581497870689_0001_01 Exception in thread "main" org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.failApplicationAttempt(FederationClientInterceptor.java:980) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.failApplicationAttempt(RouterClientRMService.java:388) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.failApplicationAttempt(ApplicationClientProtocolPBServiceImpl.java:210) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:581) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2793) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.failApplicationAttempt(ApplicationClientProtocolPBClientImpl.java:223) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy8.failApplicationAttempt(Unknown Source) at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.failApplicationAttempt(YarnClientImpl.java:447) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.failApplicationAttempt(ApplicationCLI.java:985) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:455) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:119) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
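The first frame of the trace above, together with the identical failures in YARN-10131, YARN-10122, and YARN-10121 below, indicates that the Router-side interceptor method is an unimplemented stub that throws unconditionally. A minimal sketch of such a stub follows, assuming the standard ApplicationClientProtocol record types; the body is an inference from the trace, not the actual FederationClientInterceptor source.
{noformat}
import java.io.IOException;
import org.apache.commons.lang.NotImplementedException;
import org.apache.hadoop.yarn.api.protocolrecords.FailApplicationAttemptRequest;
import org.apache.hadoop.yarn.api.protocolrecords.FailApplicationAttemptResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class FederationClientInterceptorSketch {
  // Corresponds to FederationClientInterceptor.failApplicationAttempt
  // (FederationClientInterceptor.java:980 in the trace above).
  public FailApplicationAttemptResponse failApplicationAttempt(
      FailApplicationAttemptRequest request) throws YarnException, IOException {
    // The Router never forwards the call to a subcluster RM; it fails fast.
    throw new NotImplementedException("Code is not implemented");
  }
}
{noformat}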
[jira] [Created] (YARN-10131) In Federation, a few yarn application commands do not work
Sushanta Sen created YARN-10131: --- Summary: In Federation, a few yarn application commands do not work Key: YARN-10131 URL: https://issues.apache.org/jira/browse/YARN-10131 Project: Hadoop YARN Issue Type: Bug Components: federation, yarn Reporter: Sushanta Sen In Federation, the below-mentioned yarn application commands do not work: ./yarn app -updatePriority 3 -appId <application ID> ./yarn app -changeQueue q1 -appId <application ID> ./yarn application -updateLifetime 40 -appId <application ID> All of the above fail with the same exception, "Code is not implemented". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10122) In Federation, executing yarn container signal command throws an exception
Sushanta Sen created YARN-10122: --- Summary: In Federation, executing yarn container signal command throws an exception Key: YARN-10122 URL: https://issues.apache.org/jira/browse/YARN-10122 Project: Hadoop YARN Issue Type: Bug Components: federation, yarn Reporter: Sushanta Sen Executing the yarn container -signal command fails with the error “org.apache.commons.lang.NotImplementedException: Code is not implemented”. {noformat} ./yarn container -signal container_e79_1581316978887_0001_01_10 Signalling container container_e79_1581316978887_0001_01_10 2020-02-10 14:51:18,045 INFO impl.YarnClientImpl: Signalling container container_e79_1581316978887_0001_01_10 with command OUTPUT_THREAD_DUMP Exception in thread "main" org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.signalToContainer(FederationClientInterceptor.java:993) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.signalToContainer(RouterClientRMService.java:403) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.signalToContainer(ApplicationClientProtocolPBServiceImpl.java:629) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:629) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2793) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.signalToContainer(ApplicationClientProtocolPBClientImpl.java:620) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy8.signalToContainer(Unknown Source) at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.signalToContainer(YarnClientImpl.java:949) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.signalToContainer(ApplicationCLI.java:717) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:478) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:119) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10121) In Federation, executing yarn queue status command throws an exception
Sushanta Sen created YARN-10121: --- Summary: In Federation, executing yarn queue status command throws an exception Key: YARN-10121 URL: https://issues.apache.org/jira/browse/YARN-10121 Project: Hadoop YARN Issue Type: Bug Components: federation, yarn Reporter: Sushanta Sen The yarn queue -status command fails with the error “org.apache.commons.lang.NotImplementedException: Code is not implemented”. {noformat} ./yarn queue -status default Exception in thread "main" org.apache.commons.lang.NotImplementedException: Code is not implemented at org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getQueueInfo(FederationClientInterceptor.java:715) at org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getQueueInfo(RouterClientRMService.java:246) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:328) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:591) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2793) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateRuntimeException(RPCUtil.java:85) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:122) at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getQueueInfo(ApplicationClientProtocolPBClientImpl.java:341) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy8.getQueueInfo(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getQueueInfo(YarnClientImpl.java:650) at org.apache.hadoop.yarn.client.cli.QueueCLI.listQueue(QueueCLI.java:111) at org.apache.hadoop.yarn.client.cli.QueueCLI.run(QueueCLI.java:78) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at
org.apache.hadoop.yarn.client.cli.QueueCLI.main(QueueCLI.java:50) {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10120) In Federation, Router Nodes/Applications/About pages throw a 500 exception when https is enabled
[ https://issues.apache.org/jira/browse/YARN-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanta Sen updated YARN-10120: Description: In Federation, the Router Nodes/Applications/About pages throw a 500 exception when https is enabled. yarn.router.webapp.https.address=<router ip>:8091 {noformat} 2020-02-07 16:38:49,990 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/apps java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:166) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1622) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:539) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:259) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) at
[jira] [Updated] (YARN-10120) In Federation, Router Nodes/Applications/About pages throw a 500 exception when https is enabled
[ https://issues.apache.org/jira/browse/YARN-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanta Sen updated YARN-10120: Description: In Federation, the Router Nodes/Applications/About pages throw a 500 exception when https is enabled. yarn.router.webapp.https.address=0.0.0.0:8091 {noformat} 2020-02-07 16:38:49,990 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/apps java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:166) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133) at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130) at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1622) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1767) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:583) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513) at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) at org.eclipse.jetty.server.Server.handle(Server.java:539) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:259) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) at
[jira] [Created] (YARN-10120) In Federation, Router UI does not launch when https is enabled
Sushanta Sen created YARN-10120: --- Summary: In Federation, Router UI does not launch when https is enabled Key: YARN-10120 URL: https://issues.apache.org/jira/browse/YARN-10120 Project: Hadoop YARN Issue Type: Bug Components: yarn Environment: In Federation, the Router UI does not launch on the secure https port when the below parameter is set in the Router yarn-site.xml: yarn.router.webapp.https.address=0.0.0.0:8091 Reporter: Sushanta Sen -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
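For reference, a small runnable sketch that sets the cited property through the real org.apache.hadoop.conf.Configuration API. The property name and value are quoted from the report; in practice the entry belongs in the Router's yarn-site.xml, so the class below is illustrative only.
{noformat}
import org.apache.hadoop.conf.Configuration;

public class RouterHttpsConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Programmatic equivalent of the yarn-site.xml entry cited in the report.
    conf.set("yarn.router.webapp.https.address", "0.0.0.0:8091");
    System.out.println(conf.get("yarn.router.webapp.https.address"));
  }
}
{noformat}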
[jira] [Created] (YARN-10111) In Federation cluster, Distributed Shell application submission fails as YarnClient#getQueueInfo is not implemented
Sushanta Sen created YARN-10111: --- Summary: In Federation cluster, Distributed Shell application submission fails as YarnClient#getQueueInfo is not implemented Key: YARN-10111 URL: https://issues.apache.org/jira/browse/YARN-10111 Project: Hadoop YARN Issue Type: Bug Reporter: Sushanta Sen In a Federation cluster, Distributed Shell application submission fails because YarnClient#getQueueInfo is not implemented (the same Router-side gap reported in YARN-10121). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10110) In YARN Secure Federated cluster, if hadoop.security.authorization=true in Router & client core-site.xml, executing a job throws the below error
Sushanta Sen created YARN-10110: --- Summary: In YARN Secure Federated cluster, if hadoop.security.authorization=true in Router & client core-site.xml, executing a job throws the below error Key: YARN-10110 URL: https://issues.apache.org/jira/browse/YARN-10110 Project: Hadoop YARN Issue Type: Bug Reporter: Sushanta Sen 【Precondition】: 1. Secure Federated cluster is available 2. Add the below configuration in the Router and client core-site.xml: hadoop.security.authorization=true 3. Restart the Router service 【Test step】: 1. Go to the Router client bin path and submit an MR PI job 2. Observe the client console screen 【Expect Output】: No error should be thrown and the job should be successful 【Actual Output】: The job fails, prompting "Protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB is not known." 【Additional Note】: On setting the parameter to false, the job is submitted and succeeds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
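In Hadoop, a "Protocol interface ... is not known" failure is the standard symptom when service-level authorization is enabled but the RPC server has no ACL entry registered for that protocol. The sketch below shows the kind of PolicyProvider entry that would make ApplicationClientProtocolPB known to the Router's RPC server; it is modeled on the RM's security.applicationclient.protocol.acl key and is an illustration, not the actual fix for this issue.
{noformat}
import org.apache.hadoop.security.authorize.PolicyProvider;
import org.apache.hadoop.security.authorize.Service;
import org.apache.hadoop.yarn.api.ApplicationClientProtocolPB;

// Illustrative only: registers an ACL entry for the client protocol, which is
// what "Protocol interface ... is not known" indicates is missing on the Router.
public class RouterPolicyProviderSketch extends PolicyProvider {
  @Override
  public Service[] getServices() {
    return new Service[] {
        // The ACL key name is an assumption modeled on the RM's
        // security.applicationclient.protocol.acl entry in hadoop-policy.xml.
        new Service("security.applicationclient.protocol.acl",
            ApplicationClientProtocolPB.class)
    };
  }
}
{noformat}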
[jira] [Updated] (YARN-9935) SSLHandshakeException thrown when HTTPS is enabled in AM web server in one specific condition
[ https://issues.apache.org/jira/browse/YARN-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanta Sen updated YARN-9935: --- Description: 【Precondition】: 1. Install the cluster 2. *{color:#4C9AFF}WebAppProxyServer service installed in 1 VM and RMs installed in 2 VMs{color}* 3. Enable all the required HTTPS configuration yarn.resourcemanager.application-https.policy STRICT yarn.app.mapreduce.am.webapp.https.enabled true yarn.app.mapreduce.am.webapp.https.client.auth true 4. RM HA enabled 5. *{color:#4C9AFF}Active RM is running in VM2, standby in VM1{color}* 6. Cluster should be up and running 【Test step】: 1. Submit an application 2. Open the Application Master link for the application ID from the RM UI 【Expect Output】: No error should be thrown and the job should be successful 【Actual Output】: SSLHandshakeException is thrown, although the job is successful. "javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target" was: 【Precondition】: 1. Install the cluster 2. WebAppProxyServer service installed in 1 VM and RMs installed in 2 VMs 3. Enable all the required HTTPS configuration yarn.resourcemanager.application-https.policy STRICT yarn.app.mapreduce.am.webapp.https.enabled true yarn.app.mapreduce.am.webapp.https.client.auth true 4. RM HA enabled 5. *{color:#4C9AFF}Active RM is running in VM2, standby in VM1{color}* 6. Cluster should be up and running 【Test step】: 1. Submit an application 2. Open the Application Master link for the application ID from the RM UI 【Expect Output】: No error should be thrown and the job should be successful 【Actual Output】: SSLHandshakeException is thrown, although the job is successful. "javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target" > SSLHandshakeException thrown when HTTPS is enabled in AM web server in one > specific condition > > > Key: YARN-9935 > URL: https://issues.apache.org/jira/browse/YARN-9935 > Project: Hadoop YARN > Issue Type: Bug > Components: amrmproxy > Reporter: Sushanta Sen > Priority: Major > > 【Precondition】: > 1. Install the cluster > 2. *{color:#4C9AFF}WebAppProxyServer service installed in 1 VM and RMs > installed in 2 VMs{color}* > 3. Enable all the required HTTPS configuration > yarn.resourcemanager.application-https.policy > STRICT > yarn.app.mapreduce.am.webapp.https.enabled > true > yarn.app.mapreduce.am.webapp.https.client.auth > true > 4. RM HA enabled > 5. *{color:#4C9AFF}Active RM is running in VM2, standby in VM1{color}* > 6. Cluster should be up and running > 【Test step】: > 1. Submit an application > 2. Open the Application Master link for the application ID from the RM UI > 【Expect Output】: > No error should be thrown and the job should be successful > 【Actual Output】: > SSLHandshakeException is thrown, although the job is successful. > "javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9935) SSLHandshakeException thrown when HTTPS is enabled in AM web server in one specific condition
[ https://issues.apache.org/jira/browse/YARN-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanta Sen updated YARN-9935: --- Description: 【Precondition】: 1. Install the cluster 2. WebAppProxyServer service installed in 1 VM and RMs installed in 2 VMs 3. Enable all the required HTTPS configuration yarn.resourcemanager.application-https.policy STRICT yarn.app.mapreduce.am.webapp.https.enabled true yarn.app.mapreduce.am.webapp.https.client.auth true 4. RM HA enabled 5. *{color:#4C9AFF}Active RM is running in VM2, standby in VM1{color}* 6. Cluster should be up and running 【Test step】: 1. Submit an application 2. Open the Application Master link for the application ID from the RM UI 【Expect Output】: No error should be thrown and the job should be successful 【Actual Output】: SSLHandshakeException is thrown, although the job is successful. "javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target" was: 【Precondition】: 1. Install the cluster 2. WebAppProxyServer service installed in 1 VM and RMs installed in 2 VMs 3. Enable all the required HTTPS configuration yarn.resourcemanager.application-https.policy STRICT yarn.app.mapreduce.am.webapp.https.enabled true yarn.app.mapreduce.am.webapp.https.client.auth true 4. RM HA enabled 5. Active RM is running in VM2, standby in VM1 6. Cluster should be up and running 【Test step】: 1. Submit an application 2. Open the Application Master link for the application ID from the RM UI 【Expect Output】: No error should be thrown and the job should be successful 【Actual Output】: SSLHandshakeException is thrown, although the job is successful. "javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target" > SSLHandshakeException thrown when HTTPS is enabled in AM web server in one > specific condition > > > Key: YARN-9935 > URL: https://issues.apache.org/jira/browse/YARN-9935 > Project: Hadoop YARN > Issue Type: Bug > Components: amrmproxy > Reporter: Sushanta Sen > Priority: Major > > 【Precondition】: > 1. Install the cluster > 2. WebAppProxyServer service installed in 1 VM and RMs installed in 2 VMs > 3. Enable all the required HTTPS configuration > yarn.resourcemanager.application-https.policy > STRICT > yarn.app.mapreduce.am.webapp.https.enabled > true > yarn.app.mapreduce.am.webapp.https.client.auth > true > 4. RM HA enabled > 5. *{color:#4C9AFF}Active RM is running in VM2, standby in VM1{color}* > 6. Cluster should be up and running > 【Test step】: > 1. Submit an application > 2. Open the Application Master link for the application ID from the RM UI > 【Expect Output】: > No error should be thrown and the job should be successful > 【Actual Output】: > SSLHandshakeException is thrown, although the job is successful. > "javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9935) SSLHandshakeException thrown when HTTPS is enabled in AM web server in one specific condition
Sushanta Sen created YARN-9935: -- Summary: SSLHandshakeException thrown when HTTPS is enabled in AM web server in one specific condition Key: YARN-9935 URL: https://issues.apache.org/jira/browse/YARN-9935 Project: Hadoop YARN Issue Type: Bug Components: amrmproxy Reporter: Sushanta Sen 【Precondition】: 1. Install the cluster 2. WebAppProxyServer service installed in 1 VM and RMs installed in 2 VMs 3. Enable all the required HTTPS configuration yarn.resourcemanager.application-https.policy STRICT yarn.app.mapreduce.am.webapp.https.enabled true yarn.app.mapreduce.am.webapp.https.client.auth true 4. RM HA enabled 5. Active RM is running in VM2, standby in VM1 6. Cluster should be up and running 【Test step】: 1. Submit an application 2. Open the Application Master link for the application ID from the RM UI 【Expect Output】: No error should be thrown and the job should be successful 【Actual Output】: SSLHandshakeException is thrown, although the job is successful. "javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
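The three HTTPS properties from the precondition, collected into a small runnable sketch using the real org.apache.hadoop.conf.Configuration API. The property names and values are quoted from the report; setting them programmatically is for illustration only, since in practice they belong in yarn-site.xml.
{noformat}
import org.apache.hadoop.conf.Configuration;

public class AmWebAppHttpsConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // RM enforces HTTPS for AM web apps.
    conf.set("yarn.resourcemanager.application-https.policy", "STRICT");
    // MapReduce AM serves its web app over HTTPS and requires client auth.
    conf.setBoolean("yarn.app.mapreduce.am.webapp.https.enabled", true);
    conf.setBoolean("yarn.app.mapreduce.am.webapp.https.client.auth", true);
    System.out.println(conf.get("yarn.resourcemanager.application-https.policy"));
  }
}
{noformat}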
[jira] [Created] (YARN-9849) Leaf queues do not inherit the parent queue status after their status is set explicitly and thereafter commented out in capacity-scheduler.xml
Sushanta Sen created YARN-9849: -- Summary: Leaf queues do not inherit the parent queue status after their status is set explicitly and thereafter commented out in capacity-scheduler.xml Key: YARN-9849 URL: https://issues.apache.org/jira/browse/YARN-9849 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Reporter: Sushanta Sen 【Precondition】: 1. Install the cluster 2. Configure several queues, say 2 parent [default, q1] and leaf [q2, q3] 3. Cluster should be up and running 【Test step】: 1. By default, leaf queues inherit the parent status 2. Set the leaf queues' status to "RUNNING" explicitly 3. Run the refresh command; the leaf queues' status is shown as "RUNNING" in the CLI/UI 4. Thereafter, change the leaf queues' status to "STOPPED" 5. Run the refresh command; the leaf queues' status is shown as "STOPPED" in the CLI/UI 6. Now comment out the leaf queues' status and run refresh queues 7. Observe the status 【Expect Output】: The leaf queues' status should be displayed as "RUNNING", inherited from the parent queue. 【Actual Output】: The leaf queues' status is still displayed as "STOPPED" rather than inheriting it from the parent, which is RUNNING. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
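A compact sketch of the configuration sequence the test steps describe, using the documented yarn.scheduler.capacity.<queue-path>.state property and the real Configuration API. The queue hierarchy root.q1.q2 is an assumption based on the queue names in the precondition, and unset() stands in for commenting the entry out of capacity-scheduler.xml.
{noformat}
import org.apache.hadoop.conf.Configuration;

public class QueueStateSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Parent queue runs; the root.q1.q2 hierarchy is an assumption based on
    // the queues named in the precondition.
    conf.set("yarn.scheduler.capacity.root.q1.state", "RUNNING");
    // Step 4: the leaf state is set explicitly.
    conf.set("yarn.scheduler.capacity.root.q1.q2.state", "STOPPED");
    // Step 6: commenting the entry out of capacity-scheduler.xml is modeled
    // here by unsetting it; per the report the leaf should then inherit
    // RUNNING from root.q1, but it remains STOPPED after a queue refresh.
    conf.unset("yarn.scheduler.capacity.root.q1.q2.state");
    System.out.println(conf.get("yarn.scheduler.capacity.root.q1.q2.state",
        conf.get("yarn.scheduler.capacity.root.q1.state"))); // expected: RUNNING
  }
}
{noformat}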