[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803659#comment-16803659 ]

Tao Yang commented on YARN-9413:

{quote}
Can you submit the patch to trigger jenkins job?
{quote}
Sure, already triggered. Thanks [~cheersyang] for the review.

> Queue resource leak after app fail for CapacityScheduler
>
> Key: YARN-9413
> URL: https://issues.apache.org/jira/browse/YARN-9413
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.1.2
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9413.001.patch
>
> To reproduce this problem:
> # Submit an app that is configured to keep containers across app attempts and that should fail as soon as its first AM finishes (am-max-attempts=1).
> # The app starts with 2 containers running on node NM1.
> # Fail the AM of the application with the PREEMPTED exit status, which should not count towards the max attempt retry; the app nevertheless fails immediately.
> # The used resource of this queue leaks after the app fails.
> The root cause is an inconsistency in how app attempt failure is handled between RMAppAttemptImpl$BaseFinalTransition#transition and RMAppImpl$AttemptFailedTransition#transition:
> # After the app fails, RMAppFailedAttemptEvent is sent in RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the failure does not count towards the max attempt retry, so the transition sends AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
> # RMAppImpl$AttemptFailedTransition#transition handles RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
> # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in CapacityScheduler#doneApplicationAttempt. It skips killing the containers that belong to this app and skips their completion process, so the queue resource leaks.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
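The mismatch described in the root cause can be modelled in a few lines of Java. This is only a sketch under stated assumptions: `AmExitStatus`, `AttemptFailureModel` and its methods are hypothetical names, and the logic covers only the branch the report describes (max app attempts of 1). It is not the actual RMAppAttemptImpl or RMAppImpl code.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical exit statuses; the first four are the ones named in the report. */
enum AmExitStatus { PREEMPTED, ABORTED, DISKS_FAILED, KILLED_BY_RESOURCEMANAGER, OTHER_FAILURE }

final class AttemptFailureModel {
  // Exit statuses that, per the report, do not count towards the max attempt retry.
  private static final Set<AmExitStatus> NOT_COUNTED = EnumSet.of(
      AmExitStatus.PREEMPTED, AmExitStatus.ABORTED,
      AmExitStatus.DISKS_FAILED, AmExitStatus.KILLED_BY_RESOURCEMANAGER);

  /** Attempt side (simplified): is the scheduler told to keep the containers? */
  static boolean keepContainers(boolean keepAcrossAttempts, AmExitStatus status) {
    return keepAcrossAttempts && NOT_COUNTED.contains(status);
  }

  /** App side (simplified to the reported case): the app fails when max attempts is 1. */
  static boolean appFails(int maxAppAttempts) {
    return maxAppAttempts == 1;
  }

  /** The leak: containers are kept for a next attempt that will never be created. */
  static boolean leaksQueueResource(boolean keepAcrossAttempts, AmExitStatus status,
                                    int maxAppAttempts) {
    return keepContainers(keepAcrossAttempts, status) && appFails(maxAppAttempts);
  }
}
```

In this model the two decisions are made independently, which is the point of the report: the attempt-side check never looks at whether the app is about to fail, so both can be true at once.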
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803694#comment-16803694 ]

Hadoop QA commented on YARN-9413:

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 43s | trunk passed |
| +1 | compile | 0m 47s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 0m 49s | trunk passed |
| +1 | shadedclient | 12m 27s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | trunk passed |
| +1 | javadoc | 0m 31s | trunk passed |
|| || || || Patch Compile Tests ||
| -1 | mvninstall | 0m 39s | hadoop-yarn-server-resourcemanager in the patch failed. |
| -1 | compile | 0m 39s | hadoop-yarn-server-resourcemanager in the patch failed. |
| -1 | javac | 0m 39s | hadoop-yarn-server-resourcemanager in the patch failed. |
| -0 | checkstyle | 0m 34s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 163 unchanged - 1 fixed = 164 total (was 164) |
| -1 | mvnsite | 0m 41s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 3m 56s | patch has errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 30s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | javadoc | 0m 29s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 0m 41s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 41m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9413 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963827/YARN-9413.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8a5d1a658b61 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8a59efe |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/23822/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| compile | https://builds.apache.org/job/PreCommit-YARN-Build/23822/artifact/out/patch-co
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803726#comment-16803726 ]

Hadoop QA commented on YARN-8200:

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 10m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 73 new or modified test files. |
|| || || || branch-2 Compile Tests ||
| 0 | mvndep | 1m 53s | Maven dependency ordering for branch |
| +1 | mvninstall | 11m 19s | branch-2 passed |
| +1 | compile | 12m 23s | branch-2 passed with JDK v1.7.0_95 |
| +1 | compile | 10m 26s | branch-2 passed with JDK v1.8.0_191 |
| +1 | checkstyle | 3m 6s | branch-2 passed |
| +1 | mvnsite | 12m 10s | branch-2 passed |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests |
| +1 | findbugs | 9m 11s | branch-2 passed |
| +1 | javadoc | 7m 51s | branch-2 passed with JDK v1.7.0_95 |
| +1 | javadoc | 6m 37s | branch-2 passed with JDK v1.8.0_191 |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 9m 23s | the patch passed |
| +1 | compile | 11m 22s | the patch passed with JDK v1.7.0_95 |
| +1 | cc | 11m 22s | the patch passed |
| -1 | javac | 11m 22s | root-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 1441 unchanged - 2 fixed = 1443 total (was 1443) |
| +1 | compile | 10m 26s | the patch passed with JDK v1.8.0_191 |
| +1 | cc | 10m 26s | the patch passed |
| +1 | javac | 10m 26s | the patch passed |
| -0 | checkstyle | 2m 21s | root: The patch generated 194 new + 3977 unchanged - 73 fixed = 4171 total (was 4050) |
| +1 | mvnsite | 10m 45s | the patch passed |
| +1 | shellcheck | 0m 0s | There were no new shellcheck issues. |
| +1 | shelldocs | 0m 12s | There were no new shelldocs issues. |
| -1 | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | whitespace | 0m 0s | The patch 524 line(s) with tabs. |
| +1 | xml | 0m 17s | The patch has no ill-formed XML file. |
| {color:blue}0{c
[jira] [Updated] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-9270:

Attachment: YARN-9270-004.patch

> Minor cleanup in TestFpgaDiscoverer
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, YARN-9270-003.patch, YARN-9270-004.patch
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a {{Function<String, String>}} in the plugin class, like {{Function<String, String> envProvider = System::getenv}}, plus a setter method that allows the test to modify {{envProvider}}. Much simpler and more straightforward.
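The `envProvider` idea from the description could look roughly like the sketch below. The class name `FpgaEnvProviderSketch`, the method `resolveSdkRoot` and the variable name `EXAMPLE_SDK_ROOT` are hypothetical and chosen only for illustration; the real plugin class and the environment variables it reads may differ.

```java
import java.util.function.Function;

final class FpgaEnvProviderSketch {
  // Production default: read from the real process environment.
  private Function<String, String> envProvider = System::getenv;

  /** Setter that lets a test substitute its own lookup instead of patching the environment. */
  void setEnvProvider(Function<String, String> envProvider) {
    this.envProvider = envProvider;
  }

  /** Hypothetical lookup: use the variable if set, otherwise fall back to a default. */
  String resolveSdkRoot(String defaultRoot) {
    String fromEnv = envProvider.apply("EXAMPLE_SDK_ROOT");
    return fromEnv != null ? fromEnv : defaultRoot;
  }
}
```

A test can then call `setEnvProvider(Collections.singletonMap("EXAMPLE_SDK_ROOT", "/opt/sdk")::get)` to simulate a configured variable, or pass `name -> null` to simulate an unset one, with no reflection on the process environment.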
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803847#comment-16803847 ]

Hadoop QA commented on YARN-9270:

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 43s | trunk passed |
| +1 | compile | 1m 8s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 42s | trunk passed |
| +1 | shadedclient | 12m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 10s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 1m 3s | the patch passed |
| +1 | javac | 1m 3s | the patch passed |
| -0 | checkstyle | 0m 24s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 9 new + 34 unchanged - 5 fixed = 43 total (was 39) |
| +1 | mvnsite | 0m 35s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 40s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 3s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 20m 45s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 73m 32s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9270 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964003/YARN-9270-004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 95f3acbcf759 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8a59efe |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23823/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23823/testReport/ |
| Max. process+thread count | 308 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nod
[jira] [Updated] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko updated YARN-9270:

Attachment: YARN-9270-005.patch

> Minor cleanup in TestFpgaDiscoverer
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a {{Function<String, String>}} in the plugin class, like {{Function<String, String> envProvider = System::getenv}}, plus a setter method that allows the test to modify {{envProvider}}. Much simpler and more straightforward.
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803859#comment-16803859 ]

Peter Bacsko commented on YARN-9270:

Fixed the remaining checkstyle problems in patch v5.

> Minor cleanup in TestFpgaDiscoverer
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch, YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a {{Function<String, String>}} in the plugin class, like {{Function<String, String> envProvider = System::getenv}}, plus a setter method that allows the test to modify {{envProvider}}. Much simpler and more straightforward.
[jira] [Updated] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
[ https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antal Bálint Steinbach updated YARN-9235:

Attachment: YARN-9235.004.patch

> If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, YARN-9235.003.patch, YARN-9235.004.patch
>
> If the GPU plugin is enabled for the NodeManager, it is possible to run jobs with GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when calling
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, no warnings are logged if GPU is misconfigured like this.
[jira] [Commented] (YARN-9414) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-9414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803872#comment-16803872 ]

Adam Antal commented on YARN-9414:

Thanks [~eyang]!

> Application Catalog for YARN applications
>
> Key: YARN-9414
> URL: https://issues.apache.org/jira/browse/YARN-9414
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Attachments: YARN Appstore.pdf, YARN-Application-Catalog.pdf
>
> YARN native services provide a web services API that improves the usability of application deployment on Hadoop using a collection of docker images. It would be nice to have an application catalog system that provides an editorial and search interface for YARN applications. This would improve the usability of YARN for managing the life cycle of applications.
[jira] [Updated] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9418:

Description:
ATSV2 entities rest api does not show the metrics
{code:java}
[hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
{
  "metrics": [],
  "events": [],
  "createdtime": 1553695002014,
  "idprefix": 0,
  "type": "YARN_CONTAINER",
  "id": "container_e18_1553685341603_0006_01_01",
  "info": {
    "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
    "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
  },
  "configs": {},
  "isrelatedto": {},
  "relatesto": {}
}{code}
NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these are not shown in the above output. NM container entities are written with entityIdPrefix set to the inverted container start time, whereas RM container entities are written with the default 0. TimelineReader fetches only the RM container entries.
This was confirmed by setting the entityIdPrefix of the NM container entities to 0, the same as RM (for testing purposes), after which the metrics are shown.
{code:java}
"metrics": [
  {
    "type": "SINGLE_VALUE",
    "id": "MEMORY",
    "aggregationOp": "NOP",
    "values": {
      "1553774981355": 490430464
    }
  },
  {
    "type": "SINGLE_VALUE",
    "id": "CPU",
    "aggregationOp": "NOP",
    "values": {
      "1553774981355": 5
    }
  }
]{code}

was:
ATSV2 entities rest api does not show the metrics
{code:java}
[hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
{
  "metrics": [],
  "events": [],
  "createdtime": 1553695002014,
  "idprefix": 0,
  "type": "YARN_CONTAINER",
  "id": "container_e18_1553685341603_0006_01_01",
  "info": {
    "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
    "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
  },
  "configs": {},
  "isrelatedto": {},
  "relatesto": {}
}{code}
NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this is not shown in above output. Found NM container entries are updated with right flowRunId (startTime of the job) whereas RM container entries are updated with default 0. TimelineReader fetches only rows which are updated by RM (i.e, rowkeys with flowRunId 0).

> ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
>
> Key: YARN-9418
> URL: https://issues.apache.org/jira/browse/YARN-9418
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2
> Affects Versions: 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Critical
>
> ATSV2 entities rest api does not show the metrics
> {code:java}
> [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1553695002014,
>   "idprefix": 0,
>   "type": "YARN_CONTAINER",
>   "id": "container_e18_1553685341603_0006_01_01",
>   "info": {
>     "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
>     "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{code}
> NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics, but these are not shown in the above output. NM container entities are written with entityIdPrefix set to the inverted container start time, whereas RM container entities are written with the default 0. TimelineReader fetches only the RM container entries.
> This was confirmed by setting the entityIdPrefix of the NM container entities to 0, the same as RM (for testing purposes), after which the metrics are shown.
> {code:java}
> "metrics": [
>   {
>     "type": "SINGLE_VALUE",
>     "id": "MEMORY",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 490430464
>     }
>   },
>   {
>     "type": "SINGLE_VALUE",
>     "id": "CPU",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 5
>     }
>   }
> ]{code}
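The entityIdPrefix mismatch reported for YARN-9418 can be illustrated with a toy key-value store. This is a sketch, not the ATSv2 storage code: `EntityIdPrefixSketch` and its methods are hypothetical, the key layout simply copies the `UID` format visible in the REST output, and "inverted" is assumed here to mean `Long.MAX_VALUE - startTime`.

```java
import java.util.HashMap;
import java.util.Map;

final class EntityIdPrefixSketch {
  /** Assumed inversion: larger (newer) timestamps map to smaller prefixes. */
  static long invert(long timestamp) {
    return Long.MAX_VALUE - timestamp;
  }

  /** Key layout copied from the UID in the output: cluster!app!type!idprefix!id. */
  static String uid(String cluster, String appId, String type, long idPrefix, String id) {
    return String.join("!", cluster, appId, type, Long.toString(idPrefix), id);
  }

  /** Toy store with one RM-written and one NM-written row for the same container. */
  static Map<String, String> store(String appId, String containerId, long startTime) {
    Map<String, String> rows = new HashMap<>();
    rows.put(uid("ats", appId, "YARN_CONTAINER", 0L, containerId), "rm-row: no metrics");
    rows.put(uid("ats", appId, "YARN_CONTAINER", invert(startTime), containerId),
        "nm-row: CPU, MEMORY");
    return rows;
  }

  /** A reader that assumes the default prefix 0 only ever finds the RM row. */
  static String readWithDefaultPrefix(Map<String, String> rows, String appId, String containerId) {
    return rows.get(uid("ats", appId, "YARN_CONTAINER", 0L, containerId));
  }
}
```

Because the two writers use different prefixes, the same container ends up under two keys, and a lookup with prefix 0 returns the row without metrics. That matches the `"idprefix": 0` and empty `"metrics"` in the reported output.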
[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
[ https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803881#comment-16803881 ]

Szilard Nemeth commented on YARN-9235:

Hi [~bsteinbach]!
Do you need help with the review?

> If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, YARN-9235.003.patch, YARN-9235.004.patch
>
> If the GPU plugin is enabled for the NodeManager, it is possible to run jobs with GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when calling
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, no warnings are logged if GPU is misconfigured like this.
[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
[ https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803882#comment-16803882 ]

Antal Bálint Steinbach commented on YARN-9235:

Hi,
LOGGER is renamed to LOG.
I agree that testing for general exceptions needs to be checked, but testing the error message is not the way to do that, for several reasons (parameters in the text, i18n, etc.). I do not want to change the exception itself, as this was a small refactor and an NPE fix, so I think this small tradeoff is acceptable in the test.
[~sunilg], [~tangzhankun], can you please push?

> If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, YARN-9235.003.patch, YARN-9235.004.patch
>
> If the GPU plugin is enabled for the NodeManager, it is possible to run jobs with GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when calling
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, no warnings are logged if GPU is misconfigured like this.
[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
[ https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803896#comment-16803896 ]

Antal Bálint Steinbach commented on YARN-9235:

[~snemeth], it already has a +1, thanks

> If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
>
> Key: YARN-9235
> URL: https://issues.apache.org/jira/browse/YARN-9235
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0, 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Major
> Attachments: YARN-9235.001.patch, YARN-9235.002.patch, YARN-9235.003.patch, YARN-9235.004.patch
>
> If the GPU plugin is enabled for the NodeManager, it is possible to run jobs with GPU.
> However, if LinuxContainerExecutor is not configured, an NPE is thrown when calling
> {code:java}
> GpuResourcePlugin.getNMResourceInfo{code}
> Also, no warnings are logged if GPU is misconfigured like this.
[jira] [Created] (YARN-9423) Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time
Tao Yang created YARN-9423:

Summary: Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time
Key: YARN-9423
URL: https://issues.apache.org/jira/browse/YARN-9423
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 3.2.0
Reporter: Tao Yang
Assignee: Tao Yang

We have seen slow application recovery when many NMs are lost at the same time:
# Many NMs shut down abnormally at the same time.
# The NMs expire, and a large number of AMs start to fail over.
# AM containers are allocated but not launched for about half an hour.

During this slow recovery, all ApplicationMasterLauncher threads were calling cleanup for containers on the lost nodes and kept retrying to communicate with each NM for 3 minutes (the retry policy is configured in NMProxy#createNMProxy), even though the RM already knew these NMs were lost and probably could not be reached for a long time. Meanwhile, many AM cleanup and launch operations were still waiting in the queue (ApplicationMasterLauncher#masterEvents). AM launch operations were therefore blocked behind cleanup operations that each wasted 3 minutes, so AM failover can be very slow.

I think we can optimize the AM launcher in two ways:
# Change the type of ApplicationMasterLauncher#masterEvents from LinkedBlockingQueue to PriorityBlockingQueue, so that launch operations are always executed ahead of cleanup operations.
# In the cleanup process (AMLauncher#cleanup), check the node state first and skip cleaning up AM containers on non-existent or unusable NMs before communicating with the NM, because these NMs probably cannot be reached for a long time.
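The first proposal in this report, ordering launch operations ahead of cleanup operations with a PriorityBlockingQueue, could look roughly like the sketch below. This is only an illustration under stated assumptions: LaunchFirstQueueSketch, AmEvent, EventType and drainOrder are hypothetical names, not the real ApplicationMasterLauncher code, and the sequence-number tie-breaker is an addition of this sketch because PriorityBlockingQueue does not preserve insertion order among equal-priority elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative model only; these are not the real YARN classes. */
public class LaunchFirstQueueSketch {

  /** Declaration order doubles as priority: LAUNCH sorts ahead of CLEANUP. */
  enum EventType { LAUNCH, CLEANUP }

  /** Hypothetical stand-in for an entry of ApplicationMasterLauncher#masterEvents. */
  static final class AmEvent implements Comparable<AmEvent> {
    private static final AtomicLong NEXT_SEQ = new AtomicLong();

    final EventType type;
    final String containerId;
    // Tie-breaker: keeps submission order among events of the same type.
    final long seq = NEXT_SEQ.getAndIncrement();

    AmEvent(EventType type, String containerId) {
      this.type = type;
      this.containerId = containerId;
    }

    @Override
    public int compareTo(AmEvent other) {
      int byType = type.compareTo(other.type);
      return byType != 0 ? byType : Long.compare(seq, other.seq);
    }
  }

  /** Returns the order in which a launcher thread would take the events. */
  static List<String> drainOrder(List<AmEvent> submitted) {
    PriorityBlockingQueue<AmEvent> queue = new PriorityBlockingQueue<>();
    queue.addAll(submitted);
    List<String> order = new ArrayList<>();
    AmEvent event;
    while ((event = queue.poll()) != null) {
      order.add(event.type + ":" + event.containerId);
    }
    return order;
  }

  public static void main(String[] args) {
    List<AmEvent> submitted = Arrays.asList(
        new AmEvent(EventType.CLEANUP, "c1"),
        new AmEvent(EventType.LAUNCH, "c2"),
        new AmEvent(EventType.CLEANUP, "c3"),
        new AmEvent(EventType.LAUNCH, "c4"));
    // prints [LAUNCH:c2, LAUNCH:c4, CLEANUP:c1, CLEANUP:c3]
    System.out.println(drainOrder(submitted));
  }
}
```

With a plain FIFO queue the two launches would wait behind the cleanup of c1; with the comparator above they are taken first, which is the effect the proposal is after.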
[jira] [Updated] (YARN-9423) Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time
[ https://issues.apache.org/jira/browse/YARN-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9423: --- Description: We have met a slow recovery for applications after many NM lost: # many NM shut down at the same time abnormally. # NM expired, then a large number of AM start failover. # AM containers are allocated but not launched for about half an hour. Among this slow recovery, all ApplicationMasterLauncher threads were calling cleanup for containers on these lost nodes and keep retrying to communicate with NM for 3 minutes(retry policy is configured in NMProxy#createNMProxy) even though RM had known these NM are lost and probably can't be connected for a long time. Meanwhile many AM cleanup and launch operations were still waiting in queue (ApplicationMasterLauncher#masterEvents). Obviously AM launch operations were blocked by cleanup operations which are wasting 3 minutes. As a result, AM failover can be a very slow journey. I think we can optimize AM launcher in two ways: # Modify type of ApplicationMasterLauncher#masterEvents from LinkedBlockingQueue to PriorityBlockingQueue, keep executing launch operations in front of cleanup operations. # Check node state first and skip cleanup AM containers on non-existent or unusable NM (because these NM probably can't be communicated for a long time) before communicating with NM in cleanup process(AMLauncher#cleanup). was: We have met a slow recovery for applications when many NM lost happen at the same time: # many NM shut down at the same time abnormally. # NM expired, then a large number of AM start failover. # AM containers are allocated but not launched for about half an hour. 
[jira] [Updated] (YARN-9423) Optimize AM launcher to avoid bottleneck when a large number of AM failover happen at the same time
[ https://issues.apache.org/jira/browse/YARN-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9423: --- Description: We have met a slow recovery for applications after many NM lost: # many NM shut down at the same time abnormally. # NM expired, then a large number of AM start failover. # AM containers were allocated but not launched and last for about half an hour. Among this slow recovery, all ApplicationMasterLauncher threads were calling cleanup for containers on these lost nodes and keep retrying to communicate with NM for 3 minutes(retry policy is configured in NMProxy#createNMProxy) even though RM had known these NM are lost and probably can't be connected for a long time. Meanwhile many AM cleanup and launch operations were still waiting in queue (ApplicationMasterLauncher#masterEvents). Obviously AM launch operations were blocked by cleanup operations which are wasting 3 minutes. As a result, AM failover can be a very slow journey. I think we can optimize AM launcher in two ways: # Modify type of ApplicationMasterLauncher#masterEvents from LinkedBlockingQueue to PriorityBlockingQueue, keep executing launch operations in front of cleanup operations. # Check node state first and skip cleanup AM containers on non-existent or unusable NM (because these NM probably can't be communicated for a long time) before communicating with NM in cleanup process(AMLauncher#cleanup). was: We have met a slow recovery for applications after many NM lost: # many NM shut down at the same time abnormally. # NM expired, then a large number of AM start failover. # AM containers are allocated but not launched for about half an hour. 
[jira] [Updated] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9413: --- Attachment: YARN-9413.002.patch > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After app fail, RMAppFailedAttemptEvent will be sent in > RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > will not count towards max attempt retry, so that it will send > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handle > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. > # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapcityScheduler#doneApplicationAttempt, it will skip killing and calling > completion process for containers belong to this app, so that queue resource > leak happens. 
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803905#comment-16803905 ] Tao Yang commented on YARN-9413: Attached v2 patch to fix compile error and update name of new test case from testQueueResourceLeakForCapacityScheduler to testQueueResourceDoesNotLeak. > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After app fail, RMAppFailedAttemptEvent will be sent in > RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > will not count towards max attempt retry, so that it will send > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handle > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. 
> # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in CapacityScheduler#doneApplicationAttempt; it skips killing the containers belonging to this app and skips their completion process, so the queue's used resource leaks.
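The inconsistency described in the root-cause list can be condensed into a small decision model. The sketch below is a simplified illustration, not the ResourceManager code: AttemptFailureSketch and all of its method names are hypothetical, exit statuses are modelled as plain strings, and the logic only restates the three steps quoted above.

```java
/** Simplified model of the reported inconsistency; not the real RM classes. */
public class AttemptFailureSketch {

  /** Attempt side: per the report, these AM exit statuses do not count towards max attempts. */
  static boolean countsTowardsMaxAttempts(String amExitStatus) {
    switch (amExitStatus) {
      case "PREEMPTED":
      case "ABORTED":
      case "DISKS_FAILED":
      case "KILLED_BY_RESOURCEMANAGER":
        return false;
      default:
        return true;
    }
  }

  /** Attempt side: containers are kept when the app asked for it and the failure is not counted. */
  static boolean attemptKeepsContainers(boolean keepAcrossAttempts, String amExitStatus) {
    return keepAcrossAttempts && !countsTowardsMaxAttempts(amExitStatus);
  }

  /** App side: per the report, the app fails right away when max app attempts is 1. */
  static boolean appFailsImmediately(int maxAppAttempts) {
    return maxAppAttempts == 1;
  }

  /** Leak condition: containers are kept for a next attempt that will never be created. */
  static boolean leaksQueueResource(boolean keepAcrossAttempts, String amExitStatus,
      int maxAppAttempts) {
    return attemptKeepsContainers(keepAcrossAttempts, amExitStatus)
        && appFailsImmediately(maxAppAttempts);
  }

  public static void main(String[] args) {
    // The reproduction case from the report: keep-containers enabled, PREEMPTED AM, one attempt.
    System.out.println(leaksQueueResource(true, "PREEMPTED", 1)); // prints true
    // With a second attempt available, the kept containers are handed over instead.
    System.out.println(leaksQueueResource(true, "PREEMPTED", 2)); // prints false
  }
}
```

Read this way, the two sides disagree exactly when the attempt side expects a successor attempt and the app side has already decided there will be none.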
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803906#comment-16803906 ]

Hadoop QA commented on YARN-9270:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 47s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 37s | trunk passed |
| +1 | shadedclient | 11m 22s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 37s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 21s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 33 unchanged - 5 fixed = 33 total (was 38) |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 41s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 2s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 20m 46s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 68m 35s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9270 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964022/YARN-9270-005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 25ae00ca6dc6 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 15d38b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23824/testReport/ |
| Max. process+thread count | 447 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23824/console |
| Powered by | A
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803913#comment-16803913 ] Szilard Nemeth commented on YARN-9214: -- Hi [~jiwq]! I can help review this. Could you please rebase the patch onto trunk? There's a merge conflict. Thanks! > Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code > -- > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Deleted] (YARN-9422) Simon poortman
[ https://issues.apache.org/jira/browse/YARN-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran deleted YARN-9422: - > Simon poortman > -- > > Key: YARN-9422 > URL: https://issues.apache.org/jira/browse/YARN-9422 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Simon Poortman >Priority: Major > Labels: Beste > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9235) If linux container executor is not set for a GPU cluster GpuResourceHandlerImpl is not initialized and NPE is thrown
[ https://issues.apache.org/jira/browse/YARN-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803934#comment-16803934 ]

Hadoop QA commented on YARN-9235:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 16m 59s | trunk passed |
| +1 | compile | 0m 57s | trunk passed |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 36s | trunk passed |
| +1 | shadedclient | 10m 32s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 57s | trunk passed |
| +1 | javadoc | 0m 26s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 35s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| -0 | checkstyle | 0m 20s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) |
| +1 | mvnsite | 0m 32s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 13s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 1s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 20m 47s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 67m 24s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9235 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964024/YARN-9235.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 92a3073ea089 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 15d38b1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23825/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23825/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Updated] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wanqiang Ji updated YARN-9214: -- Attachment: YARN-9214.002.patch > Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code > -- > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9418:

Attachment: YARN-9418-001.patch

> ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
>
> Key: YARN-9418
> URL: https://issues.apache.org/jira/browse/YARN-9418
> Project: Hadoop YARN
> Issue Type: Bug
> Components: ATSv2
> Affects Versions: 3.2.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Critical
> Attachments: YARN-9418-001.patch
>
> The ATSV2 entities REST API does not show the metrics:
> {code:java}
> [hbase@yarn-ats-3 centos]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS" | jq .
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1553695002014,
>   "idprefix": 0,
>   "type": "YARN_CONTAINER",
>   "id": "container_e18_1553685341603_0006_01_01",
>   "info": {
>     "UID": "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01",
>     "FROM_ID": "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{code}
> The NodeManager publishes YARN_CONTAINER entities with CPU and MEMORY metrics, but they are not shown in the output above. The NM container entities are written with entityIdPrefix set to the inverted container start time, whereas the RM container entities use the default of 0, and TimelineReader fetches only the RM container entries. Setting entityIdPrefix to 0 for the NM container entities, the same as the RM (for testing purposes), makes the metrics appear:
> {code:java}
> "metrics": [
>   {
>     "type": "SINGLE_VALUE",
>     "id": "MEMORY",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 490430464
>     }
>   },
>   {
>     "type": "SINGLE_VALUE",
>     "id": "CPU",
>     "aggregationOp": "NOP",
>     "values": {
>       "1553774981355": 5
>     }
>   }
> ]{code}
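The prefix mismatch described in this report can be illustrated with a toy key-value model. Everything in the sketch below is an assumption made for illustration: EntityIdPrefixSketch, rowKey and readMetrics are hypothetical, the row-key layout is not the actual ATSv2 storage schema, and "inverted" is taken to mean Long.MAX_VALUE minus the timestamp. The start time is the createdtime value from the output quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the prefix mismatch; the key layout is an assumption, not the ATSv2 schema. */
public class EntityIdPrefixSketch {

  /** Assumed meaning of "inverted": newer timestamps map to smaller prefixes. */
  static long invert(long timestampMillis) {
    return Long.MAX_VALUE - timestampMillis;
  }

  /** The prefix is part of the key, so a read must use the same prefix as the write. */
  static String rowKey(long idPrefix, String entityId) {
    return idPrefix + "!" + entityId;
  }

  /** Writes one RM row and one NM row for the same container, then reads with readerIdPrefix. */
  static List<String> readMetrics(long nmIdPrefix, long readerIdPrefix) {
    String id = "container_e18_1553685341603_0006_01_01";
    Map<String, List<String>> table = new HashMap<>();
    // RM-side write: default prefix 0, no metrics attached.
    table.computeIfAbsent(rowKey(0L, id), k -> new ArrayList<>());
    // NM-side write: carries the MEMORY and CPU metrics.
    table.computeIfAbsent(rowKey(nmIdPrefix, id), k -> new ArrayList<>())
        .addAll(Arrays.asList("MEMORY", "CPU"));
    // Exact-key read: only the row written with the same prefix is visible.
    return table.getOrDefault(rowKey(readerIdPrefix, id), new ArrayList<>());
  }

  public static void main(String[] args) {
    long startTime = 1553695002014L;
    System.out.println(readMetrics(invert(startTime), 0L)); // prints []
    System.out.println(readMetrics(0L, 0L));                // prints [MEMORY, CPU]
  }
}
```

The first call mirrors the empty "metrics" array in the report, and the second mirrors the test in which the NM prefix was forced to 0.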
[jira] [Updated] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9418: Issue Type: Sub-task (was: Bug) Parent: YARN-7055 > ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics > --- > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9418-001.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. 
> {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code}
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804047#comment-16804047 ] Hadoop QA commented on YARN-9413: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 163 unchanged - 1 fixed = 164 total (was 164) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 76m 0s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9413 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964030/YARN-9413.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c0634f52bf01 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 15d38b1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23826/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23826/testReport/ | | Max. process+thread count | 926 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-se
[jira] [Commented] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804054#comment-16804054 ] Prabhu Joseph commented on YARN-9418: - Have used inverted ContainerId#getContainerId as entityIdPrefix for YARN_CONTAINER entities for both NM and RM. TimelineReader fetches ContainerEntity from both NM and RM. {code:java} [hbase@yarn-ats-3 BACKUP]$ curl -s "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553788280931_0001/entities/YARN_CONTAINER/container_e30_1553788280931_0001_01_01?user.name=hbase&fields=ALL"; | jq . { "metrics": [ { "type": "SINGLE_VALUE", "id": "CPU", "aggregationOp": "NOP", "values": { "1553788338798": 2 } }, { "type": "SINGLE_VALUE", "id": "MEMORY", "aggregationOp": "NOP", "values": { "1553788338798": 488013824 } } ], "events": [ { "id": "YARN_RM_CONTAINER_FINISHED", "timestamp": 1553788341158, "info": {} }, { "id": "YARN_CONTAINER_FINISHED", "timestamp": 1553788341151, "info": {} }, { "id": "YARN_NM_CONTAINER_LOCALIZATION_FINISHED", "timestamp": 1553788316027, "info": {} }, { "id": "YARN_NM_CONTAINER_LOCALIZATION_STARTED", "timestamp": 1553788315294, "info": {} }, { "id": "YARN_CONTAINER_CREATED", "timestamp": 1553788315284, "info": {} }, { "id": "YARN_RM_CONTAINER_CREATED", "timestamp": 1553788314339, "info": {} } ], "createdtime": 1553788314809, "idprefix": 9223339051505943000, "info": { "YARN_CONTAINER_STATE": "COMPLETE", "YARN_CONTAINER_ALLOCATED_HOST": "yarn-ats-1", "YARN_CONTAINER_ALLOCATED_HOST_HTTP_ADDRESS": "yarn-ats-1:8042", "YARN_CONTAINER_ALLOCATED_VCORE": 1, "FROM_ID": "ats!hbase!word count!1553788313450!application_1553788280931_0001!YARN_CONTAINER!9223339051505942526!container_e30_1553788280931_0001_01_01", "YARN_CONTAINER_ALLOCATED_PORT": 45454, "UID": "ats!application_1553788280931_0001!YARN_CONTAINER!9223339051505942526!container_e30_1553788280931_0001_01_01", "YARN_CONTAINER_ALLOCATED_MEMORY": 2048, "SYSTEM_INFO_PARENT_ENTITY": { "type": 
"YARN_APPLICATION_ATTEMPT", "id": "appattempt_1553788280931_0001_01" }, "YARN_CONTAINER_EXIT_STATUS": 0, "YARN_CONTAINER_ALLOCATED_PRIORITY": "0", "YARN_CONTAINER_DIAGNOSTICS_INFO": "", "YARN_CONTAINER_FINISHED_TIME": 1553788341151 }, "configs": {}, "isrelatedto": {}, "relatesto": {}, "id": "container_e30_1553788280931_0001_01_01", "type": "YARN_CONTAINER" }{code} > ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics > --- > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9418-001.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. 
> {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code}
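The idprefix in the UID above can be checked by hand. The sketch below is only an illustration: it assumes the ContainerId#getContainerId layout described in the Hadoop javadoc (epoch in the upper 24 bits, sequence number in the lower 40 bits) and assumes that "inverted" means Long.MAX_VALUE minus the id. The class and method names are hypothetical and are not taken from the patch.

```java
class InvertedContainerIdPrefix {
    // Assumed layout of the long container id: epoch in the upper 24 bits,
    // sequence number in the lower 40 bits.
    public static long containerId(long epoch, long sequence) {
        return (epoch << 40) | sequence;
    }

    // Assumed inversion: larger (newer) ids map to smaller prefixes.
    public static long invert(long key) {
        return Long.MAX_VALUE - key;
    }

    public static void main(String[] args) {
        // The container in the comment above has epoch 30 (the "e30" part);
        // sequence number 1 is an assumption based on the container name.
        long id = containerId(30, 1);
        System.out.println(id);         // prints 32985348833281
        System.out.println(invert(id)); // prints 9223339051505942526
    }
}
```

The second value matches the prefix embedded in the UID and FROM_ID fields. The rounded "idprefix": 9223339051505943000 in the same output is probably a display effect of jq handling large integers as doubles, not a different stored value.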
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804056#comment-16804056 ] Weiwei Yang commented on YARN-9413: --- LGTM, +1 once the remaining checkstyle issue is fixed. [~Tao Yang], could you pls fix that? > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After app fail, RMAppFailedAttemptEvent will be sent in > RMAppAttemptImpl$BaseFinalTransition#transition, if exit status of AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > will not count towards max attempt retry, so that it will send > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handle > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. 
> # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt; it skips killing and running the > completion process for containers belonging to this app, so the queue resource > leak happens.
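The inconsistency described in the root cause can be modelled in a few lines. This is a deliberately simplified sketch, not the ResourceManager code and not the attached patch: the class, the method names and the boolean inputs are all hypothetical, and the app-side rule is reduced to the one case the description mentions (max app attempts is 1).

```java
class AttemptFailureModel {
    // Attempt side (per the description): containers are kept when the app asked
    // to keep them across attempts and the failure does not count towards the
    // max-attempt limit, for example a PREEMPTED AM container.
    public static boolean keepContainers(boolean keepAcrossAttempts,
                                         boolean countsTowardsMaxAttempts) {
        return keepAcrossAttempts && !countsTowardsMaxAttempts;
    }

    // App side (per the description): the app fails if its max app attempts is 1.
    public static boolean appFails(int maxAppAttempts) {
        return maxAppAttempts == 1;
    }

    // The leak: the scheduler skips container cleanup although no new attempt
    // will ever take those containers over.
    public static boolean leaksQueueResource(boolean keepAcrossAttempts,
                                             boolean countsTowardsMaxAttempts,
                                             int maxAppAttempts) {
        return keepContainers(keepAcrossAttempts, countsTowardsMaxAttempts)
            && appFails(maxAppAttempts);
    }

    public static void main(String[] args) {
        // Reproduction scenario: keep-containers enabled, PREEMPTED AM, one attempt.
        System.out.println(leaksQueueResource(true, false, 1)); // prints true
    }
}
```

In this model the two sides disagree exactly when both predicates are true, which is the reproduction scenario from the issue.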
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.009.patch > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch, > YARN-9348.009.patch > > > A couple of reports say Jenkins precommit builds are failing due to an integration > problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory so that they do not > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and > files inside the target/generated-sources directories, which causes race conditions. > # Building on Mac triggers access to the OSX keychain in an attempt to log in to > Dockerhub.
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: (was: YARN-9348.009.patch) > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch > > > A couple of reports say Jenkins precommit builds are failing due to an integration > problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory so that they do not > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and > files inside the target/generated-sources directories, which causes race conditions. > # Building on Mac triggers access to the OSX keychain in an attempt to log in to > Dockerhub.
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.009.patch > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch, > YARN-9348.009.patch > > > A couple of reports say Jenkins precommit builds are failing due to an integration > problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory so that they do not > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and > files inside the target/generated-sources directories, which causes race conditions. > # Building on Mac triggers access to the OSX keychain in an attempt to log in to > Dockerhub.
[jira] [Commented] (YARN-9318) Resources#multiplyAndRoundUp does not consider Resource Types
[ https://issues.apache.org/jira/browse/YARN-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804075#comment-16804075 ] Szilard Nemeth commented on YARN-9318: -- Hi [~sunilg]! Do you think branch-3.2 / branch-3.1 patches are needed? If so, I will upload them. Thanks! > Resources#multiplyAndRoundUp does not consider Resource Types > - > > Key: YARN-9318 > URL: https://issues.apache.org/jira/browse/YARN-9318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9318.001.patch, YARN-9318.002.patch, > YARN-9318.003.patch, YARN-9318.004.patch, YARN-9318.005.patch > > > org.apache.hadoop.yarn.util.resource.Resources#multiplyAndRoundUp only deals > with memory and vcores while computing the rounded value. It should also > consider custom Resource Types.
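The behaviour the issue asks for can be sketched independently of the Hadoop classes. The example below is an assumption-laden illustration: it represents a resource as a plain map from resource name to value instead of the real Resource type, and the class name and the "gpu" entry are hypothetical. It only shows the idea of rounding up every resource type, not just memory and vcores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MultiplyAndRoundUpSketch {
    // Multiplies every resource value by the factor and rounds each result up,
    // so custom resource types get the same treatment as memory and vcores.
    public static Map<String, Long> multiplyAndRoundUp(Map<String, Long> resource,
                                                       double by) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : resource.entrySet()) {
            result.put(entry.getKey(), (long) Math.ceil(entry.getValue() * by));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> resource = new LinkedHashMap<>();
        resource.put("memory-mb", 1024L);
        resource.put("vcores", 3L);
        resource.put("gpu", 1L); // hypothetical custom resource type
        // prints {memory-mb=512, vcores=2, gpu=1}
        System.out.println(multiplyAndRoundUp(resource, 0.5));
    }
}
```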
[jira] [Commented] (YARN-9322) Store metrics for custom resource types into FSQueueMetrics and query them in FairSchedulerQueueInfo
[ https://issues.apache.org/jira/browse/YARN-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804077#comment-16804077 ] Szilard Nemeth commented on YARN-9322: -- Hi [~sunilg]! Do you think branch-3.2 / branch-3.1 patches are needed? If so, I will upload them. Thanks! > Store metrics for custom resource types into FSQueueMetrics and query them in > FairSchedulerQueueInfo > > > Key: YARN-9322 > URL: https://issues.apache.org/jira/browse/YARN-9322 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.3.0 > > Attachments: Screen Shot 2019-02-21 at 12.06.46.png, > YARN-9322.001.patch, YARN-9322.002.patch, YARN-9322.003.patch, > YARN-9322.004.patch, YARN-9322.005.patch, YARN-9322.006.patch > > > YARN-8842 implemented storing and exposing of metrics of custom resources. > FSQueueMetrics should have a similar implementation. > All metrics stored in this class should have their custom resource > counterpart. > Because metrics were not stored for custom resource types, > FairSchedulerQueueInfo did not contain those values, so the UI v1 > could not show them. > See that gpu is missing from the value of "AM Max Resources" on the attached > screenshot. > Additionally, the callees of the following methods (in class > FairSchedulerQueueInfo) should consider querying values for custom resource > types too: > getMaxAMShareMB > getMaxAMShareVCores > getAMResourceUsageMB > getAMResourceUsageVCores
[jira] [Commented] (YARN-9323) FSLeafQueue#computeMaxAMResource does not override zero values for custom resources
[ https://issues.apache.org/jira/browse/YARN-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804081#comment-16804081 ] Szilard Nemeth commented on YARN-9323: -- Hi [~sunilg]! Do you think branch-3.2 / branch-3.1 patches are needed? If so, I will upload them. Thanks! > FSLeafQueue#computeMaxAMResource does not override zero values for custom > resources > --- > > Key: YARN-9323 > URL: https://issues.apache.org/jira/browse/YARN-9323 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9323.001.patch, YARN-9323.002.patch, > YARN-9323.003.patch, YARN-9323.004.patch, YARN-9323.005.patch
[jira] [Updated] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9348: Attachment: YARN-9348.010.patch > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch, > YARN-9348.009.patch, YARN-9348.010.patch > > > A couple of reports say Jenkins precommit builds are failing due to an integration > problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory so that they do not > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and > files inside the target/generated-sources directories, which causes race conditions. > # Building on Mac triggers access to the OSX keychain in an attempt to log in to > Dockerhub.
[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804095#comment-16804095 ] Eric Yang commented on YARN-9348: - Patch 9 was broken during patch generation. Patch 10 combines YARN-7129 patch 035 and YARN-9348 patch 08 to work around a Yetus bug where the findbugs report is not computed correctly for a newly added submodule (YETUS-825). This ensures that the precommit build tests the changes in YARN-9348 patch 08. When committing, YARN-7129 and YARN-9348 will be committed separately to track the requested changes. > Build issues on hadoop-yarn-application-catalog-webapp > -- > > Key: YARN-9348 > URL: https://issues.apache.org/jira/browse/YARN-9348 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9348.001.patch, YARN-9348.002.patch, > YARN-9348.003.patch, YARN-9348.004.patch, YARN-9348.005.patch, > YARN-9348.006.patch, YARN-9348.007.patch, YARN-9348.008.patch, > YARN-9348.009.patch, YARN-9348.010.patch > > > A couple of reports say Jenkins precommit builds are failing due to an integration > problem between nodejs libraries and Yetus. The problems are: > # Nodejs third-party libraries are checked by the whitespace check, which > generates many errors. One possible solution is to move the nodejs libraries > from the project top-level directory to the target directory so that they do not > trip the whitespace checks. > # maven clean fails because the clean plugin tries to remove the target directory and > files inside the target/generated-sources directories, which causes race conditions. > # Building on Mac triggers access to the OSX keychain in an attempt to log in to > Dockerhub.
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804215#comment-16804215 ] Hadoop QA commented on YARN-9214: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 55s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}124m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9214 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964045/YARN-9214.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5a7c17f28237 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / df578c0 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23827/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23827/testReport/ | | Max. process+thread count | 899 (vs. ulimit of 1) | | mod
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804244#comment-16804244 ] Szilard Nemeth commented on YARN-9214: -- Hi [~jiwq]! Overall, this is a nice refactor! Here are my comments: # The import changes are confusing; I guess you organized imports with your IDE. Could you please revert those changes and only add the new imports to the bottom? # The name of the extracted method "getValidQueues" is misleading: the method is not about getting valid queues, it returns the apps belonging to the specified queue. I would rename it to getAppsFromQueue() or something like that. # I know you just extracted the method, but I think it's okay to modify the error message a bit to: "The specified queue: " + queueName + " does not exist!" (queue with a lowercase q at the beginning, "does not" instead of "doesn't"). Apart from these, I'm okay with the patch! Thanks! > Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code > -- > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this, we need to add a doc > comment to explain why it exists.
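The review suggestions about the method name and the error message can be sketched as follows. This is not the AbstractYarnScheduler code: the class, the in-memory queue map, the addQueue helper and the use of IllegalArgumentException are assumptions made to keep the example self-contained. Only the proposed name getAppsFromQueue and the error message text come from the comment above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QueueLookupSketch {
    // Hypothetical stand-in for the scheduler's queue registry: queue name -> app ids.
    private final Map<String, List<String>> appsByQueue = new HashMap<>();

    public void addQueue(String queueName, List<String> appIds) {
        appsByQueue.put(queueName, appIds);
    }

    /**
     * Returns the apps belonging to the given queue. Exists so that callers such
     * as moveAllApps and killAllAppsInQueue share one lookup and one error path.
     */
    public List<String> getAppsFromQueue(String queueName) {
        List<String> apps = appsByQueue.get(queueName);
        if (apps == null) {
            // Exception type is an assumption; the message follows the review comment.
            throw new IllegalArgumentException(
                "The specified queue: " + queueName + " does not exist!");
        }
        return apps;
    }

    public static void main(String[] args) {
        QueueLookupSketch scheduler = new QueueLookupSketch();
        scheduler.addQueue("default", Arrays.asList("app_1", "app_2"));
        System.out.println(scheduler.getAppsFromQueue("default")); // prints [app_1, app_2]
    }
}
```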
[jira] [Commented] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804280#comment-16804280 ] Hadoop QA commented on YARN-9418: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 52s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}154m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9418 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964046/YARN-9418-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ad21946ee925 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | g
[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804284#comment-16804284 ] Szilard Nemeth commented on YARN-9227: -- Hi [~Prabhu Joseph]! Some comments: In the method "testDistributedShellCleanup": # Result boolean variable is not used, can be removed. # There's a while loop in the end: How can you be sure that it won't run infinitely? # You don't need the continue statement at the end of the while-loop # In the assertion, the condition is negated ( {{!fs1.exists(path)}}). Use assertFalse instead, without negating the fs1.exists(path) call. # In the assertion, I would replace "Fails" with "failed". Thanks! > DistributedShell RelativePath is not removed at end > --- > > Key: YARN-9227 > URL: https://issues.apache.org/jira/browse/YARN-9227 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell >Affects Versions: 3.1.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, > 0003-YARN-9227.patch > > > DistributedShell Job does not remove the relative path which contains jars > and localized files. > {code} > [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls > /user/ambari-qa/DistributedShell/application_1542665708563_0017 > Found 2 items > -rw-r--r-- 3 ambari-qa hdfs 46636 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar > -rwx--x--- 3 ambari-qa hdfs 4 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
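On the second point of the YARN-9227 review above (how to be sure the while loop terminates), one common approach is to poll with a deadline. The sketch below is an assumption about how that could look, not the code in the patch; `BoundedWait` and `waitUntil` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of the YARN-9227 patch. */
final class BoundedWait {
  private BoundedWait() {}

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition was met, false on timeout or interrupt,
   * so the loop can never run forever.
   */
  static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    boolean done = waitUntil(() -> System.currentTimeMillis() - start >= 50, 2000, 10);
    System.out.println("condition met: " + done);
  }
}
```

Once the wait has finished, the fourth point of the review applies: with JUnit 4, `assertFalse(message, fs1.exists(path))` states the expectation directly, whereas `assertTrue(message, !fs1.exists(path))` makes the reader undo the negation.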
[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804298#comment-16804298 ] Szilard Nemeth commented on YARN-8943: -- Hi [~ajisakaa]! I reviewed your changes, +1 (non-binding). Next time, please avoid changing things not strictly related to the patch, e.g.: * Changing method visibilities * Reformatting the code * Any other unnecessary change These can be solved in a follow-up jira; in this patch they just make the review process harder. This is especially true if you introduce junit5 into some bigger Maven modules than the hadoop-yarn-api project (I saw you have this in YARN-6946). I do note that the order of the parameters needs to be changed for the assertXX methods, as the message string became the last parameter in junit5, as opposed to the first in junit4. Btw, after this is merged, are you planning to proceed with YARN-6946? As I said, if you include only the changes absolutely required to introduce junit5 into that project, it will be easier to review. A second thought, just out of curiosity: did you use some junit4 to junit5 migration script or did you need to adjust the order of the parameters by hand? Thanks! > Upgrade JUnit from 4 to 5 in hadoop-yarn-api > > > Key: YARN-8943 > URL: https://issues.apache.org/jira/browse/YARN-8943 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8943.01.patch, YARN-8943.02.patch
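The parameter-order remark in the YARN-8943 review above refers to JUnit 4's `Assert.assertEquals(String message, Object expected, Object actual)` versus JUnit 5's `Assertions.assertEquals(Object expected, Object actual, String message)`. The sketch below shows why the reordering deserves care. To stay runnable without a JUnit dependency it uses two stand-in methods that only mirror those parameter orders; `AssertOrderDemo` and both method names are hypothetical, and they return the failure message instead of throwing.

```java
import java.util.Objects;

/** Stand-ins that mirror the two parameter orders; not the real JUnit classes. */
final class AssertOrderDemo {
  private AssertOrderDemo() {}

  /** JUnit 4 order: message first. Returns null on success, the message on failure. */
  static String junit4AssertEquals(String message, Object expected, Object actual) {
    return Objects.equals(expected, actual) ? null : message;
  }

  /** JUnit 5 order: message last. Returns null on success, the message on failure. */
  static String junit5AssertEquals(Object expected, Object actual, String message) {
    return Objects.equals(expected, actual) ? null : message;
  }

  public static void main(String[] args) {
    String actual = "root.default";
    // Correctly migrated call: expected and actual are compared.
    System.out.println(junit5AssertEquals("root.default", actual, "wrong queue"));
    // Un-migrated call: with three String arguments it still compiles, but it
    // now compares the old message against the old expected value.
    System.out.println(junit5AssertEquals("wrong queue", "root.default", actual));
  }
}
```

The second call is the risky case for a by-hand or scripted migration: when all three arguments are strings, the compiler cannot flag a call that was left in the old order.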
[jira] [Commented] (YARN-9418) ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics
[ https://issues.apache.org/jira/browse/YARN-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804301#comment-16804301 ] Giovanni Matteo Fumarola commented on YARN-9418: Thanks [~Prabhu Joseph] for the patch. I think the failed test is related. > ATSV2 /apps/${appId}/entities/YARN_CONTAINER rest api does not show metrics > --- > > Key: YARN-9418 > URL: https://issues.apache.org/jira/browse/YARN-9418 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9418-001.patch > > > ATSV2 entities rest api does not show the metrics > {code:java} > [hbase@yarn-ats-3 centos]$ curl -s > "http://yarn-ats-3:8198/ws/v2/timeline/apps/application_1553685341603_0006/entities/YARN_CONTAINER/container_e18_1553685341603_0006_01_01?user.name=hbase&fields=METRICS"; > | jq . > { > "metrics": [], > "events": [], > "createdtime": 1553695002014, > "idprefix": 0, > "type": "YARN_CONTAINER", > "id": "container_e18_1553685341603_0006_01_01", > "info": { > "UID": > "ats!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01", > "FROM_ID": > "ats!hbase!QuasiMonteCarlo!1553695001394!application_1553685341603_0006!YARN_CONTAINER!0!container_e18_1553685341603_0006_01_01" > }, > "configs": {}, > "isrelatedto": {}, > "relatesto": {} > }{code} > NodeManager puts YARN_CONTAINER entities with CPU and MEMORY metrics but this > is not shown in above output. Found NM container entities are set with > entityIdPrefix as inverted container starttime whereas RM container entities > are set with default 0. TimelineReader fetches only RM container entries. > Confirmed with setting NM container entities entityIdPrefix to 0 same as RM > (for testing purpose) and found metrics are shown. 
> {code:java} > "metrics": [ > { > "type": "SINGLE_VALUE", > "id": "MEMORY", > "aggregationOp": "NOP", > "values": { > "1553774981355": 490430464 > } > }, > { > "type": "SINGLE_VALUE", > "id": "CPU", > "aggregationOp": "NOP", > "values": { > "1553774981355": 5 > } > } > ]{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
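The YARN-9418 description above says the NodeManager stores container entities under an inverted start time as entityIdPrefix while the ResourceManager uses 0, and the reader looks up only the latter. A toy model of that mismatch follows. It is a sketch under assumptions, not the ATSv2 storage code: `EntityPrefixDemo`, the `prefix!id` key layout, and the inversion formula `Long.MAX_VALUE - startTime` are illustrative, and the start time reuses the `createdtime` value from the output above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of prefix-keyed storage; not the ATSv2 HBase schema. */
final class EntityPrefixDemo {
  private final Map<String, String> rows = new HashMap<>();

  /** Assumed inversion, so that newer start times would sort first. */
  static long invert(long startTime) {
    return Long.MAX_VALUE - startTime;
  }

  static String rowKey(long idPrefix, String entityId) {
    return idPrefix + "!" + entityId;
  }

  void put(long idPrefix, String entityId, String payload) {
    rows.put(rowKey(idPrefix, entityId), payload);
  }

  /** Returns null when no row exists for exactly this prefix and id. */
  String get(long idPrefix, String entityId) {
    return rows.get(rowKey(idPrefix, entityId));
  }

  public static void main(String[] args) {
    EntityPrefixDemo store = new EntityPrefixDemo();
    String id = "container_e18_1553685341603_0006_01_01";
    long startTime = 1553695002014L; // assumed container start time
    store.put(0L, id, "RM entity, no metrics");
    store.put(invert(startTime), id, "NM entity with CPU and MEMORY metrics");
    // A reader that always uses prefix 0 only ever sees the RM row.
    System.out.println(store.get(0L, id));
    System.out.println(store.get(invert(startTime), id));
  }
}
```

In this model the two writers produce two distinct rows for the same container id, which matches the observation that forcing the NM prefix to 0 made the metrics visible.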
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804308#comment-16804308 ] Szilard Nemeth commented on YARN-9413: -- Hi [~Tao Yang]! Apart from the checkstyle issue, I found some others: 1. Could you please use javadoc instead of simple comments to document org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart#testQueueResourceDoesNotLeak ? 2. Instead of calling getConf(), can you use the field "conf" directly? I can see calling getConf() is the way how other tests are working as well, but I feel it a bit weird. 3. There's this call: {code:java} Assert.assertTrue(!attempt1.shouldCountTowardsMaxAttemptRetry()); {code} You should change it to Assert.assertFalse, so that you don't need to use negation! > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. 
> The root cause is an inconsistency in how app attempt failure is handled between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After the app fails, RMAppFailedAttemptEvent is sent in > RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the failure > does not count towards max attempt retry, so it sends > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handles > RMAppFailedAttemptEvent and fails the app if its max app attempts is 1. > # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt; it skips killing the containers that > belong to this app and skips their completion processing, so the queue resource > leak happens.
[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804312#comment-16804312 ] Akira Ajisaka commented on YARN-8943: - Thanks [~snemeth] for the review! bq. Next time, please avoid changing things not strictly related to the patch Got it. I'll address these things in follow-up jiras. bq. Btw, after this is merged, are you planning to proceed with YARN-6946? Yes, I'd like to migrate per module. bq. Did you use some junit4 to junit5 migration script or did you need to adjust the order of the parameters by hand? I made the change by hand. In the next jira, I'd like to write a script for the migration. > Upgrade JUnit from 4 to 5 in hadoop-yarn-api > > > Key: YARN-8943 > URL: https://issues.apache.org/jira/browse/YARN-8943 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8943.01.patch, YARN-8943.02.patch
[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804317#comment-16804317 ] Szilard Nemeth commented on YARN-8943: -- Or maybe we could use this one before doing the migration from junit4 to junit5 on other projects? [http://joel-costigliola.github.io/assertj/assertj-core-converting-junit-assertions-to-assertj.html] AssertJ has much cleaner / better syntax and produces more readable error messages, anyways. > Upgrade JUnit from 4 to 5 in hadoop-yarn-api > > > Key: YARN-8943 > URL: https://issues.apache.org/jira/browse/YARN-8943 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8943.01.patch, YARN-8943.02.patch
[jira] [Comment Edited] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804317#comment-16804317 ] Szilard Nemeth edited comment on YARN-8943 at 3/28/19 9:26 PM: --- Or maybe we could use this one before doing the migration from junit4 to junit5 on other projects? [http://joel-costigliola.github.io/assertj/assertj-core-converting-junit-assertions-to-assertj.html] AssertJ has much cleaner / better syntax and produces more readable error messages, anyways. Regarding your current patch: If you need help finding committers, I think [~sunilg] or [~tangzhankun] can help as we have a +1 already from me. Thanks! > Upgrade JUnit from 4 to 5 in hadoop-yarn-api > > > Key: YARN-8943 > URL: https://issues.apache.org/jira/browse/YARN-8943 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8943.01.patch, YARN-8943.02.patch
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804319#comment-16804319 ] Szilard Nemeth commented on YARN-9270: -- Hi [~pbacsko]! Do you need a review on the latest patch? > Minor cleanup in TestFpgaDiscoverer > --- > > Key: YARN-9270 > URL: https://issues.apache.org/jira/browse/YARN-9270 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9270-001.patch, YARN-9270-002.patch, > YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch > > > Let's do some cleanup in this class. > * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split > up into 5 different tests, because it tests 5 different scenarios. > * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a > {{Function}} in the plugin class like {{Function<String, String> envProvider > = System::getenv}} plus a setter method which allows the test to modify > {{envProvider}}. Much simpler and more straightforward.
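The second bullet of the YARN-9270 description above (replace the environment hack with an injectable `Function`) can be sketched as follows. This is an illustration under assumptions rather than the plugin's actual code: `FpgaPluginSketch`, `setEnvProvider`, `resolveSdkRoot`, and the variable name passed to the provider are hypothetical; only the `envProvider` field and the `System::getenv` default come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical plugin class; only the envProvider idea comes from the issue. */
final class FpgaPluginSketch {
  // Production default: look variables up in the real process environment.
  private Function<String, String> envProvider = System::getenv;

  /** Lets a test substitute its own environment lookup. */
  void setEnvProvider(Function<String, String> envProvider) {
    this.envProvider = envProvider;
  }

  /** Example consumer: reads an assumed variable name, empty string if unset. */
  String resolveSdkRoot() {
    String root = envProvider.apply("ALTERAOCLSDKROOT");
    return root == null ? "" : root;
  }

  public static void main(String[] args) {
    FpgaPluginSketch plugin = new FpgaPluginSketch();
    Map<String, String> fakeEnv = new HashMap<>();
    fakeEnv.put("ALTERAOCLSDKROOT", "/opt/fake-sdk");
    plugin.setEnvProvider(fakeEnv::get); // the test controls the environment
    System.out.println(plugin.resolveSdkRoot());
  }
}
```

Because the test only swaps a field on the object under test, no reflection on the JVM's environment map is needed, which seems to be what made the original hack complicated.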
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804320#comment-16804320 ] Peter Bacsko commented on YARN-9270: [~snemeth] I'd appreciate comments if you have any. > Minor cleanup in TestFpgaDiscoverer > --- > > Key: YARN-9270 > URL: https://issues.apache.org/jira/browse/YARN-9270 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9270-001.patch, YARN-9270-002.patch, > YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch > > > Let's do some cleanup in this class. > * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split > up into 5 different tests, because it tests 5 different scenarios. > * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a > {{Function}} in the plugin class like {{Function<String, String> envProvider > = System::getenv}} plus a setter method which allows the test to modify > {{envProvider}}. Much simpler and more straightforward.
[jira] [Assigned] (YARN-7505) RM REST endpoints generate malformed JSON
[ https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned YARN-7505: Assignee: Szilard Nemeth (was: Daniel Templeton) > RM REST endpoints generate malformed JSON > - > > Key: YARN-7505 > URL: https://issues.apache.org/jira/browse/YARN-7505 > Project: Hadoop YARN > Issue Type: Bug > Components: restapi >Affects Versions: 3.0.0 >Reporter: Daniel Templeton >Assignee: Szilard Nemeth >Priority: Critical > Attachments: YARN-7505.001.patch, YARN-7505.002.patch > > > For all endpoints that return DAOs that contain maps, the generated JSON is > malformed. For example: > % curl 'http://localhost:8088/ws/v1/cluster/apps' > {"apps":{"app":[{"id":"application_1510777276702_0001","user":"daniel","name":"QuasiMonteCarlo","queue":"root.daniel","state":"RUNNING","finalStatus":"UNDEFINED","progress":5.0,"trackingUI":"ApplicationMaster","trackingUrl":"http://dhcp-10-16-0-181.pa.cloudera.com:8088/proxy/application_1510777276702_0001/","diagnostics":"","clusterId":1510777276702,"applicationType":"MAPREDUCE","applicationTags":"","priority":0,"startedTime":1510777317853,"finishedTime":0,"elapsedTime":21623,"amContainerLogs":"http://dhcp-10-16-0-181.pa.cloudera.com:8042/node/containerlogs/container_1510777276702_0001_01_01/daniel","amHostHttpAddress":"dhcp-10-16-0-181.pa.cloudera.com:8042","amRPCAddress":"dhcp-10-16-0-181.pa.cloudera.com:63371","allocatedMB":5120,"allocatedVCores":4,"reservedMB":0,"reservedVCores":0,"runningContainers":4,"memorySeconds":49820,"vcoreSeconds":26,"queueUsagePercentage":62.5,"clusterUsagePercentage":62.5,"resourceSecondsMap":{"entry":{"key":"test2","value":"0"},"entry":{"key":"test","value":"0"},"entry":{"key":"memory-mb","value":"49820"},"entry":{"key":"vcores","value":"26"}},"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0,"preemptedMemorySeconds":0,"preemptedVcoreSeconds":0,"preemptedResourceSecondsMap":{},"reso
urceRequests":[{"priority":20,"resourceName":"dhcp-10-16-0-181.pa.cloudera.com","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"/default-rack","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false},{"priority":20,"resourceName":"*","capability":{"memory":1024,"vCores":1},"numContainers":8,"relaxLocality":true,"nodeLabelExpression":"","executionTypeRequest":{"executionType":"GUARANTEED","enforceExecutionType":true},"enforceExecutionType":false}],"logAggregationStatus":"DISABLED","unmanagedApplication":false,"amNodeLabelExpression":"","timeouts":{"timeout":[{"type":"LIFETIME","expiryTime":"UNLIMITED","remainingTimeInSeconds":-1}]}}]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
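In the YARN-7505 output above, `resourceSecondsMap` repeats the member name `entry` several times inside a single JSON object. JSON member names are expected to be unique, and many parsers keep only one of the duplicates. One possible shape for such a map is a plain object with one member per key. The sketch below shows that rendering only; it is not taken from either attached patch, `MapJsonSketch` and `toJsonObject` are hypothetical names, and it assumes keys and values need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical renderer; not the RM's actual JSON marshalling code. */
final class MapJsonSketch {
  private MapJsonSketch() {}

  /**
   * Renders a map as one JSON object with a distinct member per key.
   * Assumes keys and values contain no characters that need escaping.
   */
  static String toJsonObject(Map<String, String> map) {
    StringJoiner joiner = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> e : map.entrySet()) {
      joiner.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    Map<String, String> resourceSeconds = new LinkedHashMap<>();
    resourceSeconds.put("memory-mb", "49820");
    resourceSeconds.put("vcores", "26");
    System.out.println(toJsonObject(resourceSeconds));
  }
}
```

An array of key/value objects would be another well-formed option; which one the REST API should adopt is a compatibility question for the issue itself.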
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804340#comment-16804340 ] Szilard Nemeth commented on YARN-9270: -- Hi [~pbacsko]! Went through the changes, +1 (non-binding). This cleanup in the prod code and especially in the test code is pretty neat! > Minor cleanup in TestFpgaDiscoverer > --- > > Key: YARN-9270 > URL: https://issues.apache.org/jira/browse/YARN-9270 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9270-001.patch, YARN-9270-002.patch, > YARN-9270-003.patch, YARN-9270-004.patch, YARN-9270-005.patch > > > Let's do some cleanup in this class. > * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split > up into 5 different tests, because it tests 5 different scenarios. > * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a > {{Function}} in the plugin class like {{Function<String, String> envProvider > = System::getenv}} plus a setter method which allows the test to modify > {{envProvider}}. Much simpler and more straightforward.
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804345#comment-16804345 ] Szilard Nemeth commented on YARN-8701: -- Hi [~Sen Zhao]! Patch no longer applies! Could you please upload a new patch? Thanks! > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Assignee: Sen Zhao >Priority: Major > Attachments: YARN-8701.001.patch, YARN-8701.002.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}Max Resources: {code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
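The symptom in the YARN-8701 title above (vcores ending up as -1) is what a plain narrowing cast from `long` to `int` produces for `Long.MAX_VALUE`. Whether that is exactly the code path inside `Resources#createResourceWithSameValue` is an assumption here, and the clamping variant below is only one possible remedy, not necessarily what the attached patches do. `VcoresCastDemo` and its methods are hypothetical names.

```java
/** Illustration of long-to-int narrowing; not the Resources class itself. */
final class VcoresCastDemo {
  private VcoresCastDemo() {}

  /** Plain narrowing cast: keeps only the low 32 bits of the value. */
  static int narrowingCast(long value) {
    return (int) value;
  }

  /** Saturating alternative: clamps to the int range instead of wrapping. */
  static int saturatingCast(long value) {
    return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, value));
  }

  public static void main(String[] args) {
    System.out.println(narrowingCast(Long.MAX_VALUE));  // low 32 bits are all ones
    System.out.println(saturatingCast(Long.MAX_VALUE)); // clamped to Integer.MAX_VALUE
  }
}
```

Values that already fit in an `int` are unchanged by both methods, so the clamp only affects the out-of-range case described in the issue.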
[jira] [Commented] (YARN-8503) Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING
[ https://issues.apache.org/jira/browse/YARN-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804347#comment-16804347 ] Szilard Nemeth commented on YARN-8503: -- Hi [~AmiyaChakraborty]! The file you uploaded is not a real patch file; it is in HTML format. Can you please upload a valid one? Thanks! > Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING > --- > > Key: YARN-8503 > URL: https://issues.apache.org/jira/browse/YARN-8503 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.7.2 >Reporter: Amiya Chakraborty >Assignee: Amiya Chakraborty >Priority: Major > Labels: patch-available, yarn > Attachments: YARN-8503.001.patch, YARN-8503.001.patch > > > Currently, there is no unit test for testing the functionality - > FINISHED_CONTAINERS_PULLED_BY_AM event while Decommissioning of node. This > patch provides the same to check the AM has pulled the containers from the > RM; then the RM will inform the NM about it and the NM can remove the > completed container from its list during DECOMMISSIONING.
[jira] [Updated] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified
[ https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9196: - Description: In RM application page, attempt start time is formatted client side (browser), but application start time is formatted by the server. If client time zone and server time zone is different then on the UI, the application start time and attempt start time will be in different format. was: In RM application page, attempt start time is formated client side(browser), but aplication start time is formated by the server. If client time zone and server time zone is different then in UI application start time and attempt start time will be different format. > Attempt started time zone and Application started time zone is different when > OS time zone is modified > -- > > Key: YARN-9196 > URL: https://issues.apache.org/jira/browse/YARN-9196 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-9196-001.patch > > > In RM application page, attempt start time is formatted client side > (browser), but application start time is formatted by the server. > If client time zone and server time zone is different then on the UI, the > application start time and attempt start time will be in different format. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
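The YARN-9196 description above comes down to one instant being rendered in two different zones, once by the browser and once by the server. A small sketch of the effect follows. `StartTimeZoneDemo`, the date pattern, and the two fixed offsets are assumptions chosen for illustration and do not reflect the RM web UI code.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustration only; not the RM web UI formatting code. */
final class StartTimeZoneDemo {
  private StartTimeZoneDemo() {}

  // Assumed pattern; the trailing offset makes the zone visible in the output.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z");

  /** Formats the same epoch timestamp as seen from the given zone. */
  static String format(long epochMillis, ZoneId zone) {
    return FORMAT.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
  }

  public static void main(String[] args) {
    long startTime = 0L; // one single instant
    ZoneId server = ZoneOffset.UTC;                  // assumed server zone
    ZoneId client = ZoneOffset.ofHoursMinutes(5, 30); // assumed browser zone
    System.out.println("server: " + format(startTime, server));
    System.out.println("client: " + format(startTime, client));
  }
}
```

Formatting both the application and the attempt start time on the same side (or always including the offset) would make the two columns comparable, regardless of which zone is chosen.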
[jira] [Commented] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified
[ https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804350#comment-16804350 ] Szilard Nemeth commented on YARN-9196: -- Hi [~BilwaST]! This seems to be a nice fix! Can you upload a new patch? The current one does not apply to trunk! > Attempt started time zone and Application started time zone is different when > OS time zone is modified > -- > > Key: YARN-9196 > URL: https://issues.apache.org/jira/browse/YARN-9196 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Attachments: YARN-9196-001.patch > > > In RM application page, attempt start time is formatted client side > (browser), but application start time is formatted by the server. > If client time zone and server time zone is different then on the UI, the > application start time and attempt start time will be in different format.
[jira] [Commented] (YARN-8503) Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING
[ https://issues.apache.org/jira/browse/YARN-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804357#comment-16804357 ] Hadoop QA commented on YARN-8503: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-8503 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8503 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942893/YARN-8503.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23833/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Add unit test for FINISHED_CONTAINERS_PULLED_BY_AM event on DECOMMISSIONING > --- > > Key: YARN-8503 > URL: https://issues.apache.org/jira/browse/YARN-8503 > Project: Hadoop YARN > Issue Type: Test > Components: test >Affects Versions: 2.7.2 >Reporter: Amiya Chakraborty >Assignee: Amiya Chakraborty >Priority: Major > Labels: patch-available, yarn > Attachments: YARN-8503.001.patch, YARN-8503.001.patch > > > Currently, there is no unit test for testing the functionality - > FINISHED_CONTAINERS_PULLED_BY_AM event while Decommissioning of node. This > patch provides the same to check the AM has pulled the containers from the > RM; then the RM will inform the NM about it and the NM can remove the > completed container from its list during DECOMMISSIONING. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8701) If the single parameter in Resources#createResourceWithSameValue is greater than Integer.MAX_VALUE, then the value of vcores will be -1
[ https://issues.apache.org/jira/browse/YARN-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804355#comment-16804355 ] Hadoop QA commented on YARN-8701: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-8701 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8701 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947055/YARN-8701.002.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23832/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > If the single parameter in Resources#createResourceWithSameValue is greater > than Integer.MAX_VALUE, then the value of vcores will be -1 > --- > > Key: YARN-8701 > URL: https://issues.apache.org/jira/browse/YARN-8701 > Project: Hadoop YARN > Issue Type: Bug > Components: api >Reporter: Sen Zhao >Assignee: Sen Zhao >Priority: Major > Attachments: YARN-8701.001.patch, YARN-8701.002.patch > > > If I configure *MaxResources* in fair-scheduler.xml, like this: > {code}resource1=50{code} > In the queue, the *MaxResources* value will change to > {code}Max Resources: {code} > I think the value of VCores should be *CLUSTER_VCORES*. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8943) Upgrade JUnit from 4 to 5 in hadoop-yarn-api
[ https://issues.apache.org/jira/browse/YARN-8943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804463#comment-16804463 ] Akira Ajisaka commented on YARN-8943: - Thanks [~snemeth] for the information. bq. Or maybe we could use this one before doing the migration from JUnit 4 to JUnit 5 on other projects? Personally, I'm interested in using AssertJ, but it should be discussed on the dev mailing list because this will change all the test code. > Upgrade JUnit from 4 to 5 in hadoop-yarn-api > > > Key: YARN-8943 > URL: https://issues.apache.org/jira/browse/YARN-8943 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Attachments: YARN-8943.01.patch, YARN-8943.02.patch > >
[jira] [Commented] (YARN-7505) RM REST endpoints generate malformed JSON
[ https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804491#comment-16804491 ] Hadoop QA commented on YARN-7505: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 46s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 32 unchanged - 2 fixed = 33 total (was 34) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 51s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 26s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}163m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-7505 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12897909/YARN-7505.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname
[jira] [Created] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
Shen Yinjie created YARN-9424: - Summary: Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent() Key: YARN-9424 URL: https://issues.apache.org/jira/browse/YARN-9424 Project: Hadoop YARN Issue Type: Bug Reporter: Shen Yinjie In YARN-8699, FederationClientInterceptor#invokeConcurrent uses getDeclaredMethods(), which cannot recognize some methods in ApplicationBaseProtocol (ApplicationClientProtocol extends ApplicationBaseProtocol), for example getApplications. When I run "yarn application -list" by connecting to the YARN router, it throws an exception. So change getDeclaredMethods to getMethods.
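The difference between the two reflection calls can be seen in a small standalone sketch. The interfaces BaseProtocol and ClientProtocol below are hypothetical stand-ins for ApplicationBaseProtocol and ApplicationClientProtocol, not the real Hadoop types, and the helper names are made up for illustration. Class#getDeclaredMethods returns only the methods declared directly on the type, while Class#getMethods also returns public methods inherited from superinterfaces.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class DeclaredVsPublicMethods {
    // Hypothetical stand-in for ApplicationBaseProtocol.
    interface BaseProtocol {
        String getApplications();
    }

    // Hypothetical stand-in for ApplicationClientProtocol.
    interface ClientProtocol extends BaseProtocol {
        String submitApplication();
    }

    // Names of the methods declared directly on the given type.
    static Set<String> declaredNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            names.add(m.getName());
        }
        return names;
    }

    // Names of all public methods, including those inherited from superinterfaces.
    static Set<String> publicNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getMethods()) {
            names.add(m.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The inherited getApplications is missing from the declared set.
        System.out.println(declaredNames(ClientProtocol.class));
        // The public set contains both methods.
        System.out.println(publicNames(ClientProtocol.class));
    }
}
```

A lookup by name over the declared set would therefore miss getApplications in this model, which matches the failure described in the issue.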
[jira] [Updated] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
[ https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-9424: -- Description: In YARN-8699, FederationClientInterceptor#invokeConcurrent uses getDeclaredMethods(), which cannot recognize some methods in ApplicationBaseProtocol (ApplicationClientProtocol extends ApplicationBaseProtocol), for example getApplications(). When I run "yarn application -list" by connecting to the YARN router, the router throws an exception. So change getDeclaredMethods to getMethods. was: In YARN-8699, FederationClientInterceptor#invokeConcurrent uses getDeclaredMethods(), which cannot recognize some methods in ApplicationBaseProtocol (ApplicationClientProtocol extends ApplicationBaseProtocol), for example getApplications. When I run "yarn application -list" by connecting to the YARN router, it throws an exception. So change getDeclaredMethods to getMethods. > Change getDeclaredMethods to getMethods in > FederationClientInterceptor#invokeConcurrent() > - > > Key: YARN-9424 > URL: https://issues.apache.org/jira/browse/YARN-9424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shen Yinjie >Priority: Major > > In YARN-8699, FederationClientInterceptor#invokeConcurrent uses > getDeclaredMethods(), which cannot recognize some methods in > ApplicationBaseProtocol (ApplicationClientProtocol extends > ApplicationBaseProtocol), for example getApplications(). When I run "yarn > application -list" by connecting to the YARN router, the router throws an exception. > So change getDeclaredMethods to getMethods.
[jira] [Updated] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
[ https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-9424: -- Attachment: YARN-9124_1.patch > Change getDeclaredMethods to getMethods in > FederationClientInterceptor#invokeConcurrent() > - > > Key: YARN-9424 > URL: https://issues.apache.org/jira/browse/YARN-9424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shen Yinjie >Priority: Major > Attachments: YARN-9124_1.patch > > > In YARN-8699, FederationClientInterceptor#invokeConcurrent uses > getDeclaredMethods(), which cannot recognize some methods in > ApplicationBaseProtocol (ApplicationClientProtocol extends > ApplicationBaseProtocol), for example getApplications(). When I run "yarn > application -list" by connecting to the YARN router, the router throws an exception. > So change getDeclaredMethods to getMethods.
[jira] [Updated] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9413: --- Attachment: image-2019-03-29-10-47-47-953.png > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch, > image-2019-03-29-10-47-47-953.png > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after the AM finishes for the first time (am-max-attempts=1). > # The app is started with 2 containers running on the NM1 node. > # Fail the AM of the application with PREEMPTED exit status, which should not > count towards max attempt retry, but the app will fail immediately. > # Used resource of this queue leaks after the app fails. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After the app fails, RMAppFailedAttemptEvent is sent in > RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > does not count towards max attempt retry, so the transition sends > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handles > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. > # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt. It skips killing and calling the > completion process for containers belonging to this app, so the queue resource > leak happens.
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804536#comment-16804536 ] Tao Yang commented on YARN-9413: Thanks [~cheersyang], [~snemeth] for your suggestions. !image-2019-03-29-10-47-47-953.png! The checkstyle issue seems unreasonable to me: I think the indentation level should be 12 for line 1505 in RMAppAttemptImpl. Could you please help me see what the problem is? Thanks! For issue 3 that [~snemeth] mentioned, the test case can't use the field "conf" directly, since it is a private field defined in the parent class (ParameterizedSchedulerTestBase). The other issues above were carried over from existing cases in TestAMRestart when I reused their code in the new test case. Perhaps I should fix all of them in TestAMRestart, right? > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch, > image-2019-03-29-10-47-47-953.png > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after the AM finishes for the first time (am-max-attempts=1). > # The app is started with 2 containers running on the NM1 node. > # Fail the AM of the application with PREEMPTED exit status, which should not > count towards max attempt retry, but the app will fail immediately. > # Used resource of this queue leaks after the app fails. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After the app fails, RMAppFailedAttemptEvent is sent in > RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > does not count towards max attempt retry, so the transition sends > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handles > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. > # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt. It skips killing and calling the > completion process for containers belonging to this app, so the queue resource > leak happens.
[jira] [Assigned] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
[ https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie reassigned YARN-9424: - Assignee: Shen Yinjie > Change getDeclaredMethods to getMethods in > FederationClientInterceptor#invokeConcurrent() > - > > Key: YARN-9424 > URL: https://issues.apache.org/jira/browse/YARN-9424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Major > Attachments: YARN-9124_1.patch > > > In YARN-8699, FederationClientInterceptor#invokeConcurrent uses > getDeclaredMethods(), which cannot recognize some methods in > ApplicationBaseProtocol (ApplicationClientProtocol extends > ApplicationBaseProtocol), for example getApplications(). When I run "yarn > application -list" by connecting to the YARN router, the router throws an exception. > So change getDeclaredMethods to getMethods.
[jira] [Updated] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
[ https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-9424: -- Description: In YARN-8699, FederationClientInterceptor#invokeConcurrent uses getDeclaredMethods(), which cannot recognize some methods in ApplicationBaseProtocol (ApplicationClientProtocol extends ApplicationBaseProtocol). We have implemented some methods in FederationClientInterceptor, such as getApplications(), GetQueueUserAclsInfo, etc. When I run "yarn application -list" by connecting to the YARN router, the router throws an exception. So change getDeclaredMethods() to getMethods(). was: In YARN-8699, FederationClientInterceptor#invokeConcurrent uses getDeclaredMethods(), which cannot recognize some methods in ApplicationBaseProtocol (ApplicationClientProtocol extends ApplicationBaseProtocol), for example getApplications(). When I run "yarn application -list" by connecting to the YARN router, the router throws an exception. So change getDeclaredMethods to getMethods. > Change getDeclaredMethods to getMethods in > FederationClientInterceptor#invokeConcurrent() > - > > Key: YARN-9424 > URL: https://issues.apache.org/jira/browse/YARN-9424 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Major > Attachments: YARN-9124_1.patch > > > In YARN-8699, FederationClientInterceptor#invokeConcurrent uses > getDeclaredMethods(), which cannot recognize some methods in > ApplicationBaseProtocol (ApplicationClientProtocol extends > ApplicationBaseProtocol). > We have implemented some methods in FederationClientInterceptor, such as > getApplications(), GetQueueUserAclsInfo, etc. When I run "yarn application > -list" by connecting to the YARN router, the router throws an exception. > So change getDeclaredMethods() to getMethods().
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804542#comment-16804542 ] Weiwei Yang commented on YARN-9413: --- Yeah, it looks a bit weird; it might be a false alarm. Could you please fix everything else according to [~snemeth]'s comment? > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch, > image-2019-03-29-10-47-47-953.png > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after the AM finishes for the first time (am-max-attempts=1). > # The app is started with 2 containers running on the NM1 node. > # Fail the AM of the application with PREEMPTED exit status, which should not > count towards max attempt retry, but the app will fail immediately. > # Used resource of this queue leaks after the app fails. > The root cause is the inconsistency of handling app attempt failure between > RMAppAttemptImpl$BaseFinalTransition#transition and > RMAppImpl$AttemptFailedTransition#transition: > # After the app fails, RMAppFailedAttemptEvent is sent in > RMAppAttemptImpl$BaseFinalTransition#transition. If the exit status of the AM > container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, it > does not count towards max attempt retry, so the transition sends > AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and > RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true. > # RMAppImpl$AttemptFailedTransition#transition handles > RMAppFailedAttemptEvent and will fail the app if its max app attempts is 1. > # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in > CapacityScheduler#doneApplicationAttempt. It skips killing and calling the > completion process for containers belonging to this app, so the queue resource > leak happens.
[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-3.0/branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804558#comment-16804558 ] Hadoop QA commented on YARN-8200: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 73 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 7s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 23s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 30s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 36s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 11m 40s{color} | {color:green} branch-2 passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 9m 17s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 21s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 53s{color} | {color:green} branch-2 passed with JDK v1.8.0_191 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 49s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 11m 49s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 49s{color} | {color:red} root-jdk1.7.0_95 with JDK v1.7.0_95 generated 2 new + 1441 unchanged - 2 fixed = 1443 total (was 1443) {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 49s{color} | {color:green} the patch passed with JDK v1.8.0_191 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 10m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 49s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 35s{color} | {color:orange} root: The patch generated 195 new + 3979 unchanged - 73 fixed = 4174 total (was 4052) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | 
{color:green} 11m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 9s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch 524 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 17s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{c
[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp
[ https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804571#comment-16804571 ] Hadoop QA commented on YARN-9348: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 16 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 12m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 38s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 18s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 4m 22s{color} | {color:orange} root: The patch generated 3 new + 4 unchanged - 0 fixed = 7 total (was 4) {color} | | {color:green}+1{color} | {color:green} hadolint {color} | {color:green} 0m 0s{color} | {color:green} There were no new hadolint issues. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange} 0m 13s{color} | {color:orange} The patch generated 132 new + 104 unchanged - 0 fixed = 236 total (was 104) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 13s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 58s{color} | {color:green} the patch passed {color} | || |
[jira] [Updated] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9227: Attachment: YARN-9227-004.patch > DistributedShell RelativePath is not removed at end > --- > > Key: YARN-9227 > URL: https://issues.apache.org/jira/browse/YARN-9227 > Project: Hadoop YARN > Issue Type: Bug > Components: distributed-shell >Affects Versions: 3.1.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: 0001-YARN-9227.patch, 0002-YARN-9227.patch, > 0003-YARN-9227.patch, YARN-9227-004.patch > > > DistributedShell Job does not remove the relative path which contains jars > and localized files. > {code} > [ambari-qa@ash hadoop-yarn]$ hadoop fs -ls > /user/ambari-qa/DistributedShell/application_1542665708563_0017 > Found 2 items > -rw-r--r-- 3 ambari-qa hdfs 46636 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/AppMaster.jar > -rwx--x--- 3 ambari-qa hdfs 4 2019-01-23 13:37 > /user/ambari-qa/DistributedShell/application_1542665708563_0017/shellCommands > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9227) DistributedShell RelativePath is not removed at end
[ https://issues.apache.org/jira/browse/YARN-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804574#comment-16804574 ] Prabhu Joseph commented on YARN-9227: - Thanks [~snemeth] for reviewing. Attached 004 patch addressing the above comments. {quote} # There's a while loop in the end: How can you be sure that it won't run infinitely?{quote} Was relying on testcase timeout 9ms. Have changed it to {{GenericTestUtils.waitFor}}.
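The move from an open-ended while loop to a bounded wait can be illustrated with a small stand-alone sketch. The {{waitFor}} helper below is a hypothetical plain-Java stand-in written for this note, not Hadoop's {{GenericTestUtils.waitFor}}, and the polled condition is invented. It only shows the idea: poll at a fixed interval and fail with a timeout instead of looping forever.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Plain-Java stand-in for a bounded wait helper; not Hadoop's implementation. */
class BoundedWaitSketch {

  /**
   * Polls {@code check} every {@code intervalMs} milliseconds until it returns
   * true, and throws once {@code timeoutMs} milliseconds have elapsed.
   */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException(
            "Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical condition: becomes true on the third poll.
    AtomicInteger polls = new AtomicInteger();
    waitFor(() -> polls.incrementAndGet() >= 3, 10, 1_000);
    System.out.println("condition met after " + polls.get() + " polls");

    // A condition that never holds ends in a timeout instead of hanging.
    try {
      waitFor(() -> false, 10, 50);
    } catch (TimeoutException e) {
      System.out.println("timed out as expected");
    }
  }
}
```

With a helper of this shape the test no longer depends on the test case timeout to stop the loop; the wait itself carries the upper bound.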
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804580#comment-16804580 ] Hadoop QA commented on YARN-9413: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-9413 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9413 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23834/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch, > image-2019-03-29-10-47-47-953.png > > > To reproduce this problem: > # Submit an app which is configured to keep containers across app attempts > and should fail after AM finished at first time (am-max-attempts=1). > # App is started with 2 containers running on NM1 node. > # Fail the AM of the application with PREEMPTED exit status which should not > count towards max attempt retry but app will fail immediately. > # Used resource of this queue leaks after app fail. 
> The root cause is that app attempt failure is handled inconsistently between
> RMAppAttemptImpl$BaseFinalTransition#transition and
> RMAppImpl$AttemptFailedTransition#transition:
> # After the app attempt fails, RMAppAttemptImpl$BaseFinalTransition#transition sends RMAppFailedAttemptEvent. If the exit status of the AM container is PREEMPTED/ABORTED/DISKS_FAILED/KILLED_BY_RESOURCEMANAGER, the failure does not count towards the max attempt retry, so it sends AppAttemptRemovedSchedulerEvent with keepContainersAcrossAppAttempts=true and RMAppFailedAttemptEvent with transferStateFromPreviousAttempt=true.
> # RMAppImpl$AttemptFailedTransition#transition handles RMAppFailedAttemptEvent and fails the app if its max app attempts is 1.
> # CapacityScheduler handles AppAttemptRemovedSchedulerEvent in CapacityScheduler#doneApplicationAttempt. It skips killing the containers that belong to this app and skips their completion processing, so the queue's used resource leaks.
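The interaction described in the root cause can be reproduced in a toy model. Everything in the sketch below is hypothetical (class names, method names, counting resources as a number of containers); it is not ResourceManager code and not the content of the attached patches. It only shows how a "keep containers" decision made on the attempt side, combined with an independent "fail the app" decision on the app side, leaves the queue's used counter above zero. The second variant, which keeps containers only when another attempt is possible, is one conceivable way to make the two decisions agree and is labeled as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the accounting described above; all names are hypothetical. */
class QueueLeakSketch {

  /** Tracks what the queue believes is in use (here: a container count). */
  static final class ToyQueue {
    int usedContainers;
  }

  static final class ToyApp {
    final ToyQueue queue;
    final int maxAttempts;
    final List<String> liveContainers = new ArrayList<>();
    boolean failed;

    ToyApp(ToyQueue queue, int maxAttempts) {
      this.queue = queue;
      this.maxAttempts = maxAttempts;
    }

    void allocate(String containerId) {
      liveContainers.add(containerId);
      queue.usedContainers++;
    }
  }

  /** Scheduler side: release the attempt's containers unless told to keep them. */
  static void doneApplicationAttempt(ToyApp app, boolean keepContainers) {
    if (keepContainers) {
      return; // containers are left in place for a next attempt
    }
    app.queue.usedContainers -= app.liveContainers.size();
    app.liveContainers.clear();
  }

  /** App side: the app fails when its max app attempts is 1. */
  static void attemptFailed(ToyApp app) {
    if (app.maxAttempts == 1) {
      app.failed = true;
    }
  }

  /** Returns the queue's used count after the AM exits with PREEMPTED. */
  static int usedAfterAmPreempted(boolean keepOnlyIfRetryPossible) {
    ToyQueue queue = new ToyQueue();
    ToyApp app = new ToyApp(queue, 1); // am-max-attempts=1
    app.allocate("container_1");
    app.allocate("container_2");

    // PREEMPTED does not count towards the retry limit, so the attempt side
    // asks the scheduler to keep the containers.
    boolean keepContainers = true;
    if (keepOnlyIfRetryPossible) {
      // Assumption for illustration: keep only if a new attempt can start.
      keepContainers = app.maxAttempts > 1;
    }
    doneApplicationAttempt(app, keepContainers);
    attemptFailed(app); // the app fails anyway because max attempts is 1
    return queue.usedContainers;
  }

  public static void main(String[] args) {
    System.out.println("inconsistent decisions, used = " + usedAfterAmPreempted(false));
    System.out.println("consistent decisions, used = " + usedAfterAmPreempted(true));
  }
}
```

In the first call the app has failed while the queue still counts two containers as used, which matches the leak in the reproduction steps. In the second call the count returns to zero.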
[jira] [Updated] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Yang updated YARN-9413: --- Attachment: YARN-9413.003.patch > Queue resource leak after app fail for CapacityScheduler > > > Key: YARN-9413 > URL: https://issues.apache.org/jira/browse/YARN-9413 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.1.2 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: YARN-9413.001.patch, YARN-9413.002.patch, > YARN-9413.003.patch, image-2019-03-29-10-47-47-953.png
[jira] [Commented] (YARN-9413) Queue resource leak after app fail for CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-9413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804589#comment-16804589 ] Tao Yang commented on YARN-9413: Thanks [~cheersyang] for the confirmation about the checkstyle error. Attached v3 patch to fix issues in TestAMRestart according to [~snemeth]'s comment.
[jira] [Commented] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
[ https://issues.apache.org/jira/browse/YARN-9424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804601#comment-16804601 ] Hadoop QA commented on YARN-9424: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s{color} | {color:green} hadoop-yarn-server-router in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 5s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9424 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964120/YARN-9124_1.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 643327fafdad 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d7a2f94 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23835/testReport/ | | Max. process+thread count | 786 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23835/console | | Powered by | Apache Yetus 0.8.0 http://yetus.a