[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200702#comment-16200702 ] Sunil G commented on YARN-6620: --- Committing this patch.. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, > YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198968#comment-16198968 ] Sunil G commented on YARN-6620: --- Cool. Makes sense. I will wait for hear from others if they have some more comments. Otherwise I will commit this patch tomorrow. Thank You. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, > YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198947#comment-16198947 ] Wangda Tan commented on YARN-6620: -- Thanks for the review, [~sunilg]. For #1, The JIRA is YARN-7159. I'm not sure if it is a valid use case for client to define different unit / type for resource-types.xml, and in YARN-7307 we're considering to remove client-side resource-type.xml. Let's discuss more on the related JIRAs. For #2, It might be better to rename "mandatory" to "first-class", so once FPGA support added to YARN, it is a first-class resource instead of mandatory resources. I think it might be valuable to have a list of first class resources supported by YARN, and we don't allow client to change core properties (includes unit and type, not include min/max value). With this we can have better out-of-box user experiences because most use cases should be addressed by first-class resources. To me NUMA is a little different here, today we don't support NUMA as a separate resource, instead it is additional affinity for known resource types (cpu/memory). So before we want to support centralized NUMA allocation in RM/scheduler, we don't need to model NUMA as a separate resource type. And once we need to do that, some refactoring need to be done to existing resource api since it cannot describe hierarchy and relations between resource units. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, > YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198323#comment-16198323 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 11m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 8m 17s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m 32s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 52s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 83 new + 488 unchanged - 24 fixed = 571 total (was 512) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 7m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 1s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 47s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 23m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 11m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 42s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 24m 1s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 2m 8s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}238m 29s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198224#comment-16198224 ] Sunil G commented on YARN-6620: --- Thanks [~leftnoteasy] for the patch. Generally I am fine with latest patch. I could commit this tomorrow if there are no objection Not very much related to patch, there are couple of other points to note: # {{ResourceUtils.java}} is shared by client also. Hence once we make mandatory resources like cpu mem and gpu, client will not get a chance to submit in GB's for mem fo eg. Server could decode it at the gate and convert to MBs. But client is loosing its chance for easy usage of resource units. I think this has to be a separate jira. And I think there is a ticket for this, i ll find it. # {{resource-types.xml}} is used to add new resources. GPU is mandatory like cpu and mem, but I think resources like numa/fpga etc should follow the YARN-3926 model, correct?. My point is that, we have new configs related to GPUs here. I think for other resource specific configs, we could use resource-types.xml or any new config xml to have more clarity and modularity. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, > YARN-6620.015.patch, YARN-6620.016.patch, YARN-6620.017.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16198004#comment-16198004 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 24s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 83 new + 489 unchanged - 24 fixed = 572 total (was 513) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 31s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 55s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 7s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}111m 17s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.util.resource.TestResourceUtils | | | hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler | | | hadoop.yarn.
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197798#comment-16197798 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} YARN-6620 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12891148/YARN-6620.015.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17830/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch, YARN-6620.014.patch, > YARN-6620.015.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194059#comment-16194059 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 45s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 8m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 49s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 81 new + 489 unchanged - 24 fixed = 570 total (was 513) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 20s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 43s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 54s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 16m 33s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 94m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194000#comment-16194000 ] Wangda Tan commented on YARN-6620: -- Thanks [~sunilg] for the review. For #1. Done. For #2. I prefer to add support once we have 2nd plugin. For #3, it might be fine since this is cleanup instead of free(). For #4/#5, Done. For #6, personally I prefer to keep it as exception because it looks like a configuration problem: the configured gpu plugin won't take any effect, it's better to fail fast instead of running with potential issues. For #7, yes > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193076#comment-16193076 ] Sunil G commented on YARN-6620: --- Thanks [~leftnoteasy] Few more comments: # As {{GPU_URI}} is now hardcoded, we have to ensure that same name is not supplied by user with different unit etc. Similar to CPU and Memory, we have to have a mandatory check for this resource also, correct ? # In {{ResourcePluginManager#initialize}}, {{GpuResourcePlugin}} is created directly in main code path. Its better we pass the name of plugin {{resourceName}} to another factory and get the correct object back. For now this is fine as we have only one plugin. It ll be better to improve later also. # In {{ResourcePluginManager#cleanup}}, do we need to remove the plugin entry as cleanup is already done? # {{GpuDiscoverer#getGpuDeviceInformation}} still continue to execute discover command even after reaching max fail limit? {code} 102 if (numOfErrorExecutionSinceLastSucceed == MAX_REPEATED_ERROR_ALLOWED) { 103 LOG.error("Failed to execute GPU device information detection script for " 104 + MAX_REPEATED_ERROR_ALLOWED + " times, skip following exections."); 105 numOfErrorExecutionSinceLastSucceed++; 106 } {code}Also please fix typo in {{exections}} to {{executions}} in above error message # I think *GpuDiscoverer* need not have to have a {{getConf}} ? # In {{GpuNodeResourceUpdateHandler#updateConfiguredResource}}, do we need to throw exception when yarn was not configured to support any GPUs but auto discovery found some devices as per plugin ? Could we log as warn may be? # {{GpuDiscoverer}} command is specific to linux. So in windows shell, this has to be failed. correct? > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch, > YARN-6620.012.patch, YARN-6620.013.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192234#comment-16192234 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 48s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 36s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 81 new + 479 unchanged - 24 fixed = 560 total (was 503) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 27s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 43s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 3s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 31s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}107m 3s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:71bbb86 | | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachmen
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191946#comment-16191946 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 5s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 81 new + 480 unchanged - 24 fixed = 561 total (was 504) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 26s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 32s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 1 new + 103 unchanged - 0 fixed = 104 total (was 103) {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 27s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 36s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 99m 39s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191735#comment-16191735 ] Wangda Tan commented on YARN-6620: -- Thanks [~sunilg], all good points. bq. ResourceInformation.GPU_URI. I think this need not have to be hard coded. Other than memory-mb and vcores, all other resource names could be pulled from resource-types.xml. So is it fine to have the new resource names could be pulled form ResourceUtils itself. I personally tend to make all first class supported resource type name to hardcoded. Part of the reason is, on NM side, many component needs to understand the question "what resource type name associated with the resource plugin". If we don't have a hard coded resource type for GPU, we have to define a "gpu_resource_type_name=..." in yarn-site.xml, which I want to avoid and I don't see any benefit of doing this. I don't expect any admin use "yarn.io" namespace to define non-first class supported resource types. bq. In NM#serviceInit, ResourcePluginManager is created always. So do we need to have a null check in other places? We have lots of UTs use a mocked NMContext which has null ptr of RPM. I found many unit test failures without this, if you really think we should update this, I can update UTs as well. bq. In GPU#preStart(Container container), if requested container doesnt have any GPU demand, do we need to proceed further? Yes we still need to proceed. We need to blacklist for all GPUs if a container doesn't have GPU request. bq. In case if future, an affinity constraint is coming for a given GPU device for a container, I guess we need a lil more changes to GpuAllocation class. Could we have some dummy apis defined now so that, too much of redesign is not needed later. I would prefer to delay this, one of the reason is, additional affinity constraint might be made by RM instead of NM. For example an app requests 4 GPUs on the same host, each of host has 8 GPUs. Only RM has the global picture of which host has 4 closest GPUs. So far we don't have solution of this problem yet. Once we want to support such use cases, there're many other places need to be updated beyond GpuAllocation. bq. In GPU#bootstrap, we are retuning null. Is it correct? Yes, we have done cgroups mounting, so we don't need to do any other ops for bootstrap. bq. In ResourcePluginManager, we could avoid synchronized and keep the map as ConcurrentHashMap ? I personally prefer to continue this way: RPM as well as (almost) other NM components don't have performance issues now and in the future. Use synchronized lock can enforce atomic semantics to all fields other than map-only. Unless we see some performance issues of these component, I prefer to use most simple and straightforward synchronized lock. bq. In GpuDiscover#initialize, along with file existence, we could also check for file permission owner etc to ensure that its been accessed correctly. We don't throw any exception during the initialize, it might be not useful to check this. I would prefer to let script executor to do this check when run commands. bq. GpuDeviceInformationParser has too much xml dependency. Hadoop common has some xml parser, correct? could we use that ? At the beginning I directly use XML parser, similar to how FairScheduler parses config files. Devaraj suggested to use JAXB and I think it makes sense because we can reuse all these objects once we want to support query GPU information from web UI. Addressed 3/4/9. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191435#comment-16191435 ] Sunil G commented on YARN-6620: --- Thanks [~leftnoteasy] for the effort. Generally approach and patch seems fine. Few comments # {{ResourceInformation.GPU_URI}}. I think this need not have to be hard coded. Other than memory-mb and vcores, all other resource names could be pulled from resource-types.xml. So is it fine to have the new resource names could be pulled form ResourceUtils itself. # In {{NM#serviceInit}}, *ResourcePluginManager* is created always. So do we need to have a null check in other places? # In {{recoverAssignedGpus}}, we could check for NumberFormatException also while doing parseInt # Lets do something like {{Long.valueOf(value).intValue()}} instead of direct type casting from long to int. Refer {{getRequestedGpus}} # In {{GPU#preStart(Container container)}}, if requested container doesnt have any GPU demand, do we need to proceed further? # In case if future, an affinity constraint is coming for a given GPU device for a container, I guess we need a lil more changes to GpuAllocation class. Could we have some dummy apis defined now so that, too much of redesign is not needed later. # In {{GPU#bootstrap}}, we are retuning null. Is it correct? # In {{ResourcePluginManager}}, we could avoid synchronized and keep the map as ConcurrentHashMap ? # {{ResourcePluginManager}} could be service itself so that we can handle cases where NM is shutdown and some cleanup is needed. I think all plugin could have a *cleanup* and that could be triggered from Manager. # In {{GpuDiscover#initialize}}, along with file existence, we could also check for file permission owner etc to ensure that its been accessed correctly. # {{GpuDeviceInformationParser}} has too much xml dependency. Hadoop common has some xml parser, correct? could we use that ? > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch, > YARN-6620.009.patch, YARN-6620.010.patch, YARN-6620.011.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181764#comment-16181764 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 3s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 80 new + 478 unchanged - 24 fixed = 558 total (was 502) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 25s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 44s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 58s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 15m 7s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}102m 7s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:71bbb86 | | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachmen
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175399#comment-16175399 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 53s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 80 new + 478 unchanged - 24 fixed = 558 total (was 502) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 28s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 12s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 40s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m 42s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Load of known null value in org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager.initialize(Context) At ResourcePluginManager.java:in org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager.initialize(Context
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172244#comment-16172244 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 12s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 1s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 79 new + 479 unchanged - 24 fixed = 558 total (was 503) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 0s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:green}+1{color} | {color:green} shelldocs {color} | {color:green} 0m 27s{color} | {color:green} There were no new shelldocs issues. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 41s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 46s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 52s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:71bbb86 | | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12887944/YARN-6620.009.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml shellcheck shel
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171027#comment-16171027 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 15 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 31s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 13m 36s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 17s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 51s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 53s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 13s{color} | {color:orange} root: The patch generated 74 new + 484 unchanged - 24 fixed = 558 total (was 508) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 3s{color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 8s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 44s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 49s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 55s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 54s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Field only ever set to null:null: org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuResourcePlugin.gpuResourceHandler In GpuResourcePlugin.java | | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.nodemanager.TestLinuxContainerExecutorWithMocks | | | hadoop.yarn.server
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171026#comment-16171026 ] Zhankun Tang commented on YARN-6620: [~wangda], Thanks for the 008 patch. LGTM. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch, YARN-6620.008.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170951#comment-16170951 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} YARN-6620 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12887788/YARN-6620.007.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/17509/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch, YARN-6620.007.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170294#comment-16170294 ] Wangda Tan commented on YARN-6620: -- [~tangzhankun], Good point, first of all, to me the requirement: bq. ... one physical machine with two different vendor GPU cards .. Is not a real requirement, we may need to spend 20% of the effort to support the 1% requirement. And even if it is true, as you commented, admin may need to register two plugins and handle isolation part separately. I can see some minor code changes needed (For example, make the resource-name inside resource-vector can be configurable, by default it is gpu, and admin can change it to gpu-vendor-a and gpu-vendor-b, but I would prefer to do this once the requirement comes. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169655#comment-16169655 ] Zhankun Tang commented on YARN-6620: {quote} Good point, I think we should use node attribute to distinguish them. I think this might be unavoidable: different DL workload needs different driver versions / GPU architectures, and different frameworks like OpenCL/CUDA, we need node attribute anyway. {quote} [~wangda], Yeah. Node attributes is a must. And just another thing come to my mind, do we need to support one physical machine with two different vendor GPU cards? If this scenario requirement is true, we may need to extend resource handler to mange different several plugins(I've done this in prior FPGA patch) as below: 1. In "bootstrap" method, all GPU vendor's plugin register to one GPU resource handler with the resource name it can handlers. For instance, one plugin A registers a resource "A-GPU" and B register "B-GPU". And GPU resource handler will holds records of . 2. When "preStart" invoked, it will retrieve the ResourceInformation array from container.getResource().getResources() to find a proper GPU vendor plugin to do plugin callback( or no callback needed for GPU. It seems needed for FPGA) and then use GPU allocator allocates requested count of this specific type of GPU in a round-robin manner. Then do cgroups isolation. 3. Now back to the AM, it's possible to request a container with one "A-GPU" named resource in containerRequest and node attributes "CUDA v1" at the same time. I'm not sure if this one host with different vendor device is a real requirements. If so, it may brings another concerns to our current design since we treat them as the same resource implicitly. Any idea? > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169573#comment-16169573 ] Wangda Tan commented on YARN-6620: -- [~tangzhankun], Sorry to cause confusing, the attached patch still need some cleanups and additional code works, which will take another 2-3 days to finish. Will update the patch once it in a good state. Regarding to ur comments: bq. 1. Current GPUResourceAllocator is not got from LocalResourceAllocators but created in GpuResourceHandlerImpl directly. Is this intended? Yes it is intended, ideally each plugin should maintain its own allocator/handler, etc. I plan to remove LocalResourceAllocators. bq. 2. The GpuResourceHandler get container's requested GPU from an environment key "REQUESTED_GPU_NUM". So in fact, there's no need to define the allowed GPU resource in "node-resouce.xml" Ah, this code is done before YARN-3926 merge, so I will update to use resource profile in the next uploaded patch. bq. For instance, if different vendors' GPU cards are installed in the cluster, how can a user distinguish them? thru node attributes? Good point, I think we should use node attribute to distinguish them. I think this might be unavoidable: different DL workload needs different driver versions / GPU architectures, and different frameworks like OpenCL/CUDA, we need node attribute anyway. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169566#comment-16169566 ] Zhankun Tang commented on YARN-6620: [~wangda], Thanks for the patch! Now we have defined the resource plugin framework in NM which manages various resources that implemented ResourcePlugin and ResourceHandler interface easily. It also provide a good way for resource plugin to update NodeStatus which provide possibility for FPGA IP update thru node labels/attributes. Two questions/comments from me: 1. Current GPUResourceAllocator is not got from LocalResourceAllocators but created in GpuResourceHandlerImpl directly. Is this intended? 2. The GpuResourceHandler get container's requested GPU from an environment key "REQUESTED_GPU_NUM". So in fact, there's no need to define the allowed GPU resource in "node-resouce.xml". This is not an issue at present, but may brings a potential limitation that the end-user cannot declare different type of GPU vendor resource. For instance, if different vendors' GPU cards are installed in the cluster, how can a user distinguish them? thru node attributes? > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166780#comment-16166780 ] Devaraj K commented on YARN-6620: - Thanks [~leftnoteasy] for the quick patch. bq. Switched JAXB to handle XML parsing instead of check tags. Overall looking good, it would be better if you could group the adapter and other supporting classes as inner classes in PerGpuDeviceInformation. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch, > YARN-6620.006-WIP.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165134#comment-16165134 ] Wangda Tan commented on YARN-6620: -- Thanks [~devaraj.k] for the additional explanations, make sense to me, I will update patch to address above two comments. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165119#comment-16165119 ] Devaraj K commented on YARN-6620: - Thanks [~leftnoteasy] for the responses. bq. My understanding of JAXBContext is mostly used when we need to convert between object and XML/JSON. Since output of nvidia-smi is a customized XML format, which doesn't follow JAXB standard. Is it still best practice to use JAXBContext under such use case? For example, FairScheduler parses XML file directly: AllocationFileLoaderService#reloadAllocations. JAXBContext can be used for any XML format, doesn't have to be in any specific format, I could see that the sample format in the patch can be converted to a Java Object ,so that we can eliminate the traversing and parsing logic in GpuDeviceInformationParser.java. bq. I considered this option before, unless there's strong need for this to run different command or call Nvidia native APIs directly, I would prefer to hard code to use nvidia-smi instead of introducing another abstraction layer. I'm open to do refactoring to support this case once we have such requirements. I think it would be useful if users have sym links created with different names than the hard coded name. I feel we don't have to add a new configuration for the executable instead we can have the binary name also as part of DEFAULT_NM_GPU_PATH_TO_EXEC and users can provide the path with the executable name for the configuration 'yarn.nodemanager.resource.gpu.path-to-executables'. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163906#comment-16163906 ] Wangda Tan commented on YARN-6620: -- [~devaraj.k], Thanks for reviewing the patch. Only several questions/comments: bq. 1. XML file reading in GpuDeviceInformationParser.java, can we use the existing libraries like javax.xml.bind.JAXBContext to unmarshall the XML document to a Java Object instead of reading tag by tag? My understanding of JAXBContext is mostly used when we need to convert between object and XML/JSON. Since output of nvidia-smi is a customized XML format, which doesn't follow JAXB standard. Is it still best practice to use JAXBContext under such use case? For example, FairScheduler parses XML file directly: {{AllocationFileLoaderService#reloadAllocations}}. bq. 3. Instead of hardcoding the BINARY_NAME, can it be included as part of DEFAULT_NM_GPU_PATH_TO_EXEC as a default value, so that it can be also becomes configurable if incase users want to change it. I considered this option before, unless there's strong need for this to run different command or call Nvidia native APIs directly, I would prefer to hard code to use nvidia-smi instead of introducing another abstraction layer. I'm open to do refactoring to support this case once we have such requirements. bq. 5. Can we use spaces instead of tab characters for indentation in nvidia-smi-sample-output.xml? This is directly copy from nvidia-smi output, the major reason is to make sure we can properly parse real commandline output, so I prefer to keep it as-is. bq. 6. Are we going to support multiple containers/processes(limited number) sharing the same GPU device? No, since no proper isolation can be done for this, I don't plan to support this as of now. Since YARN-3926 just get merged to trunk, I will add code to support reading/specifying GPU value in resource object. I will address all comments in the next update. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162491#comment-16162491 ] Devaraj K commented on YARN-6620: - Thanks [~leftnoteasy] for the patch, Great work! There are some comments on the patch. 1. XML file reading in GpuDeviceInformationParser.java, can we use the existing libraries like javax.xml.bind.JAXBContext to unmarshall the XML document to a Java Object instead of reading tag by tag? 2. If you don't agree to use the existing libraries for reading xml file, 'in' stream may have to be closed after reading/parsing. {code:xml} InputStream in = IOUtils.toInputStream(sanitizeXmlInput(xmlStr), "UTF-8"); doc = builder.parse(in); {code} 3. Instead of hardcoding the BINARY_NAME, can it be included as part of DEFAULT_NM_GPU_PATH_TO_EXEC as a default value, so that it can be also becomes configurable if incase users want to change it. {code:xml} public static final String DEFAULT_NM_GPU_PATH_TO_EXEC = ""; protected static final String BINARY_NAME = "nvidia-smi"; {code} 4. Please change the inline comment here accordingly. {code:xml} + /** + * Disk as a resource is disabled by default. + **/ + @Private + public static final boolean DEFAULT_NM_GPU_RESOURCE_ENABLED = false; {code} 5. Can we use spaces instead of tab characters for indentation in nvidia-smi-sample-output.xml? 6. Are we going to support multiple containers/processes(limited number) sharing the same GPU device? 7. {code:title=GpuResourceAllocator.java|borderStyle=solid} for (int deviceNum : allowedGpuDevices) { if (!usedDevices.containsKey(deviceNum)) { usedDevices.put(deviceNum, containerId); assignedGpus.add(deviceNum); if (assignedGpus.size() == numRequestedGpuDevices) { break; } } } // Record in state store if we allocated anything if (!assignedGpus.isEmpty()) { List allocatedDevices = new ArrayList<>(); for (int gpu : assignedGpus) { allocatedDevices.add(String.valueOf(gpu)); } {code} Can you merge these two for loops into a one like below, {code:xml} usedDevices.put(deviceNum, containerId); assignedGpus.add(deviceNum); allocatedDevices.add(String.valueOf(deviceNum)); {code} And also if the condition *if (assignedGpus.size() == numRequestedGpuDevices)* doesn't meet, do we need to throw an exception or log the error? 8. I see that getGpuDeviceInformation() is getting invoked twice which intern executes shell command and parses the xml file which are costly operations. Do we need to execute it twice here? {code:title=GpuResourceDiscoverPlugin.java|borderStyle=solid} GpuDeviceInformation info = getGpuDeviceInformation(); LOG.info("Trying to discover GPU information ..."); GpuDeviceInformation info = getGpuDeviceInformation(); {code} And also I don't convince that having the logic other than assigning conf in setConf() method. {code:xml} public synchronized void setConf(Configuration conf) { this.conf = conf; numOfErrorExecutionSinceLastSucceed = 0; featureEnabled = conf.getBoolean(YarnConfiguration.NM_GPU_RESOURCE_ENABLED, YarnConfiguration.DEFAULT_NM_GPU_RESOURCE_ENABLED); if (featureEnabled) { String dir = conf.get(YarnConfiguration.NM_GPU_PATH_TO_EXEC, . {code} And also there are Hadoop QA reported comments which needs to be fixed. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch, YARN-6620.002.patch, > YARN-6620.003.patch, YARN-6620.004.patch, YARN-6620.005.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159217#comment-16159217 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 59s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 52s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 4s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 35 new + 239 unchanged - 0 fixed = 274 total (was 239) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 523 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 3s{color} | {color:red} The patch has 1 ill-formed XML file(s). {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 39s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 13m 59s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 33s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 80m 3s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.TestGpuResourceDiscoverPlugin | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:71bbb86 | | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886139/YARN-6620.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux cee45ee358e1 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c35510a | | D
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124381#comment-16124381 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 52s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 10s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 7s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 23 new + 455 unchanged - 0 fixed = 478 total (was 455) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 4s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 39s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 51s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 0s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 78m 3s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881575/YARN-6620.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle xml | | uname | Linux 4bd43ac15b66 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 28d97b7 | | Default Java | 1.8.0_144 | | findbugs | v3.1.0-RC1 | | findbugs | https://builds.ap
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122286#comment-16122286 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 43s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 54s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager in trunk has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 40s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 7s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 22 new + 457 unchanged - 0 fixed = 479 total (was 457) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 32s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager generated 1 new + 105 unchanged - 0 fixed = 106 total (was 105) {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 42s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 34s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 36s{color} | {color:red} The patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 47s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.GpuResourceAllocator.recoverAssignedGpus(ContainerId) At GpuResourceAllocator.java:org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.GpuResourceAllocator.recoverAssignedGpus(ContainerId) At GpuResourceAllocator.java:[line 92] | | | Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.server.nodemanager.containerman
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095358#comment-16095358 ] Hadoop QA commented on YARN-6620: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} YARN-6620 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6620 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878246/YARN-6620.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16508/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-6620.001.patch > > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6620) [YARN-6223] NM Java side code changes to support isolate GPU devices by using CGroups
[ https://issues.apache.org/jira/browse/YARN-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095348#comment-16095348 ] Wangda Tan commented on YARN-6620: -- (Deleted originally attached patch since it is out-of-date and confusing). > [YARN-6223] NM Java side code changes to support isolate GPU devices by using > CGroups > - > > Key: YARN-6620 > URL: https://issues.apache.org/jira/browse/YARN-6620 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > > This JIRA plan to add support of: > 1) GPU configuration for NodeManagers > 2) Isolation in CGroups. (Java side). > 3) NM restart and recovery allocated GPU devices -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org