[jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537738#comment-16537738 ]

Jonathan Hung edited comment on YARN-8200 at 7/9/18 10:55 PM:
--

Perf unit test:

2 resources:
{noformat}
2 14652.015
4 20876.826
6 29455.08
8 45045.047
10 37735.848
12 40816.33
14 46403.71
16 47169.812
18 49261.082
20 48543.688
22 49140.05
24 47393.363
26 48899.754
28 48899.754
30 49751.242
32 50125.312
34 46296.297
36 48780.49
38 47961.63
40 47732.695
42 47732.695
44 48076.92
46 49019.61
48 46728.973
50 42643.92
52 46296.297
54 48426.15
56 49504.95
58 47846.89
60 48543.688
62 47393.363
64 48899.754
66 48661.8
68 49140.05
70 49019.61
72 48780.49
74 48899.754
76 49382.715
78 47393.363
80 48076.92
82 48192.77
84 47732.695
86 50125.312
88 48899.754
90 49019.61
92 48076.92
94 48192.77
96 48076.92
98 42553.19
100 47846.89
102 47846.89
104 48780.49
106 47961.63
108 49140.05
110 47169.812
112 47846.89
114 47619.047
116 47619.047
118 49875.312
120 47619.047
122 47393.363
124 47505.938
126 48899.754
128 48780.49
130 46189.375
132 47505.938
134 45871.56
136 47619.047
138 48543.688
140 47619.047
142 48076.92
144 48076.92
146 47732.695
148 47281.324
150 48543.688
152 48661.8
154 47393.363
156 48543.688
158 47961.63
160 46296.297
162 47846.89
164 47846.89
166 48543.688
168 47505.938
170 47281.324
172 48309.18
174 48309.18
176 5.0
178 47505.938
180 48192.77
182 48192.77
184 48309.18
186 48543.688
188 48661.8
190 48192.77
192 47846.89
194 42105.26
196 48899.754
198 47961.63
#ResourceTypes = 2.
Avg of fastest 20: 49382.715
2018-06-26 17:12:59,756 ERROR [Thread[Thread-11,5,main]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
{noformat}

3 resources:
{noformat}
2 10964.912
4 15760.441
6 26990.553
8 24752.475
10 32733.225
12 30487.805
14 27397.26
16 35778.176
18 33112.582
20 34843.207
22 31347.963
24 37383.176
26 34482.758
28 39062.5
30 38095.24
32 35842.293
34 32154.342
36 39447.73
38 37878.79
40 38240.918
42 36101.082
44 38167.938
46 38834.953
48 38022.812
50 38610.04
52 37105.75
54 38610.04
56 39215.688
58 38022.812
60 39215.688
62 37950.664
64 39138.94
66 37735.848
68 38684.72
70 38986.355
72 37735.848
74 37243.95
76 38535.645
78 37807.184
80 38314.176
82 36900.367
84 38610.04
86 39370.08
88 38314.176
90 39525.69
92 38461.54
94 39761.43
96 39370.08
98 38910.504
100 38022.812
102 39138.94
104 38314.176
106 39292.73
108 39292.73
110 39370.08
112 39292.73
114 38314.176
116 39840.637
118 39062.5
120 39370.08
122 37950.664
124 39062.5
126 37664.785
128 38684.72
130 38986.355
132 39525.69
134 40322.582
136 39292.73
138 37664.785
140 39525.69
142 39138.94
144 39370.08
146 39840.637
148 37037.035
150 38387.715
152 39525.69
154 37523.453
156 39603.96
158 36764.707
160 32362.459
162 29542.098
164 31250.0
166 29112.082
168 32000.0
170 27662.518
172 27100.271
174 26845.637
176 33388.98
178 35714.285
180 31152.648
182 36832.414
184 35650.625
186 38461.54
188 34662.047
190 31104.2
192 32573.29
194 36900.367
196 26702.27
198 30211.48
#ResourceTypes = 3.
Avg of fastest 20: 39525.69
2018-06-26 17:16:14,530 ERROR [Thread[Thread-11,5,main]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
{noformat}

4 resources:
{noformat}
2 13166.557
4 21299.254
6 33222.59
8 37174.723
10 32786.887
12 33955.855
14 38095.24
16 37243.95
18 38167.938
20 37807.184
22 36900.367
24 39370.08
26 36563.07
28 38240.918
30 38759.69
32 39370.08
34 35523.98
36 39370.08
38 38610.04
40 38759.69
42 39603.96
44 37878.79
46 38910.504
48 38684.72
50 39682.54
52 38461.54
54 38535.645
56 37105.75
58 38910.504
60 38095.24
62 38684.72
64 38910.504
66 39138.94
68 39292.73
70 38095.24
72 39215.688
74 39447.73
76 39447.73
78 4.0
80 38759.69
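
A minimal Python sketch of how per-run output like the above could be summarized. It assumes each sample is an "iteration throughput" pair, as in the {noformat} blocks. The helper names `parse_samples`, `mean_of_fastest`, and `suspicious` are hypothetical, and the comment does not show how the perf test itself derives "Avg of fastest 20", so the mean-of-top-k below is only one plausible reading. The sample string is the last four pairs of the 4-resource run.

```python
from statistics import mean, median


def parse_samples(text):
    """Parse whitespace-separated 'iteration throughput' pairs."""
    tokens = text.split()
    # Walk the tokens two at a time; a trailing unpaired token is ignored.
    return [(int(tokens[i]), float(tokens[i + 1]))
            for i in range(0, len(tokens) - 1, 2)]


def mean_of_fastest(samples, k):
    """Mean of the k highest throughput values (assumed meaning of 'fastest')."""
    fastest = sorted((value for _, value in samples), reverse=True)[:k]
    return mean(fastest)


def suspicious(samples, fraction=0.01):
    """Samples whose throughput is below `fraction` of the median.

    The 1% threshold is an arbitrary illustration, not taken from the test.
    """
    mid = median(value for _, value in samples)
    return [(i, value) for i, value in samples if value < fraction * mid]


# Tail of the 4-resource run quoted above.
samples = parse_samples("74 39447.73 76 39447.73 78 4.0 80 38759.69")
print(mean_of_fastest(samples, 2))
print(suspicious(samples))
```

A check like `suspicious` would flag entries such as `176 5.0` and `78 4.0`, which sit far below every neighboring sample.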
[jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607445#comment-16607445 ]

Jonathan Hung edited comment on YARN-8200 at 9/7/18 6:03 PM:
-

Build https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 timed out:
{noformat}
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt 2>&1
Elapsed: 2m 40s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt 2>&1
Elapsed: 15m 20s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt 2>&1
Elapsed: 4m 49s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt 2>&1
Elapsed: 79m 41s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt 2>&1
Elapsed: 3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt 2>&1
Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :. : True
Logical operation result is TRUE
Running script : #!/bin/bash
{noformat}

It appears the unit tests hang here: (https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt)
{noformat}
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hadoop-yarn-client ---
[INFO] Compiling 34 source files to /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java:[311,6] [deprecation] MiniYARNCluster(String,int,int,int,int,boolean) in MiniYARNCluster has been deprecated
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestNMClientAsync.java:[453,16] [deprecation] onIncreaseContainerResourceError(ContainerId,Throwable) in AbstractCallbackHandler has been deprecated
[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/h
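
One way to spot the hanging module in a log like this is to pair each `cd` step with the `Elapsed:` line that follows it; a step with no elapsed time never finished. The Python sketch below is only an illustration. It assumes the elapsed format is always "Nm Ns" (the only form in the quoted log), and the sample log is abbreviated: the full mvn command lines are replaced with a placeholder. `step_durations` is a hypothetical helper, not part of Yetus or Jenkins.

```python
import re

STEP_RE = re.compile(r"\bcd (\S+)")
ELAPSED_RE = re.compile(r"Elapsed:\s+(?:(\d+)m\s+)?(\d+)s")


def step_durations(log):
    """Return (module, seconds) per `cd` step; seconds is None if no Elapsed line follows."""
    steps = []
    matches = list(STEP_RE.finditer(log))
    for idx, match in enumerate(matches):
        # A step's text runs from its `cd` up to the next `cd` (or end of log).
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(log)
        elapsed = ELAPSED_RE.search(log[match.end():end])
        seconds = None
        if elapsed:
            seconds = int(elapsed.group(1) or 0) * 60 + int(elapsed.group(2))
        module = match.group(1).rstrip("/").rsplit("/", 1)[-1]
        steps.append((module, seconds))
    return steps


# Abbreviated excerpt; "mvn clean test" stands in for the full command lines.
LOG = """\
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
mvn clean test
Elapsed: 79m 41s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
mvn clean test
Elapsed: 3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
mvn clean test
Build timed out (after 500 minutes). Marking the build as aborted.
"""

steps = step_durations(LOG)
unfinished = [module for module, seconds in steps if seconds is None]
print(unfinished)
```

Because the parser works on the whole text rather than line by line, it also handles a log that has been flattened onto a single line.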
[jira] [Comment Edited] (YARN-8200) Backport resource types/GPU features to branch-2
[ https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450136#comment-16450136 ]

Daniel Templeton edited comment on YARN-8200 at 4/24/18 4:05 PM:
-

The challenge in backporting resource types into 3.0 was mostly just in splitting resource types from resource profiles. Otherwise it wasn't bad. But that was pull from 3.x into 3.0. Going back into 2.x will be much trickier. The code that resource types touches is code that you want to handle very, very carefully because it's at the core of what the resource manager does. I don't think it's a good idea to pull resource types back into 2.x. Resource types represent a major change to the way the resource manager functions. It's not a change that's appropriate for a minor release. In fact, I would argue that resource types is one of the scarier changes in 3.0, so if you're willing to take on that risk in 2.x, you're probably better served just moving to 3.0.

was (Author: templedf):
The challenge in backporting resource types into 3.0 was mostly just in splitting resource types from resource profiles. Otherwise it wasn't bad. But that was pull from 3.x into 3.0. Going back into 2.x will be much trickier. The code that resource types touches is code that you want to handle very, very carefully because it's at the core of what the resource manager does. I don't think it's a good idea to pull resource types back into 2.x. Resource types represent a major change to the way the resource manager functions. It's not something that appropriate for a minor release. In fact, I would argue that resource types is one of the scarier changes in 3.0, so if you're willing to take on that risk in 2.x, you're probably better served just moving to 3.0.

> Backport resource types/GPU features to branch-2
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Jonathan Hung
> Assignee: Jonathan Hung
> Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support
> deep learning workloads. However, our main production clusters are running
> older versions of branch-2 (2.7 in our case). To prevent supporting too many
> very different hadoop versions across multiple clusters, we would like to
> backport the resource types/resource profiles feature to branch-2, as well as
> the GPU specific support.
>
> We have done a trial backport of YARN-3926 and some miscellaneous patches in
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth.
> We also did a trial backport of most of YARN-6223 (sans docker support).
>
> Regarding the backports, perhaps we can do the development in a feature
> branch and then merge to branch-2 when ready.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org