[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766900#comment-16766900 ] Hadoop QA commented on YARN-9298:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| 0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| trunk Compile Tests ||
| +1 | mvninstall | 15m 40s | trunk passed |
| +1 | compile | 0m 42s | trunk passed |
| +1 | checkstyle | 0m 34s | trunk passed |
| +1 | mvnsite | 0m 47s | trunk passed |
| +1 | shadedclient | 11m 17s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 11s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | checkstyle | 0m 29s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 1 unchanged - 2 fixed = 1 total (was 3) |
| +1 | mvnsite | 0m 43s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 47s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| Other Tests ||
| -1 | unit | 91m 14s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 27s | The patch does not generate ASF License warnings. |
| | | 138m 16s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9298 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958511/YARN-9298.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c15f61ea6635 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7b11b40 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/23393/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23393/testReport/ |
| Max. process+thread count | 953 (vs. ulimi
[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup
[ https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766891#comment-16766891 ] Keqiu Hu commented on YARN-9294:

Confirmed it is a race condition between creating the cgroups and executing the command in them. We plan to go ahead with a safety check between these two privileged operations. Note that the same issue should apply to 3.1+ as well. cc [~wangda] [~tangzhankun]

> Potential race condition in setting GPU cgroups & execute command in the
> selected cgroup
> --
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.10.0
> Reporter: Keqiu Hu
> Assignee: Keqiu Hu
> Priority: Critical
>
> Environment is latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with a GPU requirement, at least 1 of the allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I could still see all GPUs on the same machine when running nvidia-smi.
> The funny thing is that even though I had visibility of all GPUs at the moment of executing container-executor (say ordinals 0,1,2,3), cgroups jailed the process's access down to that single GPU after some time.
> The underlying process trying to access the GPU takes the initial information as the source of truth and tries to access physical GPU 0, which is not actually available to the process. This results in a [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> Validated the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 0,1,2,3]
> PrivilegedOperationExecutor command: [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, application_1549663278916_0249, /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, /grid/a/tmp/yarn, /grid/a/tmp/userlogs, /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer, khu, application_1549663278916_0249, container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 8040, /grid/a/tmp/yarn]
> {code}
> So most likely a race condition between these two operations?
> cc [~jhung]
> Another potential theory is that the cgroups creation for the container actually failed but the error was swallowed silently.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
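The "safe check between these two privileged operations" mentioned above can be sketched as follows: before the second container-executor invocation launches the process, wait until the container's cgroup is actually visible on disk. This is a minimal illustration only, not YARN's implementation; the cgroup file path and the `waitForCgroup` helper are assumptions for the example.

```java
/**
 * Illustrative sketch of serializing the two privileged operations:
 * op 1 (--module-gpu) writes the GPU cgroup restrictions, op 2 (the
 * container launch) must not run until the cgroup is materialised.
 */
public class CgroupReadyCheck {

    /** Polls until the given cgroup control file exists or the timeout expires. */
    public static boolean waitForCgroup(String controlFile, long timeoutMs) {
        java.io.File f = new java.io.File(controlFile);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!f.exists() && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);  // small back-off between existence checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return f.exists();
    }

    public static void main(String[] args) {
        // Hypothetical path; real cgroup layout depends on the NodeManager config.
        String deny = "/sys/fs/cgroup/devices/yarn/container_X/devices.deny";
        if (waitForCgroup(deny, 5000)) {
            System.out.println("cgroup ready, safe to exec the container");
        } else {
            System.out.println("cgroup not visible, fail the launch instead of racing");
        }
    }
}
```

Failing the launch on timeout (rather than proceeding) would also surface the second theory above, where cgroup creation fails silently.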
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766890#comment-16766890 ] Zhaohui Xin commented on YARN-9277:

Hi, [~Steven Rand], thanks for your reply. If a long-running task is preempted, its next attempt will run for a similarly long time. If that attempt is also preempted, the job will be difficult to finish. Also, I don't think it is reasonable to limit long-running apps to specific queues; that is not generic. Maybe we have a better solution?

> Add more restrictions In FairScheduler Preemption
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
> I think we should add more restrictions in fair scheduler preemption.
> * We should not preempt self.
> * We should not preempt a high priority job.
> * We should not preempt a container which has been running for a long time.
> * ...
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766880#comment-16766880 ] Zhaohui Xin commented on YARN-9277:

Hi, [~wilfreds], you can see issue [YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may preempt itself in case of minshare preemption. In my opinion, even if this will not happen, we should still add this sanity check.

> Add more restrictions In FairScheduler Preemption
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
> I think we should add more restrictions in fair scheduler preemption.
> * We should not preempt self.
> * We should not preempt a high priority job.
> * We should not preempt a container which has been running for a long time.
> * ...
[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource
[ https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766878#comment-16766878 ] Hadoop QA commented on YARN-1655:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| 0 | reexec | 0m 15s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 16m 14s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 0m 48s | trunk passed |
| +1 | shadedclient | 12m 20s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | trunk passed |
| +1 | javadoc | 0m 31s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 42s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| -0 | checkstyle | 0m 36s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 213 unchanged - 0 fixed = 215 total (was 213) |
| +1 | mvnsite | 0m 44s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 2s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 18s | the patch passed |
| +1 | javadoc | 0m 25s | the patch passed |
|| Other Tests ||
| -1 | unit | 89m 17s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 138m 28s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
| | hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil |
| | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-1655 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958509/YARN-1655.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c176f9f1c3c7 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7b11b40 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23392/artifact/out/diff-checkstyle-hadoop-yarn-pr
[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption
[ https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin reassigned YARN-8061:

Assignee: Zhaohui Xin

> An application may preempt itself in case of minshare preemption
> --
>
> Key: YARN-8061
> URL: https://issues.apache.org/jira/browse/YARN-8061
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.9.0, 2.8.3, 3.0.0
> Reporter: Yufei Gu
> Assignee: Zhaohui Xin
> Priority: Major
>
> Assume a leaf queue A's minshare is 10G memory and its fair share is 12G. It used 4G, so its minshare-starved resources are 6G and will be distributed to all its apps. Assume there are 4 apps a1, a2, a3, a4 inside, which demand 3G, 2G, 1G, and 0.5G. a1 gets 3G of minshare-starved resources, a2 gets 2G, a3 gets 1G; they are all considered starved apps except a4, which doesn't get any.
> An app can preempt another under the same queue due to minshare starvation. For example, a1 can preempt a4 if a4 uses more resources than its fair share, which is 3G (12G/4). If a1 itself used more than 3G memory, it will preempt itself! I will create a unit test later.
> The solution would be to check the application's fair share while distributing minshare starvation; more details in method {{FSLeafQueue#updateStarvedAppsMinshare()}}.
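The arithmetic in the issue description above can be worked through in a few lines. This is a sketch of the proposed guard (skip apps already at or above their fair share when distributing minshare starvation), not the actual {{FSLeafQueue}} code; all names and the simplified distribution loop are assumptions for illustration.

```java
/**
 * Worked example of YARN-8061's scenario, in GB: queue minshare 10, usage 4,
 * so 6 of minshare-starved resources to distribute; per-app fair share is
 * 12/4 = 3. With the guard, an app already over its fair share (a1, using 4)
 * receives no starvation and therefore cannot end up preempting itself.
 */
public class MinshareStarvationExample {

    /**
     * Distributes 'starved' resources across apps in order. demand[i] is the
     * app's pending demand, usage[i] its current usage. Apps at/above fair
     * share are skipped (the proposed sanity check).
     */
    public static double[] distribute(double starved, double[] demand,
                                      double[] usage, double fairShare) {
        double[] assigned = new double[demand.length];
        for (int i = 0; i < demand.length && starved > 0; i++) {
            if (usage[i] >= fairShare) {
                continue;  // guard: app is not below fair share, not starved
            }
            double grant = Math.min(demand[i], starved);
            assigned[i] = grant;
            starved -= grant;
        }
        return assigned;
    }

    public static void main(String[] args) {
        double[] demand = {3, 2, 1, 0.5};   // a1..a4 from the description
        double[] usage  = {4, 1, 1, 0};     // a1 already exceeds its 3G fair share
        double[] got = distribute(10 - 4, demand, usage, 12.0 / 4);
        // Without the guard a1 would receive 3G and could preempt its own
        // containers; with it, a1 receives 0 and a2..a4 get 2, 1, 0.5.
        for (double g : got) {
            System.out.print(g + " ");
        }
    }
}
```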
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766880#comment-16766880 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 7:22 AM:

Hi, [~wilfreds], you can see issue YARN-8061: An application may preempt itself in case of minshare preemption. In my opinion, even if this will not happen, we should also add this as a sanity check.

was (Author: uranus):
Hi, [~wilfreds], you can see issue [YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may preempt itself in case of minshare preemption. In my opinion, even if this will not happen, we should also add this sanity check.

> Add more restrictions In FairScheduler Preemption
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
> I think we should add more restrictions in fair scheduler preemption.
> * We should not preempt self
> * We should not preempt high priority job
> * We should not preempt container which has been running for a long time.
> * ...
[jira] [Commented] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified
[ https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766879#comment-16766879 ] Hadoop QA commented on YARN-9208:

(/) +1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| 0 | reexec | 0m 25s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| +1 | mvninstall | 17m 44s | trunk passed |
| +1 | compile | 0m 24s | trunk passed |
| +1 | checkstyle | 0m 19s | trunk passed |
| +1 | mvnsite | 0m 26s | trunk passed |
| +1 | shadedclient | 12m 9s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 34s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
|| Patch Compile Tests ||
| +1 | mvninstall | 0m 23s | the patch passed |
| +1 | compile | 0m 22s | the patch passed |
| +1 | javac | 0m 22s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvnsite | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 41s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
|| Other Tests ||
| +1 | unit | 20m 37s | hadoop-yarn-applications-distributedshell in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 68m 48s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9208 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958513/YARN-9208-004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux daccc8c33c2b 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 917ac9f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23394/testReport/ |
| Max. process+thread count | 642 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23394/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated
[jira] [Updated] (YARN-9286) Timeline Server sorting based on FinalStatus throws pop-up message
[ https://issues.apache.org/jira/browse/YARN-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nallasivan updated YARN-9286:

Summary: Timeline Server sorting based on FinalStatus throws pop-up message (was: Timeline Server(1.5) GUI, sorting based on FinalStatus throws pop-up message)

> Timeline Server sorting based on FinalStatus throws pop-up message
> --
>
> Key: YARN-9286
> URL: https://issues.apache.org/jira/browse/YARN-9286
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Reporter: Nallasivan
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-9286-001.patch
>
> In the Timeline Server GUI, if we try to sort the details based on FinalStatus, a popup window is displayed. Further, any operation that involves refreshing the page results in the same popup window.
[jira] [Updated] (YARN-9286) [Timeline Server] Sorting based on FinalStatus throws pop-up message
[ https://issues.apache.org/jira/browse/YARN-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nallasivan updated YARN-9286:

Summary: [Timeline Server] Sorting based on FinalStatus throws pop-up message (was: Timeline Server sorting based on FinalStatus throws pop-up message)

> [Timeline Server] Sorting based on FinalStatus throws pop-up message
> --
>
> Key: YARN-9286
> URL: https://issues.apache.org/jira/browse/YARN-9286
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Reporter: Nallasivan
> Assignee: Bilwa S T
> Priority: Minor
> Attachments: YARN-9286-001.patch
>
> In the Timeline Server GUI, if we try to sort the details based on FinalStatus, a popup window is displayed. Further, any operation that involves refreshing the page results in the same popup window.
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766813#comment-16766813 ] Wilfred Spiegelenburg commented on YARN-8967:

After talking offline with a number of people, the request was to divide this change into two parts due to its size:
* _part 1_ for the new rules and changes to the existing PlacementRule code
* _part 2_ for the FS changes and integration

This is the only way the change can be split so that the two parts compile separately. A new jira, YARN-9298, has been opened for _part 1_, and we'll keep this jira for _part 2_. Removing "patch available" until that one is checked in. It will also allow work to start on enhancing the rules with filters etc., which have existing open jiras.

> Change FairScheduler to use PlacementRule interface
> --
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler, fairscheduler
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch
>
> The PlacementRule interface was introduced to be used by all schedulers as per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references, which should allow this change to go through.
> This would be the first step in using one placement rule engine for both schedulers.
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766811#comment-16766811 ] Wilfred Spiegelenburg commented on YARN-9277:

I agree with [~Steven Rand]: sorting could be good, but setting a hard no-go could cause issues. Can you also explain how we can preempt a container that is owned by the application itself? I thought that we would only allow containers to be preempted if the application is over its fair share, and even then only if preempting the container would not drop the application below its fair share. {{FSPreemptionThread.identifyContainersToPreemptOnNode()}} calls {{app.canContainerBePreempted()}}, which contains that check, and the container is not added. Since the app we are preempting for is under its fair share, any container of the app itself should be filtered out by that. Am I reading this all wrong, or have you found cases where we did preempt a container for its own app and it is not working as expected?

> Add more restrictions In FairScheduler Preemption
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
> I think we should add more restrictions in fair scheduler preemption.
> * We should not preempt self.
> * We should not preempt a high priority job.
> * We should not preempt a container which has been running for a long time.
> * ...
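The checks being debated in this thread can be sketched as one guard function: never preempt the starved app's own containers, only take from owners over their fair share, never push an owner below its fair share, and (as this issue proposes) spare high-priority and long-running containers. This is an illustrative sketch, not {{FSAppAttempt}}'s actual API; the parameter names and the long-running threshold are assumptions.

```java
/**
 * Sketch of a combined FairScheduler preemption guard. Resource amounts are
 * simplified to a single double (e.g. GB of memory) for illustration.
 */
public class PreemptionGuard {

    // Assumed threshold for "long-running"; a real scheduler would make this configurable.
    static final long LONG_RUNNING_MS = 10 * 60 * 1000L;

    public static boolean canPreempt(String starvedApp, String ownerApp,
                                     double ownerUsage, double ownerFairShare,
                                     double containerSize, long containerAgeMs,
                                     boolean ownerHighPriority) {
        if (ownerApp.equals(starvedApp)) {
            return false;  // never preempt self (the YARN-8061 sanity check)
        }
        if (ownerUsage <= ownerFairShare) {
            return false;  // owner is not over its fair share
        }
        if (ownerUsage - containerSize < ownerFairShare) {
            return false;  // taking this container would drop owner below fair share
        }
        if (ownerHighPriority) {
            return false;  // proposed restriction: spare high-priority jobs
        }
        if (containerAgeMs > LONG_RUNNING_MS) {
            return false;  // proposed restriction: spare long-running containers
        }
        return true;
    }

    public static void main(String[] args) {
        // Self-preemption is rejected even when the owner is over fair share.
        System.out.println(canPreempt("app1", "app1", 5, 3, 1, 0, false));
        // A different app over its fair share is a valid candidate.
        System.out.println(canPreempt("app1", "app2", 5, 3, 1, 0, false));
        // Rejected: preemption would push the owner below its fair share.
        System.out.println(canPreempt("app1", "app2", 4, 3, 2, 0, false));
    }
}
```

The first three checks mirror what the comment above says {{canContainerBePreempted()}} already enforces; the last two are the additional restrictions this issue proposes.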
[jira] [Updated] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified
[ https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9208:

Attachment: YARN-9208-004.patch

> Distributed shell allow LocalResourceVisibility to be specified
> --
>
> Key: YARN-9208
> URL: https://issues.apache.org/jira/browse/YARN-9208
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Bibin A Chundatt
> Assignee: Prabhu Joseph
> Priority: Minor
> Attachments: YARN-9208-001.patch, YARN-9208-002.patch, YARN-9208-003.patch, YARN-9208-004.patch
>
> YARN-9008 add feature to add list of files to be localized.
> Would be great to have Visibility type too. Allows testing of PRIVATE and PUBLIC type too
[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.001.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: scheduler
> Reporter: Wilfred Spiegelenburg
> Assignee: Wilfred Spiegelenburg
> Priority: Major
> Attachments: YARN-9298.001.patch
>
> Implement existing placement rules of the FS using the PlacementRule interface.
> Preparation for YARN-8967
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766797#comment-16766797 ] Hadoop QA commented on YARN-999:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| 0 | reexec | 0m 22s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 0m 41s | Maven dependency ordering for branch |
| +1 | mvninstall | 17m 58s | trunk passed |
| +1 | compile | 8m 54s | trunk passed |
| +1 | checkstyle | 1m 36s | trunk passed |
| +1 | mvnsite | 1m 40s | trunk passed |
| +1 | shadedclient | 15m 5s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 52s | trunk passed |
| +1 | javadoc | 1m 10s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 14s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 26s | the patch passed |
| +1 | compile | 8m 25s | the patch passed |
| +1 | javac | 8m 25s | the patch passed |
| -0 | checkstyle | 1m 30s | hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 248 unchanged - 6 fixed = 256 total (was 254) |
| +1 | mvnsite | 1m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 20s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 19s | the patch passed |
| +1 | javadoc | 1m 10s | the patch passed |
|| Other Tests ||
| +1 | unit | 0m 52s | hadoop-yarn-api in the patch passed. |
| -1 | unit | 96m 1s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 47s | The patch does not generate ASF License warnings. |
| | | 176m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958494/YARN-291.000.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 8f91358e17e8 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3dc2523 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/Pre
[jira] [Updated] (YARN-9296) [Timeline Server] FinalStatus is displayed wrong for killed and failed applications
[ https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-9296: --- Summary: [Timeline Server] FinalStatus is displayed wrong for killed and failed applications (was: Timeline Server FinalStatus is displayed wrong for killed and failed applications) > [Timeline Server] FinalStatus is displayed wrong for killed and failed > applications > --- > > Key: YARN-9296 > URL: https://issues.apache.org/jira/browse/YARN-9296 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Nallasivan >Priority: Minor > > Timeline Server(1.5), FinalStatus of the applications which are killed and > failed, is displayed as UNDEFINED in both GUI, REST API -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9296) Timeline Server FinalStatus is displayed wrong for killed and failed applications
[ https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-9296: --- Summary: Timeline Server FinalStatus is displayed wrong for killed and failed applications (was: In Timeline Server(1.5), FinalStatus is displayed wrong for killed and failed applications) > Timeline Server FinalStatus is displayed wrong for killed and failed > applications > - > > Key: YARN-9296 > URL: https://issues.apache.org/jira/browse/YARN-9296 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Nallasivan >Priority: Minor > > Timeline Server(1.5), FinalStatus of the applications which are killed and > failed, is displayed as UNDEFINED in both GUI, REST API -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9296) In Timeline Server(1.5), FinalStatus is displayed wrong for killed and failed applications
[ https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nallasivan updated YARN-9296: - Description: Timeline Server(1.5), FinalStatus of the applications which are killed and failed, is displayed as UNDEFINED in both GUI, REST API (was: In Timeline Server(1.5), FinalStatus of the applications which are killed and failed, is displayed as UNDEFINED in both GUI, REST API) > In Timeline Server(1.5), FinalStatus is displayed wrong for killed and failed > applications > - > > Key: YARN-9296 > URL: https://issues.apache.org/jira/browse/YARN-9296 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Nallasivan >Priority: Minor > > Timeline Server(1.5), FinalStatus of the applications which are killed and > failed, is displayed as UNDEFINED in both GUI, REST API -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766783#comment-16766783 ] Steven Rand commented on YARN-9277: --- {code} +// We should not preempt container which has been running for a long time. +if ((System.currentTimeMillis() - container.getCreationTime()) >= +getQueue().getFSContext().getPreemptionConfig() +.getToBePreemptedContainerRuntimeThreshold()) { + logPreemptContainerPreCheckInfo( + "this container already run a long time!"); + return false; +} + {code} I disagree with this because it allows for situations in which starved applications can't preempt applications that are over their fair shares. If application A is starved and application B is over its fair share, but happens to have all its containers running for more than the threshold, then application A is unable to preempt and will remain starved. It might be reasonable to sort preemptable containers by runtime and preempt those that have started most recently. However, I worry that this unfairly biases the scheduler against applications with shorter-lived tasks. If code can't be optimized, and really does require very long-running tasks, then these jobs can be run in a queue from which preemption isn't allowed via the {{allowPreemptionFrom}} property. > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
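The alternative suggested in the comment above — sort preemptable containers by runtime and preempt the most recently started first, instead of exempting long-running containers outright — can be sketched as follows. This is an illustrative sketch only; `ContainerInfo` and its fields are hypothetical stand-ins, not YARN scheduler classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrder {

    // Hypothetical stand-in for the scheduler's container object.
    static final class ContainerInfo {
        final String id;
        final long creationTime; // epoch millis

        ContainerInfo(String id, long creationTime) {
            this.id = id;
            this.creationTime = creationTime;
        }
    }

    // Newest containers first: they have done the least work, so killing
    // them wastes the least compute while a starved application can still
    // always find a victim among an over-fair-share application's containers.
    static List<ContainerInfo> preemptionOrder(List<ContainerInfo> candidates) {
        List<ContainerInfo> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator
            .comparingLong((ContainerInfo c) -> c.creationTime)
            .reversed());
        return sorted;
    }
}
```

Unlike a hard runtime threshold, this ordering never leaves a starved application with zero preemptable containers; it only biases victim selection toward the cheapest kills.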
[jira] [Created] (YARN-9298) Implement FS placement rules using PlacementRule interface
Wilfred Spiegelenburg created YARN-9298: --- Summary: Implement FS placement rules using PlacementRule interface Key: YARN-9298 URL: https://issues.apache.org/jira/browse/YARN-9298 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Implement existing placement rules of the FS using the PlacementRule interface. Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource
[ https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766771#comment-16766771 ] Wilfred Spiegelenburg commented on YARN-1655: - Updated the test to make it more robust. I ran all the new tests locally 250 times and have not seen a failure. > Add implementations to FairScheduler to support increase/decrease container > resource > > > Key: YARN-1655 > URL: https://issues.apache.org/jira/browse/YARN-1655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-1655.001.patch, YARN-1655.002.patch, > YARN-1655.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource
[ https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg updated YARN-1655: Attachment: YARN-1655.003.patch > Add implementations to FairScheduler to support increase/decrease container > resource > > > Key: YARN-1655 > URL: https://issues.apache.org/jira/browse/YARN-1655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-1655.001.patch, YARN-1655.002.patch, > YARN-1655.003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup
[ https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766760#comment-16766760 ] Zhankun Tang edited comment on YARN-9294 at 2/13/19 4:16 AM: - [~oliverhuh...@gmail.com] , Yeah, I agree with the plan, and finding stable reproduction steps seems like a good start to me. We can write a script that creates sub device cgroups, uses "container-executor" to set the parameters, and then attaches the running processes to verify whether any of them sees denied devices. was (Author: tangzhankun): [~oliverhuh...@gmail.com] , Yeah, I agree with the plan, and finding stable reproduction steps seems like a good start to me. We can write a script that creates sub device cgroups, uses "container-executor" to set the parameters, and then attaches the processes repeatedly to verify whether any of them sees denied devices. > Potential race condition in setting GPU cgroups & execute command in the > selected cgroup > > > Key: YARN-9294 > URL: https://issues.apache.org/jira/browse/YARN-9294 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0 >Reporter: Keqiu Hu >Assignee: Keqiu Hu >Priority: Critical > > Environment is latest branch-2 head > OS: RHEL 7.4 > *Observation* > Out of ~10 container allocations with GPU requirement, at least 1 of the > allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I > could still have visibility to all GPUs on the same machine when running > nvidia-smi. > The funny thing is that even though I have visibility to all GPUs at the moment of > executing container-executor (say ordinals 0,1,2,3), cgroups jailed the > process's access to only that single GPU after some time. > The underlying process trying to access the GPU would take the initial > information as the source of truth and try to access physical GPU 0, which is not > really available to the process. This results in a > [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error. 
> Validated the container-executor commands are correct: > {code:java} > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, > --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, > 0,1,2,3] > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, > application_1549663278916_0249, > /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, > /grid/a/tmp/yarn, /grid/a/tmp/userlogs, > /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer, > khu, application_1549663278916_0249, > container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, > 8040, /grid/a/tmp/yarn] > {code} > So most likely a race condition between these two operations? > cc [~jhung] > Another potential theory is the cgroups creation for the container actually > failed but the error was swallowed silently. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup
[ https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766760#comment-16766760 ] Zhankun Tang commented on YARN-9294: [~oliverhuh...@gmail.com] , Yeah, I agree with the plan, and finding stable reproduction steps seems like a good start to me. We can write a script that creates sub device cgroups, uses "container-executor" to set the parameters, and then attaches the processes repeatedly to verify whether any of them sees denied devices. > Potential race condition in setting GPU cgroups & execute command in the > selected cgroup > > > Key: YARN-9294 > URL: https://issues.apache.org/jira/browse/YARN-9294 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0 >Reporter: Keqiu Hu >Assignee: Keqiu Hu >Priority: Critical > > Environment is latest branch-2 head > OS: RHEL 7.4 > *Observation* > Out of ~10 container allocations with GPU requirement, at least 1 of the > allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I > could still have visibility to all GPUs on the same machine when running > nvidia-smi. > The funny thing is that even though I have visibility to all GPUs at the moment of > executing container-executor (say ordinals 0,1,2,3), cgroups jailed the > process's access to only that single GPU after some time. > The underlying process trying to access the GPU would take the initial > information as the source of truth and try to access physical GPU 0, which is not > really available to the process. This results in a > [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error. 
> Validated the container-executor commands are correct: > {code:java} > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, > --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, > 0,1,2,3] > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, > application_1549663278916_0249, > /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, > /grid/a/tmp/yarn, /grid/a/tmp/userlogs, > /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer, > khu, application_1549663278916_0249, > container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, > 8040, /grid/a/tmp/yarn] > {code} > So most likely a race condition between these two operations? > cc [~jhung] > Another potential theory is the cgroups creation for the container actually > failed but the error was swallowed silently. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766750#comment-16766750 ] Zhankun Tang commented on YARN-9118: [~snemeth] , thanks for the patch! Please fix the checkstyle issues. [~sunilg] , the latest patch looks good to me. > Handle issues with parsing user defined GPU devices in GpuDiscoverer > > > Key: YARN-9118 > URL: https://issues.apache.org/jira/browse/YARN-9118 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: YARN-9118.001.patch, YARN-9118.002.patch, > YARN-9118.003.patch, YARN-9118.004.patch, YARN-9118.005.patch, > YARN-9118.006.patch, YARN-9118.007.patch > > > getGpusUsableByYarn has the following issues: > - Duplicate GPU device definitions are not denied: This seems to be the > biggest issue as it could increase the number of devices on the node if the > device ID is defined 2 or more times. > - An empty string is accepted and treated as if the user did not want to use > auto-discovery and had not defined any GPU devices: This will result in an > empty device list, but the empty-string check is never explicitly there in > the code, so this behavior is just coincidental. > - Number validation does not happen on GPU device IDs (separated by commas) > Many test cases are added as the coverage was already very low. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
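The three validations requested above — deny duplicate device definitions, handle the empty string explicitly, and number-validate each comma-separated ID — could look roughly like the sketch below. This is a minimal illustration, not the actual `GpuDiscoverer.getGpusUsableByYarn` code; the device list is simplified here to bare comma-separated indices, and the real configuration format may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GpuDeviceListParser {

    // Parses a user-defined device list such as "0,1,2".
    // Throws IllegalArgumentException on duplicates or non-numeric entries.
    static List<Integer> parse(String value) {
        List<Integer> devices = new ArrayList<>();
        // Explicit empty-string check, instead of relying on split()
        // behavior coincidentally producing an empty list.
        if (value == null || value.trim().isEmpty()) {
            return devices;
        }
        Set<Integer> seen = new HashSet<>();
        for (String token : value.split(",")) {
            int id;
            try {
                id = Integer.parseInt(token.trim());
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "GPU device ID is not a number: " + token);
            }
            if (!seen.add(id)) {
                throw new IllegalArgumentException(
                    "Duplicate GPU device ID: " + id);
            }
            devices.add(id);
        }
        return devices;
    }
}
```

Rejecting duplicates up front prevents the biggest issue named above: a device ID defined twice inflating the node's apparent GPU count.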
[jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card
[ https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766737#comment-16766737 ] Zhankun Tang commented on YARN-9265: [~pbacsko] , Thanks for the patch. It looks good to me. > FPGA plugin fails to recognize Intel Processing Accelerator Card > > > Key: YARN-9265 > URL: https://issues.apache.org/jira/browse/YARN-9265 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Attachments: YARN-9265-001.patch, YARN-9265-002.patch > > > The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card). > There are two major issues. > Problem #1 > The output of aocl diagnose: > {noformat} > > Device Name: > acl0 > > Package Pat: > /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp > > Vendor: Intel Corp > > Physical Dev Name StatusInformation > > pac_a10_f20 PassedPAC Arria 10 Platform (pac_a10_f20) > PCIe 08:00.0 > FPGA temperature = 79 degrees C. > > DIAGNOSTIC_PASSED > > > Call "aocl diagnose " to run diagnose for specified devices > Call "aocl diagnose all" to run diagnose for all devices > {noformat} > The plugin fails to recognize this and fails with the following message: > {noformat} > 2019-01-25 06:46:02,834 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin: > Using FPGA vendor plugin: > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin > 2019-01-25 06:46:02,943 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer: > Trying to diagnose FPGA information ... 
> 2019-01-25 06:46:03,085 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule: > Using traffic control bandwidth handler > 2019-01-25 06:46:03,108 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: > Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn > 2019-01-25 06:46:03,139 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl: > FPGA Plugin bootstrap success. > 2019-01-25 06:46:03,247 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: > Couldn't find (?i)bus:slot.func\s=\s.*, pattern > 2019-01-25 06:46:03,248 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: > Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern > 2019-01-25 06:46:03,251 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: > Failed to get major-minor number from reading /dev/pac_a10_f30 > 2019-01-25 06:46:03,252 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to > bootstrap configured resource subsystems! > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: > No FPGA devices detected! > {noformat} > Problem #2 > The plugin assumes that the file name under {{/dev}} can be derived from the > "Physical Dev Name", but this is wrong. For example, it thinks that the > device file is {{/dev/pac_a10_f30}} which is not the case, the actual > file is {{/dev/intel-fpga-port.0}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
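For problem #1, the WARN lines show the plugin searching for a `bus:slot.func = ...` line that this card's `aocl diagnose` output simply does not contain. A pattern matching the PCIe identifier that *is* present in the sample output above (`PCIe 08:00.0`) could instead look like the sketch below; the class and method names here are illustrative, not the plugin's actual API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AoclDiagnoseParser {

    // Matches the "PCIe 08:00.0" style identifier printed by this
    // firmware's "aocl diagnose", instead of the absent
    // "bus:slot.func = ..." line the plugin currently looks for.
    private static final Pattern PCIE =
        Pattern.compile("PCIe\\s+(\\d+:\\d+\\.\\d+)");

    // Returns the bus:slot.func string, or null if not found.
    static String findPcieAddress(String diagnoseOutput) {
        Matcher m = PCIE.matcher(diagnoseOutput);
        return m.find() ? m.group(1) : null;
    }
}
```

Problem #2 (deriving the `/dev` file name from the "Physical Dev Name") would still need a separate fix, since no regex over the diagnose output can recover `/dev/intel-fpga-port.0`.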
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:17 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently, so this restriction will only be valid after YARN-2098; I will remove this restriction soon. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} {quote} We should not preempt container which has been running for a long time. {quote} I think this is an important restriction, *because it's very costly to kill a task that has been running for dozens of hours.* was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in the community version. It will be valid after YARN-2098. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} {quote} We should not preempt container which has been running for a long time. 
{quote} I think this is an important restriction, *because it's very costly to kill a task that has been running for dozens of hours.* > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:14 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in the community version. It will be valid after YARN-2098. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} {quote} We should not preempt container which has been running for a long time. {quote} I think this is an important restriction, *because it's very costly to kill a task that has been running for dozens of hours.* was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in the community version. It will be valid after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098]. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. 
return appPriority; }{code} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:09 AM: Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority in FairScheduler currently. So this restriction is invalid in the community version. It will be valid after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098]. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. return appPriority; }{code} was (Author: uranus): Hi, [~yufeigu]. Thanks for your reply. {quote}Correct me if I am wrong, there are no priority between Yarn jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We need only priorities between mappers and reducers or other customized priorities since AM containers are always the first priority and have been taken care. {quote} You are right. Yarn jobs have the same priority currently. So this restriction is invalid in the community version. BTW, we honor the app's priority from _ApplicationSubmissionContext_ in our cluster. I think the community should make the same change, but that is another problem. {code:java} public Priority getPriority() { // Right now per-app priorities are not passed to scheduler, // so everyone has the same priority. 
return appPriority; }{code} > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766696#comment-16766696 ] Zhankun Tang edited comment on YARN-8927 at 2/13/19 3:03 AM: - [~ebadger] Just checked: if an image name is "repoA/userA/imageA", configuring either "repoA" or "repoA/userA" in "docker.trusted.registries" works. Also, configuring "repoA/userA/prefixA", "repoA/userA" or "repoA" all allow the image name "repoA/userA/prefixA/imageA". So it seems no explicit logic is needed to allow such named images? was (Author: tangzhankun): [~ebadger] Just checked: if an image name is "repoA/userA/imageA", configuring either "repoA" or "repoA/userA" in "docker.trusted.registries" works. So it seems no explicit logic is needed to allow such named images? > Support trust top-level image like "centos" when "library" is configured in > "docker.trusted.registries" > --- > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Labels: Docker > Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch > > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works if running DistributedShell with "tangzhankun/tensorflow" > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > and "ubuntu[:tagName]" fails: > The error message is like: > {code:java} > "image: centos is not trusted" > {code} > We need better handling of the above cases. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
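The matching behavior Zhankun describes can be illustrated with a small sketch: an image is trusted if some entry in "docker.trusted.registries" equals the image name or is a "/"-delimited prefix of it. This is a hypothetical Java illustration of the observed behavior, not container-executor's actual C implementation, and all names are made up.

```java
// Hypothetical illustration (not container-executor's real code) of the
// prefix matching described above: an image is trusted when a configured
// registry entry equals the image name or is a "/"-delimited prefix of it.
import java.util.Arrays;
import java.util.List;

public class TrustedRegistryCheck {
    static boolean isTrusted(String image, List<String> trustedRegistries) {
        for (String registry : trustedRegistries) {
            // match on a component boundary so "repo" never matches "repoXYZ/img"
            if (image.equals(registry) || image.startsWith(registry + "/")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> trusted = Arrays.asList("repoA", "repoA/userA");
        System.out.println(isTrusted("repoA/userA/imageA", trusted)); // prints "true"
        System.out.println(isTrusted("repoB/userA/imageA", trusted)); // prints "false"
    }
}
```

With this reading, both "repoA" and "repoA/userA" admit "repoA/userA/imageA", matching what the comment reports.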
[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694 ] Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:58 AM:
Hi, [~yufeigu]. Thanks for your reply.
{quote}Correct me if I am wrong, there are no priorities between YARN jobs. Priority has been applied to tasks inside one job, which was there before the FS preemption overhaul. We only need priorities between mappers and reducers, or other customized priorities, since AM containers are always the first priority and have been taken care of.{quote}
You are right. YARN jobs currently all have the same priority, so this restriction is invalid in the community version. BTW, we honored the app's priority from _ApplicationSubmissionContext_ in our cluster. I think the community should make the same change, but that is a separate problem.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
}
{code}
> Add more restrictions In FairScheduler Preemption
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Zhaohui Xin
> Assignee: Zhaohui Xin
> Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
> I think we should add more restrictions in fair scheduler preemption:
> * We should not preempt the app itself.
> * We should not preempt a higher-priority job.
> * We should not preempt a container that has been running for a long time.
> * ...
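The change suggested above, carrying the priority from the submission context into the scheduler-side app instead of returning one shared constant, can be sketched roughly as follows. The Priority and SchedulerApp classes here are simplified stand-ins for illustration, not the real org.apache.hadoop.yarn classes.

```java
// Simplified stand-ins (not the actual YARN records) showing the suggested
// change: keep a per-app priority taken from the submission context instead
// of returning one shared constant for every app.
public class AppPrioritySketch {
    // stand-in for org.apache.hadoop.yarn.api.records.Priority
    static final class Priority {
        final int value;
        Priority(int value) { this.value = value; }
    }

    // stand-in for a scheduler-side application object
    static final class SchedulerApp {
        private final Priority appPriority;

        SchedulerApp(Priority submissionContextPriority) {
            // honor the priority the client set in the submission context;
            // fall back to a default when none was set
            this.appPriority = submissionContextPriority != null
                ? submissionContextPriority
                : new Priority(0);
        }

        Priority getPriority() {
            return appPriority; // now per-app, no longer identical for everyone
        }
    }

    public static void main(String[] args) {
        SchedulerApp app = new SchedulerApp(new Priority(5));
        System.out.println(app.getPriority().value); // prints "5"
    }
}
```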
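The preemption restrictions proposed in YARN-9277 (do not preempt the demanding app itself, higher-priority jobs, or long-running containers) amount to a filter over the running containers. A minimal, self-contained sketch with hypothetical names, not FairScheduler's actual preemption code:

```java
// Hypothetical names, not FairScheduler's real preemption code: filter the
// running containers down to those the listed restrictions allow preempting.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreemptionFilterSketch {
    static final class Container {
        final String appId;
        final int appPriority;     // assume larger means more important
        final long runtimeMillis;
        Container(String appId, int appPriority, long runtimeMillis) {
            this.appId = appId;
            this.appPriority = appPriority;
            this.runtimeMillis = runtimeMillis;
        }
    }

    static List<Container> preemptableFor(String demandingAppId,
                                          int demandingAppPriority,
                                          long maxRuntimeMillis,
                                          List<Container> running) {
        List<Container> candidates = new ArrayList<>();
        for (Container c : running) {
            if (c.appId.equals(demandingAppId)) continue;       // never preempt self
            if (c.appPriority > demandingAppPriority) continue; // spare higher-priority jobs
            if (c.runtimeMillis > maxRuntimeMillis) continue;   // spare long-running containers
            candidates.add(c);
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Container> running = Arrays.asList(
            new Container("app1", 0, 1000),        // the demanding app itself
            new Container("app2", 0, 1000),        // preemptable
            new Container("app3", 5, 1000),        // higher priority
            new Container("app4", 0, 86400000L));  // long running
        System.out.println(
            preemptableFor("app1", 0, 3600000L, running).get(0).appId); // prints "app2"
    }
}
```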
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766687#comment-16766687 ] Íñigo Goiri commented on YARN-999:
Just to make sure we are on the same page, I added [^YARN-291.000.patch] with a WIP. It basically adds a unit test that makes sure we get the expected behavior. On the scheduler side, I did a very hacky approach to preemption just for the unit test; I am still trying to figure out the best way to do the preemption, for example by following events.
> In case of long running tasks, reduce node resource should balloon out
> resource quickly by calling preemption API and suspending running task.
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: graceful, nodemanager, scheduler
> Reporter: Junping Du
> Priority: Major
> Attachments: YARN-291.000.patch
>
>
> In the current design and implementation, when we decrease a node's resource to
> less than the resource consumption of its currently running tasks, the tasks keep
> running until they finish; the node simply gets no new tasks assigned
> (because AvailableResource < 0) until some tasks finish and
> AvailableResource > 0 again. This is fine for most cases, but for long-running
> tasks it can be too slow for the resource setting to actually take effect, so
> preemption could be employed here.
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766686#comment-16766686 ] Junping Du commented on YARN-999:
bq. I am not sure how exactly the reduction of node resources is implemented, but for the opportunistic containers, you can kill stuff locally at the NMs. So if you need to free up resources due to resource reduction, you can go over the opportunistic containers running and kill the long-running ones.
So far, the reduction of node resources doesn't kill any containers; it waits until containers finish on their own. That is quite old behavior, from when the feature was first implemented and there was no long-running service support yet. I think we need a generic policy here that picks containers to balloon out resources according to some cost: opportunistic vs. guaranteed could be one dimension, but it could count others as well, such as container size, running time, etc.
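Junping's generic-policy idea, ranking containers by a cost that combines execution type, size, and running time, then releasing the cheapest ones until enough resource is freed, might look like the sketch below. Everything here (class names, cost weights) is assumed for illustration; no such policy exists in YARN as written.

```java
// All names and cost weights are assumptions for illustration; this is not an
// existing YARN policy. Rank running containers by a cost combining execution
// type, size, and running time, then release the cheapest ones until the
// requested amount of memory is freed.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BalloonPolicySketch {
    static final class Container {
        final boolean guaranteed;  // opportunistic containers are cheaper to kill
        final int memoryMb;
        final long runtimeMillis;  // losing long-running work costs more
        Container(boolean guaranteed, int memoryMb, long runtimeMillis) {
            this.guaranteed = guaranteed;
            this.memoryMb = memoryMb;
            this.runtimeMillis = runtimeMillis;
        }
    }

    // illustrative weights: execution type dominates, then size, then runtime
    static double cost(Container c) {
        return (c.guaranteed ? 1000.0 : 0.0)
            + c.memoryMb * 0.1
            + c.runtimeMillis / 1000.0;
    }

    static List<Container> pickToRelease(List<Container> running, int memoryToFreeMb) {
        List<Container> byCost = new ArrayList<>(running);
        byCost.sort(Comparator.comparingDouble(BalloonPolicySketch::cost));
        List<Container> picked = new ArrayList<>();
        int freedMb = 0;
        for (Container c : byCost) {
            if (freedMb >= memoryToFreeMb) {
                break; // enough resource ballooned out
            }
            picked.add(c);
            freedMb += c.memoryMb;
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Container> running = new ArrayList<>();
        running.add(new Container(true, 512, 1000));
        running.add(new Container(false, 1024, 10000));
        // the opportunistic container is released first
        System.out.println(pickToRelease(running, 1024).size()); // prints "1"
    }
}
```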
[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-999:
Attachment: YARN-291.000.patch
[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636 ] Zhankun Tang edited comment on YARN-8927 at 2/13/19 1:39 AM:
[~eyang], [~ebadger] Thanks for the review! If a local image name contains "/" in it, it may not be considered a "top-level" image. It seems that if a user wants the local image "repoA/userA/imageA" to be allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? I will try whether this works and get back to you.
One thing worth noting is that once YARN allows an image name, Docker checks whether the image exists locally and prefers to run it before pulling from a hub. YARN's checking logic here seems like duplicated work, because Docker may just as well pull the image and run it; we can hardly say that "repoA/userA/imageA" is really a local image.
was (Author: tangzhankun): [~eyang], [~ebadger] Thanks for the review! If a local image name contains "/" in it, it may not be considered a "top-level" image. It seems that if a user wants the local image "repoA/userA/imageA" to be allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? I will try whether this works and get back to you. One thing worth noting is that if YARN allows an image name, Docker will check if it's local and prefer to run it.
[jira] [Resolved] (YARN-9297) Renaming RM could cause application to crash
[ https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu resolved YARN-9297. Resolution: Duplicate > Renaming RM could cause application to crash > > > Key: YARN-9297 > URL: https://issues.apache.org/jira/browse/YARN-9297 > Project: Hadoop YARN > Issue Type: Improvement > Components: security >Affects Versions: 2.6.0 >Reporter: Aihua Xu >Priority: Major > > In this line, we are throwing UnknownHostException when any RM host can't > resolve to ip address. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448 > There are some cases that one RM needs to rename or map to different ip > address, then it will crash the application although other RMs are running > fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9297) Renaming RM could cause application to crash
[ https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766618#comment-16766618 ] Aihua Xu commented on YARN-9297: Yes. You are right. I will resolve as dup. Thanks [~jojochuang] > Renaming RM could cause application to crash > > > Key: YARN-9297 > URL: https://issues.apache.org/jira/browse/YARN-9297 > Project: Hadoop YARN > Issue Type: Improvement > Components: security >Affects Versions: 2.6.0 >Reporter: Aihua Xu >Priority: Major > > In this line, we are throwing UnknownHostException when any RM host can't > resolve to ip address. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448 > There are some cases that one RM needs to rename or map to different ip > address, then it will crash the application although other RMs are running > fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9273) Flexing a component of YARN service does not work as documented when using relative number
[ https://issues.apache.org/jira/browse/YARN-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766603#comment-16766603 ] Masahiro Tanaka commented on YARN-9273: --- Hi [~eyang], could you review this? > Flexing a component of YARN service does not work as documented when using > relative number > -- > > Key: YARN-9273 > URL: https://issues.apache.org/jira/browse/YARN-9273 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Masahiro Tanaka >Assignee: Masahiro Tanaka >Priority: Minor > Attachments: YARN-9273.001.patch, YARN-9273.002.patch, > YARN-9273.003.patch, YARN-9273.004.patch > > > [The > documents|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html] > say, > "Relative changes are also supported for the ${NUMBER_OF_CONTAINERS} in the > flex command, such as +2 or -2." when you want to flex a component of a YARN > service. > I expected that {{yarn app -flex sleeper-service -component sleeper +1}} > would increment the number of containers, but it actually sets the number of > containers to just one. > I guess ApiServiceClient#actionFlex handles flexing when executing {{yarn > app -flex}}, and it just uses {{Long.parseLong}} to convert an argument like > {{+1}}, which doesn't handle relative numbers.
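The parsing problem described above can be sketched as follows. This is a hypothetical illustration, not the actual ApiServiceClient code (the class name `RelativeFlex` and method `resolve` are invented for this sketch): `Long.parseLong("+1")` parses fine in Java, but it yields the absolute value 1, so relative flexing needs an explicit check for a sign prefix before treating the value as a delta.

```java
// Hypothetical sketch: resolving a flex argument such as "3", "+2", or "-1"
// into an absolute container count, given the component's current count.
public class RelativeFlex {
    public static long resolve(long current, String arg) {
        if (arg.startsWith("+") || arg.startsWith("-")) {
            // Explicit sign prefix: treat as a relative change.
            return current + Long.parseLong(arg);
        }
        // No sign prefix: treat as an absolute target.
        return Long.parseLong(arg);
    }
}
```

With this distinction, `resolve(3, "+1")` yields 4 rather than the absolute value 1 that plain `Long.parseLong` would produce.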
[jira] [Commented] (YARN-9297) Renaming RM could cause application to crash
[ https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766578#comment-16766578 ] Wei-Chiu Chuang commented on YARN-9297: --- Similar to HADOOP-15864? > Renaming RM could cause application to crash > > > Key: YARN-9297 > URL: https://issues.apache.org/jira/browse/YARN-9297 > Project: Hadoop YARN > Issue Type: Improvement > Components: security >Affects Versions: 2.6.0 >Reporter: Aihua Xu >Priority: Major > > In this line, we are throwing UnknownHostException when any RM host can't > resolve to ip address. > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448 > There are some cases that one RM needs to rename or map to different ip > address, then it will crash the application although other RMs are running > fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9297) Renaming RM could cause application to crash
Aihua Xu created YARN-9297: -- Summary: Renaming RM could cause application to crash Key: YARN-9297 URL: https://issues.apache.org/jira/browse/YARN-9297 Project: Hadoop YARN Issue Type: Improvement Components: security Affects Versions: 2.6.0 Reporter: Aihua Xu In this line, we throw UnknownHostException when any RM host can't be resolved to an IP address: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448 There are cases where one RM is renamed or mapped to a different IP address; this crashes the application even though the other RMs are running fine.
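A more tolerant variant of the host resolution described above would skip RM hosts that fail to resolve instead of failing the whole application. The sketch below is hypothetical (the `TolerantResolver` class is invented for illustration and is not the actual SecurityUtil code):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep only the RM hostnames that resolve, logging and
// skipping the ones that don't, rather than throwing UnknownHostException
// and crashing the application when a single RM host is unresolvable.
public class TolerantResolver {
    public static List<String> resolvable(List<String> hosts) {
        List<String> ok = new ArrayList<>();
        for (String h : hosts) {
            try {
                InetAddress.getByName(h); // triggers the DNS lookup
                ok.add(h);
            } catch (UnknownHostException e) {
                // Skip this RM; the remaining RMs may still be usable.
            }
        }
        return ok;
    }
}
```

An application could then fail only when the resulting list is empty, i.e. when no RM at all is reachable.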
[jira] [Commented] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766542#comment-16766542 ] Hadoop QA commented on YARN-7129: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 16 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 10s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 3m 36s{color} | {color:orange} root: The patch generated 9 new + 4 unchanged - 0 fixed = 13 total (was 4) {color} | | {color:green}+1{color} | {color:green} hadolint {color} | {color:green} 0m 1s{color} | {color:green} There were no new hadolint issues. {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 14m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 1s{color} | {color:green} There were no new shellcheck issues. {color} | | {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange} 0m 15s{color} | {color:orange} The patch generated 160 new + 104 unchanged - 0 fixed = 264 total (was 104) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 14s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 41s{color} | {color:green} the patch passed {color} | ||
[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766493#comment-16766493 ] Eric Yang commented on YARN-8927: - {quote}If we see library/ in container-executor.cfg then we trust all local images.{quote} I am not sure how to identify whether an image is local if the image name contains a '/' character. I think patch 002 will break [~ebadger]'s environment since local image names have a '/' character in them. [~tangzhankun] any idea on how to fix this? > Support trust top-level image like "centos" when "library" is configured in > "docker.trusted.registries" > --- > > Key: YARN-8927 > URL: https://issues.apache.org/jira/browse/YARN-8927 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Labels: Docker > Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch > > > There are some missing cases that we need to catch when handling > "docker.trusted.registries". > The container-executor.cfg configuration is as follows: > {code:java} > docker.trusted.registries=tangzhankun,ubuntu,centos{code} > It works when running DistributedShell with "tangzhankun/tensorflow": > {code:java} > "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow > {code} > But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" > or "ubuntu[:tagName]" fails with an error message like: > {code:java} > "image: centos is not trusted" > {code} > We need to handle the above cases better.
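For illustration, the trust check being discussed could look roughly like the Java sketch below. This is a hypothetical stand-in (the real check lives in the native container-executor, written in C; `TrustedImageCheck` is invented here). It shows why a top-level image such as "centos" maps to Docker's implicit "library" namespace, and why "userA/imageA" would only be trusted if "userA" is listed in "docker.trusted.registries":

```java
import java.util.Set;

// Hypothetical sketch of a registry trust check for Docker image names.
public class TrustedImageCheck {
    public static boolean isTrusted(String image, Set<String> trustedRegistries) {
        int slash = image.indexOf('/');
        if (slash < 0) {
            // Top-level image like "centos" or "centos:7": Docker Hub
            // implicitly places these under the "library" namespace.
            return trustedRegistries.contains("library");
        }
        // Otherwise the prefix before the first '/' is the registry or
        // namespace that must be listed explicitly.
        return trustedRegistries.contains(image.substring(0, slash));
    }
}
```

As the comment notes, this heuristic cannot tell a purely local image like "userA/imageA" apart from a remote "userA" namespace, which is exactly the ambiguity under discussion.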
[jira] [Commented] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally
[ https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766484#comment-16766484 ] Hudson commented on YARN-9184: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15941 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15941/]) YARN-9184. Add a system flag to allow update to latest docker images. (eyang: rev 3dc252326693170ac1b31bf2914bae72ca73d31a) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java > Docker run doesn't pull down latest image if the image exists locally > -- > > Key: YARN-9184 > URL: https://issues.apache.org/jira/browse/YARN-9184 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.1.0, 3.0.3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9184.001.patch, YARN-9184.002.patch, > YARN-9184.003.patch, YARN-9184.004.patch, YARN-9184.005.patch > > > See [docker run doesn't pull down latest image if the image exists > locally|https://github.com/moby/moby/issues/13331]. > So, I think we should pull image before run to make image always latest. 
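The decision added by this change can be summarized as a small truth table. The sketch below is a hypothetical simplification (the real flag lives in YarnConfiguration and DockerLinuxContainerRuntime; `DockerPullPolicy` is invented for this illustration): pull the image when updates to the latest image are enabled, or when the image is not present locally at all.

```java
// Hypothetical sketch: should the NM run "docker pull" before "docker run"?
public class DockerPullPolicy {
    public static boolean shouldPull(boolean imageExistsLocally, boolean alwaysUpdateToLatest) {
        // Always pull when the admin opts in to refreshing images;
        // otherwise pull only when there is no local copy to reuse.
        return alwaysUpdateToLatest || !imageExistsLocally;
    }
}
```

This mirrors the upstream Docker behavior linked in the issue: `docker run` alone reuses a stale local image, so an explicit pull is needed to pick up a moved "latest" tag.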
[jira] [Resolved] (YARN-8955) Add a flag to use local docker image instead of getting latest from registry
[ https://issues.apache.org/jira/browse/YARN-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang resolved YARN-8955. - Resolution: Duplicate Fix Version/s: 3.3.0 This issue is a duplicate of YARN-9184 and YARN-9292 combined. Closing as a duplicate. > Add a flag to use local docker image instead of getting latest from registry > > > Key: YARN-8955 > URL: https://issues.apache.org/jira/browse/YARN-8955 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Fix For: 3.3.0 > > > Some companies have a security policy to use local docker images instead of > getting the latest images from the internet. When a docker image is pulled in > the localization phase, there are two possible outcomes: the image is the > latest from a trusted registry, or the image is a static local copy. > This task is to add a configuration flag to give priority to the local image > over the trusted registry image. > If an image already exists locally, the node manager does not trigger docker > pull to get the latest image from trusted registries.
[jira] [Commented] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally
[ https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766470#comment-16766470 ] Eric Yang commented on YARN-9184: - +1 Patch 5 looks good to me. Committing shortly. > Docker run doesn't pull down latest image if the image exists locally > -- > > Key: YARN-9184 > URL: https://issues.apache.org/jira/browse/YARN-9184 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 3.1.0, 3.0.3 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9184.001.patch, YARN-9184.002.patch, > YARN-9184.003.patch, YARN-9184.004.patch, YARN-9184.005.patch > > > See [docker run doesn't pull down latest image if the image exists > locally|https://github.com/moby/moby/issues/13331]. > So, I think we should pull image before run to make image always latest. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified
[ https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766481#comment-16766481 ] Hadoop QA commented on YARN-9208: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 42s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 46s{color} | {color:red} hadoop-yarn-applications-distributedshell in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell | | | hadoop.yarn.applications.distributedshell.TestDSWithMultipleNodeManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9208 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958444/YARN-9208-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 3696b268c2e6 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7806403 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23390/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23390/testReport/ | | Max. process+thread count | 583 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/
[jira] [Updated] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified
[ https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9208: Attachment: YARN-9208-003.patch > Distributed shell allow LocalResourceVisibility to be specified > --- > > Key: YARN-9208 > URL: https://issues.apache.org/jira/browse/YARN-9208 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin A Chundatt >Assignee: Prabhu Joseph >Priority: Minor > Attachments: YARN-9208-001.patch, YARN-9208-002.patch, > YARN-9208-003.patch > > > YARN-9008 add feature to add list of files to be localized. > Would be great to have Visibility type too. Allows testing of PRIVATE and > PUBLIC type too -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
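One possible way to accept a visibility type for a localized file, as requested above, is sketched below. This is a hypothetical illustration: the local `Visibility` enum stands in for Hadoop's LocalResourceVisibility (PUBLIC, PRIVATE, APPLICATION), and the `path[:visibility]` argument syntax is an assumption for the sketch, not necessarily the syntax of the actual patch.

```java
import java.util.Locale;

// Hypothetical sketch: parse a distributed-shell localization argument of the
// form "path[:visibility]", defaulting to APPLICATION scope.
public class LocalizedFile {
    // Stand-in for org.apache.hadoop.yarn.api.records.LocalResourceVisibility.
    public enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    public final String path;
    public final Visibility visibility;

    private LocalizedFile(String path, Visibility visibility) {
        this.path = path;
        this.visibility = visibility;
    }

    public static LocalizedFile parse(String spec) {
        int idx = spec.lastIndexOf(':');
        if (idx > 0) {
            try {
                Visibility v = Visibility.valueOf(
                    spec.substring(idx + 1).toUpperCase(Locale.ROOT));
                return new LocalizedFile(spec.substring(0, idx), v);
            } catch (IllegalArgumentException e) {
                // Suffix is not a visibility name (e.g. a port in an HDFS
                // URI); treat the whole string as a path below.
            }
        }
        return new LocalizedFile(spec, Visibility.APPLICATION);
    }
}
```

Defaulting to APPLICATION keeps the existing YARN-9008 behavior intact while letting tests exercise PRIVATE and PUBLIC resources.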
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766325#comment-16766325 ] Konstantinos Karanasos commented on YARN-999: - Take a look at YARN-7934 – we refactored some of the preemption code for federation (for the global queues in particular). The umbrella Jira is not finished, but I think it will point you to some useful classes. I am not sure how exactly the reduction of node resources is implemented, but for opportunistic containers, you can kill them locally at the NMs. So if you need to free up resources due to a resource reduction, you can go over the running opportunistic containers and kill the long-running ones. As far as I remember, the regular preemption code in the RM will not touch opportunistic containers. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Priority: Major > > In the current design and implementation, when we decrease the resources on a > node to less than the resource consumption of the currently running tasks, the > tasks can still run until they finish; no new task gets assigned to this node > (because AvailableResource < 0) until some tasks finish and > AvailableResource > 0 again. This is good for most cases, but in the case of > long running tasks it can be too slow for the resource setting to actually > take effect, so preemption could be employed here.
[jira] [Comment Edited] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766325#comment-16766325 ] Konstantinos Karanasos edited comment on YARN-999 at 2/12/19 6:53 PM: -- Give a look at YARN-7934 – we had refactored some stuff in preemption for the federation code (for the glabal queues in particular). The umbrella Jira is not finished, but I think this Jira will point you to some useful classes. I am not sure how exactly the reduction of node resources is implemented, but for the opportunistic containers, you can kill stuff locally at the NMs. So if you need to free up resources due to resource reduction, you can go over the opportunistic containers running and kill the long-running ones. As far as I remember, the regular preemption code in the RM will not touch opportunistic containers. was (Author: kkaranasos): Give a look at YARN-7934 – we had refactored some stuff in preemption for the federation code (for the glabal queues in particular). The umbrella Jira is not finished, but I think this Jira will point you to some useful classes. I am not sure how exactly the reduction of node resources is implemented, but for the opportunistic containers, you can kill stuff locally at the NMs. So if you need to free up resources due to resource reduction, you can go over the opportunistic containers running and kill the long-running ones). As far as I remember, the regular preemption code in the RM will not touch opportunistic containers. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. 
> --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Priority: Major > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be hired here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766317#comment-16766317 ] Íñigo Goiri commented on YARN-999: -- {quote} If it is an opportunistic container, it will already be killed fast, so I think you don't need a distinction between guaranteed/opportunistic (you will do preemption only in the guaranteed after the timeout). {quote} Right now there is no plumbing at all so I need to build the whole preemption from scratch. Is there a function in the RM which I can call to adjust containers to the resources? Otherwise, I will need to go over the containers and selecting which ones to kill; in this case I need to do the distinction between guaranteed and opportunistic. I don't think the NM is doing anything here. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Priority: Major > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be hired here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766305#comment-16766305 ] Konstantinos Karanasos commented on YARN-999: - Hi [~elgoiri], if it is an opportunistic container, it will already be killed fast, so I think you don't need a distinction between guaranteed/opportunistic (you will do preemption only in the guaranteed after the timeout). I think [~asuresh] might have worked on something related to preemption recently, but not sure. > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Priority: Major > > In current design and implementation, when we decrease resource on node to > less than resource consumption of current running tasks, tasks can still be > running until the end. But just no new task get assigned on this node > (because AvailableResource < 0) until some tasks are finished and > AvailableResource > 0 again. This is good for most cases but in case of long > running task, it could be too slow for resource setting to actually work so > preemption could be hired here. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
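The idea discussed in these comments — reclaiming resources after a node-resource reduction by killing opportunistic containers first, preferring the long-running ones — can be sketched like this. All names are hypothetical; this is not the RM/NM implementation, only an illustration of the selection step that would have to be built:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: pick opportunistic containers to kill until at least
// mbToFree MB of memory is reclaimed, longest-running containers first.
public class OpportunisticPreemption {
    public static class Container {
        public final String id;
        public final boolean opportunistic;
        public final long startMillis;
        public final long memMb;
        public Container(String id, boolean opportunistic, long startMillis, long memMb) {
            this.id = id;
            this.opportunistic = opportunistic;
            this.startMillis = startMillis;
            this.memMb = memMb;
        }
    }

    public static List<Container> selectVictims(List<Container> running, long mbToFree) {
        List<Container> candidates = new ArrayList<>();
        for (Container c : running) {
            if (c.opportunistic) { // guaranteed containers go through RM preemption instead
                candidates.add(c);
            }
        }
        // Earliest start time first, i.e. longest-running first.
        candidates.sort(Comparator.comparingLong((Container c) -> c.startMillis));
        List<Container> victims = new ArrayList<>();
        long freed = 0;
        for (Container c : candidates) {
            if (freed >= mbToFree) {
                break;
            }
            victims.add(c);
            freed += c.memMb;
        }
        return victims;
    }
}
```

Guaranteed containers are deliberately excluded here, matching the observation above that regular RM preemption (after a timeout) would handle them while the NM can kill opportunistic containers locally.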
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766287#comment-16766287 ] Yufei Gu commented on YARN-9277: Hi [~uranus], some general comments; I haven't looked at the code yet. bq. We should not preempt self +1 bq. We should not preempt high priority job. Correct me if I am wrong, but there is no priority between YARN jobs. Priority is applied to tasks inside one job, which was there before the FS preemption overhaul. We only need priorities between mappers and reducers or other customized priorities, since AM containers always have the first priority and have already been taken care of. bq. We should not preempt container which has been running for a long time. Makes sense if all other conditions are exactly the same. > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ...
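The restrictions discussed above amount to a candidate filter applied before preemption. A minimal sketch of such a filter follows; every class, field, and threshold here is illustrative only (the real FairScheduler preemption code works on its own container types, not these):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container descriptor for illustration; not a YARN class.
class CandidateContainer {
    final String appId;       // application owning the container
    final boolean isAm;       // AM containers are never preempted
    final int priority;       // lower value = higher priority (illustrative)
    final long startTimeMs;   // when the container started

    CandidateContainer(String appId, boolean isAm, int priority, long startTimeMs) {
        this.appId = appId;
        this.isAm = isAm;
        this.priority = priority;
        this.startTimeMs = startTimeMs;
    }
}

public class PreemptionFilter {
    // Illustrative "long-running" threshold: one hour.
    static final long LONG_RUNNING_MS = 60 * 60 * 1000L;

    /** Keep only containers the starved app may preempt under the proposed rules. */
    static List<CandidateContainer> filter(List<CandidateContainer> all,
                                           String starvedAppId,
                                           int starvedAppPriority,
                                           long nowMs) {
        List<CandidateContainer> out = new ArrayList<>();
        for (CandidateContainer c : all) {
            if (c.appId.equals(starvedAppId)) continue;            // never preempt self
            if (c.isAm) continue;                                   // never preempt AM containers
            if (c.priority < starvedAppPriority) continue;          // skip higher-priority work
            if (nowMs - c.startTimeMs > LONG_RUNNING_MS) continue;  // spare long-running containers
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        List<CandidateContainer> all = new ArrayList<>();
        all.add(new CandidateContainer("app_1", false, 5, now - 1000));        // self
        all.add(new CandidateContainer("app_2", true, 5, now - 1000));         // AM
        all.add(new CandidateContainer("app_3", false, 1, now - 1000));        // higher priority
        all.add(new CandidateContainer("app_4", false, 5, now - 7_200_000L));  // long-running
        all.add(new CandidateContainer("app_5", false, 5, now - 1000));        // preemptable
        List<CandidateContainer> picked = filter(all, "app_1", 5, now);
        System.out.println(picked.size() + " " + picked.get(0).appId); // prints "1 app_5"
    }
}
```

As Yufei notes, the "long-running" rule would only be a tie-breaker among otherwise equal candidates; the sketch applies it as a hard filter purely to keep the example short.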
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766277#comment-16766277 ] Hadoop QA commented on YARN-9277: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 6 new + 48 unchanged - 0 fixed = 54 total (was 48) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 11s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 8s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 29s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}156m 17s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9277 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958402/YARN-9277.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 6c682c5b8d16 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 20b92cd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23387/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apac
[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766248#comment-16766248 ] Íñigo Goiri commented on YARN-999: -- Thanks [~djp] for referring to YARN-2489. I'll start working on a generic one and then we decide where to post it. I think the idea would be for the RM to track the moment it got the change in resources and, once the timeout passes, send {{ContainerPreemptEvent}}. I see this is added in YARN-569 and used in a few places. [~asuresh], [~kkaranasos], I remember you guys worked on some preemption recently. Do you guys know what would be a good JIRA to use as a reference for this? Hopefully something that makes the distinction between OPPORTUNISTIC containers and others.
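The flow sketched in this thread — after the timeout on a node-resource decrease expires, pick containers to bring usage under the new limit, killing OPPORTUNISTIC containers before preempting GUARANTEED ones — could look roughly like this. All names are illustrative stand-ins; the real implementation would go through the RM dispatcher and {{ContainerPreemptEvent}} rather than return a list:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

// Illustrative stand-in for an RM-side record of a running container.
class NodeContainer {
    final String id;
    final ExecutionType type;
    final long memoryMb;

    NodeContainer(String id, ExecutionType type, long memoryMb) {
        this.id = id;
        this.type = type;
        this.memoryMb = memoryMb;
    }
}

public class OverAllocationResolver {
    /**
     * Pick containers to release until usage fits the reduced capacity,
     * taking OPPORTUNISTIC containers first and GUARANTEED ones only if
     * still over the limit.
     */
    static List<NodeContainer> selectVictims(List<NodeContainer> running,
                                             long usedMb, long newCapacityMb) {
        List<NodeContainer> ordered = new ArrayList<>(running);
        // OPPORTUNISTIC first; among equals, smallest first to minimize lost work.
        ordered.sort(Comparator
            .comparing((NodeContainer c) -> c.type == ExecutionType.GUARANTEED)
            .thenComparingLong(c -> c.memoryMb));
        List<NodeContainer> victims = new ArrayList<>();
        long remaining = usedMb;
        for (NodeContainer c : ordered) {
            if (remaining <= newCapacityMb) {
                break; // usage now fits the shrunken node
            }
            victims.add(c);
            remaining -= c.memoryMb;
        }
        return victims;
    }

    public static void main(String[] args) {
        List<NodeContainer> running = new ArrayList<>();
        running.add(new NodeContainer("c1", ExecutionType.GUARANTEED, 4096));
        running.add(new NodeContainer("c2", ExecutionType.OPPORTUNISTIC, 2048));
        running.add(new NodeContainer("c3", ExecutionType.GUARANTEED, 2048));
        // Node shrank from 8 GB to 5 GB: the opportunistic container goes
        // first, then one guaranteed container is preempted.
        for (NodeContainer c : selectVictims(running, 8192, 5120)) {
            System.out.println(c.id); // prints "c2" then "c3"
        }
    }
}
```

In the real RM the timeout check would run in the scheduler's update thread, and each victim would be sent a preemption event instead of being collected into a list.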
[jira] [Updated] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-7129: Attachment: YARN-7129.024.patch > Application Catalog for YARN applications > - > > Key: YARN-7129 > URL: https://issues.apache.org/jira/browse/YARN-7129 > Project: Hadoop YARN > Issue Type: New Feature > Components: applications >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN Appstore.pdf, YARN-7129.001.patch, > YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, > YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, > YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, > YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, > YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, > YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, > YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, > YARN-7129.023.patch, YARN-7129.024.patch > > > YARN native services provides web services API to improve usability of > application deployment on Hadoop using collection of docker images. It would > be nice to have an application catalog system which provides an editorial and > search interface for YARN applications. This improves usability of YARN for > manage the life cycle of applications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7129) Application Catalog for YARN applications
[ https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766225#comment-16766225 ] Eric Yang commented on YARN-7129: - Patch 24 fixes hadolint issue.
[jira] [Commented] (YARN-9268) Various fixes are needed in FpgaDevice
[ https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766218#comment-16766218 ] Hadoop QA commented on YARN-9268: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 134 unchanged - 13 fixed = 134 total (was 147) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 12s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 26s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9268 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958404/YARN-9268-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2c7e20e10578 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 20b92cd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23388/testReport/ | | Max. process+thread count | 446 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23388/console | | Powered by
[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup
[ https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766200#comment-16766200 ] Keqiu Hu commented on YARN-9294: Yes, cgget is the old API in RHEL 6 (the libcgroup toolkit) to get the allocation. It works for memory & CPU, but not for devices. I don't think it is a bug; it just seems pulling device information is not supported. I did that manually on my desktop and it works. My current suspicion is that this _echo values to cgroup process_ might be flaky sometimes (file IO errors, cgroup glitches, race conditions, etc.). The plan is to have a check to make sure the cgroup manipulation worked before moving on to start the process in that cgroup. > Potential race condition in setting GPU cgroups & execute command in the > selected cgroup > > > Key: YARN-9294 > URL: https://issues.apache.org/jira/browse/YARN-9294 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0 >Reporter: Keqiu Hu >Assignee: Keqiu Hu >Priority: Critical > > Environment is latest branch-2 head > OS: RHEL 7.4 > *Observation* > Out of ~10 container allocations with GPU requirement, at least 1 of the > allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I > could still have visibility to all GPUs on the same machine when running > nvidia-smi. > The funny thing is even though I have visibility to all GPUs at the moment of > executing container-executor (say ordinal 0,1,2,3), but cgroups jailed the > process's access to only that single GPU after sometime. > The underlying process trying to access GPU would take the initial > information as source of truth and try to access physical 0 GPU which is not > really available to the process. This results in a > [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error. 
> Validated the container-executor commands are correct: > {code:java} > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, > --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, > 0,1,2,3] > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, > application_1549663278916_0249, > /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, > /grid/a/tmp/yarn, /grid/a/tmp/userlogs, > /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer, > khu, application_1549663278916_0249, > container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, > 8040, /grid/a/tmp/yarn] > {code} > So most likely a race condition between these two operations? > cc [~jhung] > Another potential theory is the cgroups creation for the container actually > failed but the error was swallowed silently.
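The check Keqiu proposes — verify that the cgroup actually reflects the change before launching the container process — can be sketched as a write-then-read-back helper. Everything here is an assumption for illustration: the file names, retry policy, and the idea of matching a fragment in a status file stand in for the real cgroupfs interaction (in the v1 devices controller, `devices.deny`/`devices.allow` are write-only and the effective state is read from `devices.list`):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupWriteVerifier {
    /**
     * Write a value to a cgroup control file, then poll a (possibly
     * different) status file until it shows the expected state.  Returns
     * false if the state never appears, so the caller can fail the launch
     * instead of starting an unisolated process.
     */
    static boolean writeAndVerify(Path controlFile, String value,
                                  Path statusFile, String expectedFragment,
                                  int retries, long sleepMs)
            throws IOException, InterruptedException {
        Files.write(controlFile, value.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < retries; i++) {
            String state = new String(Files.readAllBytes(statusFile),
                                      StandardCharsets.UTF_8);
            if (state.contains(expectedFragment)) {
                return true; // the cgroup reflects the change; safe to launch
            }
            Thread.sleep(sleepMs);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Demonstrate the check against temp files standing in for cgroupfs.
        Path dir = Files.createTempDirectory("cg");
        Path control = dir.resolve("devices.deny");
        Path status = dir.resolve("devices.list");
        Files.write(status, "c 195:0 rwm\n".getBytes(StandardCharsets.UTF_8));
        boolean ok = writeAndVerify(control, "c 195:1 rwm",
                                    status, "c 195:0 rwm", 3, 10);
        System.out.println(ok); // prints "true"
    }
}
```

A check like this would not remove the underlying race, but it would turn a silent isolation failure into a hard launch failure, which matches the "error was swallowed silently" theory in the description.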
[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer
[ https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766185#comment-16766185 ] Adam Antal commented on YARN-9270: -- Thanks for the patch, [~pbacsko]! Some thoughts of mine: - If we touch {{IntelFpgaOpenclPlugin.java}}, could we remove the wildcard import {{import java.util.*}}? If I'm not mistaken, we use HashMap, LinkedList, List and Map in that file (similar to TestFpgaDiscoverer.java). - Removing that dirty hack from {{TestFpgaDiscoverer.java}} is a great plus of this patch, thank you for that! I found the exact same piece of code on SO by searching the keywords (a red flag for me), and it looks really messy, so I am happy we can get it removed. - I don't see why the constructor of Configuration is called with false, but I can accept that. Also, the 5th testcase (testLinuxFpgaResourceDiscoverPluginWithSdkRootSet) uses another Configuration object than the original testcase when calling {{discoverer.initialize(conf)}} (which is initialized with a true parameter) - so you modify the behaviour of the testcase. It doesn't make the test fail, but is it intentional? - We request the instance of the FpgaDiscoverer 5 times, and then call setResourceHanderPlugin on it with the same parameter (openclPlugin). Can we move this to a helper function to avoid minor code duplication? - We can also move the setting of the YarnConfiguration.NM_FPGA_PATH_TO_EXEC config to that function, if we don't modify the 1st test's behaviour. - Also, could you move the previous comments/descriptions of the test cases to the new tests' javadoc? - As I see it, there aren't any logs defined in this test class. It is beyond the scope of the issue, but it would be nice to have some debug-level logging. For a start it'd be nice to have logs for the new tests that you just split. Was happy to review it, good work overall! 
> Minor cleanup in TestFpgaDiscoverer > --- > > Key: YARN-9270 > URL: https://issues.apache.org/jira/browse/YARN-9270 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9270-001.patch > > > Let's do some cleanup in this class. > * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split > up into 5 different tests, because it tests 5 different scenarios. > * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a > {{Function}} in the plugin class like {{Function envProvider > = System::getenv()}} plus a setter method which allows the test to modify > {{envProvider}}. Much simpler and straightforward.
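The env-provider refactoring proposed in the description can be sketched as follows. The plugin class, setter, and the {{ALTERAOCLSDKROOT}} variable are used as illustrative examples, not as the actual IntelFpgaOpenclPlugin code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plugin showing the injectable environment lookup.
class EnvAwarePlugin {
    // Defaults to the real environment; tests swap in a map-backed lookup.
    private Function<String, String> envProvider = System::getenv;

    void setEnvProvider(Function<String, String> provider) {
        this.envProvider = provider;
    }

    String getSdkRoot() {
        String root = envProvider.apply("ALTERAOCLSDKROOT");
        return root == null ? "" : root;
    }
}

public class EnvProviderDemo {
    public static void main(String[] args) {
        EnvAwarePlugin plugin = new EnvAwarePlugin();
        // In a test, no reflection hack is needed -- just inject a fake env:
        Map<String, String> fakeEnv = new HashMap<>();
        fakeEnv.put("ALTERAOCLSDKROOT", "/opt/intel/fpga_sdk");
        plugin.setEnvProvider(fakeEnv::get);
        System.out.println(plugin.getSdkRoot()); // prints "/opt/intel/fpga_sdk"
    }
}
```

Compared to mutating the process environment via reflection, this keeps the test deterministic and avoids touching JDK internals; production code pays only one extra indirection through the default {{System::getenv}}.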
[jira] [Updated] (YARN-9268) Various fixes are needed in FpgaDevice
[ https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9268: --- Attachment: YARN-9268-003.patch > Various fixes are needed in FpgaDevice > -- > > Key: YARN-9268 > URL: https://issues.apache.org/jira/browse/YARN-9268 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9268-001.patch, YARN-9268-002.patch, > YARN-9268-003.patch > > > Need to fix the following in the class {{FpgaDevice}}: > * It implements {{Comparable}}, but returns 0 in every case. There is no > natural ordering among FPGA devices; perhaps "acl0" comes before "acl1", but > this seems too forced and unnecessary. We think this class should not > implement {{Comparable}} at all, at least not like that. > * Stores unnecessary fields: devName, busNum, temperature, power usage. For > one, these are never needed in the code. Secondly, temp and power usage > change constantly. It's pointless to store these in this POJO. > * {{serialVersionUID}} is 1L - let's generate a number for this > * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor > uniquely identifies the card, then let's demand them in the constructor and > don't store Integers that can be null.
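A sketch of what the cleaned-up POJO could look like under the fixes listed above. The class name, field set, and the generated {{serialVersionUID}} value are illustrative only, not the actual patch:

```java
import java.io.Serializable;
import java.util.Objects;

// Illustrative cleaned-up FPGA device POJO: no Comparable, primitive
// major/minor demanded at construction, no volatile metrics stored.
final class FpgaDeviceSketch implements Serializable {
    private static final long serialVersionUID = 7834659428688839102L; // generated, not 1L

    private final String type; // e.g. "IntelOpenCL"
    private final int major;   // device major number -- primitive, never null
    private final int minor;   // device minor number

    FpgaDeviceSketch(String type, int major, int minor) {
        this.type = type;
        this.major = major;
        this.minor = minor;
    }

    int getMajor() { return major; }
    int getMinor() { return minor; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FpgaDeviceSketch)) return false;
        FpgaDeviceSketch other = (FpgaDeviceSketch) o;
        return major == other.major && minor == other.minor
            && Objects.equals(type, other.type);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, major, minor);
    }
}

public class FpgaDeviceDemo {
    public static void main(String[] args) {
        // Two records for the same physical card compare equal by identity
        // fields alone -- no artificial Comparable ordering needed.
        FpgaDeviceSketch a = new FpgaDeviceSketch("IntelOpenCL", 246, 0);
        FpgaDeviceSketch b = new FpgaDeviceSketch("IntelOpenCL", 246, 0);
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Dropping {{Comparable}} means callers that need a display order sort explicitly with a {{Comparator}}, which makes the ordering criterion visible at the call site instead of hidden in a degenerate {{compareTo}}.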
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766143#comment-16766143 ] Zhaohui Xin commented on YARN-9277: --- Hi, [~yufeigu]. Can you help me review this patch? :D
[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766109#comment-16766109 ] Zhankun Tang commented on YARN-9060: [~sunilg] , the failed test cases seem unrelated. Please review. Thanks. > [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin > as an example > -- > > Key: YARN-9060 > URL: https://issues.apache.org/jira/browse/YARN-9060 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, > YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, > YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, > YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, > YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, > YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, > YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch, > YARN-9060-trunk.015.patch, YARN-9060-trunk.016.patch, > YARN-9060-trunk.017.patch, YARN-9060-trunk.018.patch > > > Due to the cgroups v1 implementation policy in the linux kernel, we cannot update > the value of the device cgroups controller unless we have root permission > ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]). > So we need to support this in container-executor for the Java layer to invoke. > This Jira will have three parts: > # native c-e module > # Java layer code to isolate devices for container (docker and non-docker) > # A sample Nvidia GPU plugin
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Attachment: YARN-9277.002.patch
[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaohui Xin updated YARN-9277: -- Description: I think we should add more restrictions in fair scheduler preemption. * We should not preempt self * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ... was: I think we should add more restrictions in fair scheduler preemption. * We should not preempt AM container * We should not preempt high priority job * We should not preempt container which has been running for a long time. * ...
[jira] [Commented] (YARN-9266) Various fixes are needed in IntelFpgaOpenclPlugin
[ https://issues.apache.org/jira/browse/YARN-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766045#comment-16766045 ] Hadoop QA commented on YARN-9266: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 54 unchanged - 94 fixed = 54 total (was 148) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 35s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9266 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958382/YARN-9266-004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 441a9eb09e39 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 20b92cd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23386/testReport/ | | Max. process+thread count | 413 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23386/console | | Powered by |
[jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card
[ https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766040#comment-16766040 ] Hadoop QA commented on YARN-9265: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 8m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 4m 1s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 264 unchanged - 6 fixed = 264 total (was 270) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 34s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 39s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}112m 21s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9265 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958378/YARN-9265-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle xml | | uname | Linux 49c58552fd88 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86
[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766012#comment-16766012 ] Hadoop QA commented on YARN-9060: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 41s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 48s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 15s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}226m 28s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | | | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hado
[jira] [Comment Edited] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766008#comment-16766008 ] Zhaohui Xin edited comment on YARN-8655 at 2/12/19 1:09 PM: [~wilfreds], I accidentally discovered this problem in our production cluster a few months ago. *I think satisfying fair share starvation is enough, so in the end I removed min share starvation to fix this problem.* I just learned that the community will also abolish min share in the future. After YARN-9066, this issue will no longer be needed. Thanks for your reply. :D was (Author: uranus): [~wilfreds], I accidentally discovered this problem in our production cluster a few months ago. *I think satisfying fair share starvation is enough, so in the end I removed min share starvation to fix this problem.* I just learned that the community will also abolish this in the future. After [YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no longer be needed. Thanks for your reply. :D > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe; this may cause one starved app to be processed > twice in a row.* > For example, when app1 is *fair share starved*, it is added to > appsToProcess. After that, app1 is taken, but appBeingProcessed is not yet > updated to app1. At that moment, app1 becomes *starved by min share*, so the app > is added to appsToProcess again, because appBeingProcessed is null and > appsToProcess no longer contains it. 
> {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
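One way to close the check-then-act race described in this issue is to make the dequeue and the appBeingProcessed update atomic under a single lock. Below is a minimal sketch in plain Java — plain Strings stand in for FSAppAttempt, and this is an illustration only, not the actual Hadoop patch (and YARN-9066's removal of min share starvation makes the fix largely moot):

```java
import java.util.ArrayDeque;

// Hedged sketch: unlike the quoted snippet, the duplicate check in
// addStarvedApp() and the dequeue-plus-record step in take() share one
// monitor, so an app can never be re-queued while it is half-dequeued.
public class StarvedAppsSketch {
    private final ArrayDeque<String> appsToProcess = new ArrayDeque<>();
    private String appBeingProcessed;

    // Atomic with respect to take(): either the app is still queued, or
    // appBeingProcessed already points at it.
    public synchronized void addStarvedApp(String app) {
        if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
            appsToProcess.add(app);
            notifyAll();
        }
    }

    // appBeingProcessed is never observable as null between removing an app
    // from the queue and recording it, which was the window in the original.
    public synchronized String take() throws InterruptedException {
        appBeingProcessed = null;
        while (appsToProcess.isEmpty()) {
            wait();
        }
        appBeingProcessed = appsToProcess.poll();
        return appBeingProcessed;
    }
}
```

The wait/notifyAll pair replaces the BlockingQueue so that blocking and state updates happen under the same lock; a concurrent addStarvedApp() of the app being handed over is now deduplicated correctly.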
[jira] [Commented] (YARN-9066) Deprecate Fair Scheduler min share
[ https://issues.apache.org/jira/browse/YARN-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766003#comment-16766003 ] Zhaohui Xin commented on YARN-9066: --- [~wilfreds], [~haibochen], I fully agree with you. Min share starvation is very complicated to understand. After we remove min share starvation, [YARN-8655|https://issues.apache.org/jira/browse/YARN-8655] will no longer be needed. > Deprecate Fair Scheduler min share > -- > > Key: YARN-9066 > URL: https://issues.apache.org/jira/browse/YARN-9066 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.2.0 >Reporter: Haibo Chen >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: Proposal_Deprecate_FS_Min_Share.pdf > > > See the attached docs for details
[jira] [Updated] (YARN-9266) Various fixes are needed in IntelFpgaOpenclPlugin
[ https://issues.apache.org/jira/browse/YARN-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9266: --- Attachment: YARN-9266-004.patch > Various fixes are needed in IntelFpgaOpenclPlugin > - > > Key: YARN-9266 > URL: https://issues.apache.org/jira/browse/YARN-9266 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9266-001.patch, YARN-9266-002.patch, > YARN-9266-003.patch, YARN-9266-004.patch > > > Problems identified in this class: > * {{InnerShellExecutor}} ignores the timeout parameter > * {{configureIP()}} uses printStackTrace() instead of logging > * {{configureIP()}} does not log the output of aocl if the exit code != 0 > * {{parseDiagnoseInfo()}} is too heavyweight – it should be in its own class > for better testability > * {{downloadIP()}} uses {{contains()}} for the file name check – this can really > surprise users in some cases (e.g. you want to use hello.aocx but hello2.aocx > also matches) > * the method name {{downloadIP()}} is misleading – it actually tries to find > the file. Everything is already downloaded (localized) at this point. > * {{@VisibleForTesting}} methods should be package private > * {{aliasMap}} is not needed - store the acl number in the {{FpgaDevice}} > class
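The {{contains()}} pitfall from the list above can be made concrete. The sketch below uses made-up method names and is not the plugin's actual code; it only illustrates why a loose substring check surprises users and what an exact base-name comparison looks like:

```java
// Hedged illustration of the file-name check pitfall: method names are
// hypothetical, not taken from IntelFpgaOpenclPlugin.
public class AocxNameCheck {
    // Loose check in the spirit of the current plugin: a request for "hello"
    // is satisfied by hello2.aocx as well.
    static boolean looseMatch(String localizedFileName, String requestedName) {
        return localizedFileName.contains(requestedName);
    }

    // Stricter alternative: compare the base name exactly, so hello.aocx
    // matches a request for "hello" but hello2.aocx does not.
    static boolean exactMatch(String localizedFileName, String requestedName) {
        String base = localizedFileName.endsWith(".aocx")
            ? localizedFileName.substring(0,
                localizedFileName.length() - ".aocx".length())
            : localizedFileName;
        return base.equals(requestedName);
    }
}
```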
[jira] [Commented] (YARN-9286) Timeline Server(1.5) GUI, sorting based on FinalStatus throws pop-up message
[ https://issues.apache.org/jira/browse/YARN-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765968#comment-16765968 ] Hadoop QA commented on YARN-9286: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 33s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 42s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9286 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958208/YARN-9286-001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 65269be42858 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 20b92cd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23384/testReport/ | | Max. process+thread count | 412 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23384/console | | Powered by | Apache Yetus 0.8.0 http://yetus
[jira] [Created] (YARN-9296) In Timeline Server(1.5), FinalStatus is displayed wrong for killed and failed applications
Nallasivan created YARN-9296: Summary: In Timeline Server(1.5), FinalStatus is displayed wrong for killed and failed applications Key: YARN-9296 URL: https://issues.apache.org/jira/browse/YARN-9296 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Nallasivan In Timeline Server(1.5), the FinalStatus of applications that were killed or failed is displayed as UNDEFINED in both the GUI and the REST API
[jira] [Commented] (YARN-9295) Fix 'Decomissioned' label typo in Cluster Overview page
[ https://issues.apache.org/jira/browse/YARN-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765964#comment-16765964 ] Bibin A Chundatt commented on YARN-9295: Thank you [~charanh] for the patch. Looks good to me. Will get this in today. > Fix 'Decomissioned' label typo in Cluster Overview page > --- > > Key: YARN-9295 > URL: https://issues.apache.org/jira/browse/YARN-9295 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Reporter: Charan Hebri >Assignee: Charan Hebri >Priority: Trivial > Attachments: Decommissioned-typo.png, YARN-9295.001.patch > > > Change label text from 'Decomissioned' to 'Decommissioned' in Node Managers > section of the Cluster Overview page.
[jira] [Updated] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card
[ https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9265: --- Attachment: YARN-9265-002.patch > FPGA plugin fails to recognize Intel Processing Accelerator Card > > > Key: YARN-9265 > URL: https://issues.apache.org/jira/browse/YARN-9265 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Critical > Attachments: YARN-9265-001.patch, YARN-9265-002.patch > > > The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card). > There are two major issues. > Problem #1 > The output of aocl diagnose: > {noformat} > > Device Name: > acl0 > > Package Pat: > /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp > > Vendor: Intel Corp > > Physical Dev Name StatusInformation > > pac_a10_f20 PassedPAC Arria 10 Platform (pac_a10_f20) > PCIe 08:00.0 > FPGA temperature = 79 degrees C. > > DIAGNOSTIC_PASSED > > > Call "aocl diagnose " to run diagnose for specified devices > Call "aocl diagnose all" to run diagnose for all devices > {noformat} > The plugin fails to recognize this and fails with the following message: > {noformat} > 2019-01-25 06:46:02,834 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin: > Using FPGA vendor plugin: > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin > 2019-01-25 06:46:02,943 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer: > Trying to diagnose FPGA information ... 
> 2019-01-25 06:46:03,085 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule: > Using traffic control bandwidth handler > 2019-01-25 06:46:03,108 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl: > Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn > 2019-01-25 06:46:03,139 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl: > FPGA Plugin bootstrap success. > 2019-01-25 06:46:03,247 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: > Couldn't find (?i)bus:slot.func\s=\s.*, pattern > 2019-01-25 06:46:03,248 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: > Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern > 2019-01-25 06:46:03,251 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin: > Failed to get major-minor number from reading /dev/pac_a10_f30 > 2019-01-25 06:46:03,252 ERROR > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to > bootstrap configured resource subsystems! > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException: > No FPGA devices detected! > {noformat} > Problem #2 > The plugin assumes that the file name under {{/dev}} can be derived from the > "Physical Dev Name", but this is wrong. For example, it thinks that the > device file is {{/dev/pac_a10_f30}} which is not the case, the actual > file is {{/dev/intel-fpga-port.0}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
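The warnings in the log above show the plugin's pattern {{(?i)bus:slot.func\s=\s.*}} failing against the PAC-style aocl output, which reports the PCIe address inline (e.g. "PCIe 08:00.0"). A hedged sketch of matching that inline address instead — the pattern and class name below are illustrative assumptions, not the actual fix:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the PAC-style "aocl diagnose" output quoted in
// this issue. Names and regex are assumptions for illustration only.
public class PacDiagnoseParser {
    // Matches e.g. "PCIe 08:00.0" and captures the bus:slot.func part.
    private static final Pattern PCIE_ADDRESS =
        Pattern.compile("PCIe\\s+([0-9a-fA-F]+:[0-9a-fA-F]+\\.\\d+)");

    // Returns the bus:slot.func portion, or null if the output has none.
    public static String parsePcieAddress(String diagnoseOutput) {
        Matcher m = PCIE_ADDRESS.matcher(diagnoseOutput);
        return m.find() ? m.group(1) : null;
    }
}
```

A similar format-tolerant approach would also sidestep problem #2: deriving the {{/dev}} file from the physical device name is unreliable when the driver exposes e.g. {{/dev/intel-fpga-port.0}} instead.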
[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765949#comment-16765949 ] Wilfred Spiegelenburg commented on YARN-8655: - Hi [~uranus], I am not saying that what we do now is 100% correct. I am only doubting how often this occurs and what the impact on the application and scheduling activities is. Based on the analysis I did, I think we need a solution for this case that has far less impact. Do we know any of the following: How badly does it affect the running applications — do we pre-empt double what we should? Does not handling this correctly slow down pre-emption? Is there another impact of not handling the edge case? Pre-emption currently runs almost continually and is gated by the {{take()}}: when there is a pre-emption waiting we handle it. The patch changes this into one pre-emption per second. It effectively throttles the pre-emption down from processing applications as they arrive to a slow, scheduled trickle. When I look at how we calculate and decide whether the application is marked as minimum share starved, the cases should be limited. Even if the application is fair share starved and the queue is min share starved, we do not automatically mark the application as min share starved. We thus only have this edge case for a small number of applications. Fixing that edge case by slowing down all pre-emption handling is what I think is not right. > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.002.patch, YARN-8655.patch > > > *FSStarvedApps is not thread safe; this may cause one starved app to be processed > twice in a row.* > For example, when app1 is *fair share starved*, it is added to > appsToProcess. 
After that, app1 is taken, but appBeingProcessed is not yet > updated to app1. At that moment, app1 becomes *starved by min share*, so the app > is added to appsToProcess again, because appBeingProcessed is null and > appsToProcess no longer contains it. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} >
[jira] [Commented] (YARN-9123) Clean up and split testcases in TestNMWebServices for GPU support
[ https://issues.apache.org/jira/browse/YARN-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765911#comment-16765911 ] Adam Antal commented on YARN-9123: -- Hi [~snemeth]. Thanks for the patch! The cleanup looks good; I have some minor comments on it. - The following piece of code is replicated, as you have split the test into 3 parts: {code:java} assertEquals("MediaType of the response is not the expected!", MediaType.APPLICATION_JSON + "; " + JettyUtils.UTF_8, response.getType().toString()); json = response.getEntity(JSONObject.class); Assert.assertEquals(1000, json.get("a")); JSONObject json = response.getEntity(JSONObject.class); assertEquals("Unexpected value in the json response!", 0, json.length()); {code} Consider extracting it into a separate function and calling that, to avoid the minor code duplication. - Though I'm not an expert on this area, it seems strange that there is no logging at all (a logger class does not even exist). Though it would go beyond the scope of this jira, I recommend adding a log object and a few debug log statements: e.g. mocks successfully set up, response successfully received. (Only for the tests you marked for refactoring.) - The test testGetNMResourceInfoFailBecauseOfUnknownPlugin has a rather lengthy name: 47 characters. Though that is completely allowed, it could be renamed to something shorter for better readability. There aren't any javadocs in the nearby testcases, but you could also move pieces of information about the test into javadoc.
> Clean up and split testcases in TestNMWebServices for GPU support > - > > Key: YARN-9123 > URL: https://issues.apache.org/jira/browse/YARN-9123 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: YARN-9123.001.patch, YARN-9123.002.patch, > YARN-9123.003.patch, YARN-9123.004.patch > > > The following testcases can be cleaned up a bit: > TestNMWebServices#testGetNMResourceInfo - Can be split into 3 different cases > TestNMWebServices#testGetYarnGpuResourceInfo
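The duplicated assertions Adam points out could be pulled into a helper along these lines. This is only a sketch: the name NMWebServicesTestHelper is illustrative, and a content-type string plus a plain Map stand in for the real Jersey ClientResponse and JSONObject types so the example is self-contained.

```java
import java.util.Map;

// Illustrative helper for the three split tests: verifies the response
// media type once, then hands back the JSON entity for per-test checks.
final class NMWebServicesTestHelper {
  static final String EXPECTED_TYPE = "application/json; charset=utf-8";

  static Map<String, Object> verifyAndGetJson(String contentType,
      Map<String, Object> entity) {
    if (!EXPECTED_TYPE.equals(contentType)) {
      throw new AssertionError(
          "MediaType of the response is not the expected: " + contentType);
    }
    return entity;
  }
}
```

Each of the three split tests would then call the helper once and keep only its own per-test assertions, removing the replicated block.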
[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup
[ https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765895#comment-16765895 ] Zhankun Tang commented on YARN-9294: [~oliverhuh...@gmail.com], Have you tried doing a manual cgroup isolation test without YARN to reproduce it? For example, create a directory under /sys/fs/cgroup/devices/hadoop-yarn, echo the values into the cgroup devices.deny file, and repeatedly verify that the process is isolated as expected. I used to verify cgroup parameters with cgget and cgdelete. The tools can be installed with {code:java} yum install libcgroup yum install libcgroup-tools cgget -r memory.limit_in_bytes -g memory:hadoop-yarn/container_1542945107795_0003_01_02{code} But I just verified in my Ubuntu VM that "cgget" cannot show the denied devices even though the isolation is working. Maybe "cgget" also has this bug on RHEL. From your description, it seems we should dig into reproducing the flaky GPU isolation first? Or try a different OS kernel version? > Potential race condition in setting GPU cgroups & execute command in the > selected cgroup > > > Key: YARN-9294 > URL: https://issues.apache.org/jira/browse/YARN-9294 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0 >Reporter: Keqiu Hu >Assignee: Keqiu Hu >Priority: Critical > > Environment is latest branch-2 head > OS: RHEL 7.4 > *Observation* > Out of ~10 container allocations with GPU requirement, at least 1 of the > allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I > could still have visibility to all GPUs on the same machine when running > nvidia-smi. > The funny thing is even though I have visibility to all GPUs at the moment of > executing container-executor (say ordinals 0,1,2,3), cgroups jailed the > process's access to only that single GPU after some time.
> The underlying process trying to access GPU would take the initial > information as source of truth and try to access physical 0 GPU which is not > really available to the process. This results in a > [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error. > Validated the container-executor commands are correct: > {code:java} > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, > --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, > 0,1,2,3] > PrivilegedOperationExecutor command: > [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, > application_1549663278916_0249, > /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, > /grid/a/tmp/yarn, /grid/a/tmp/userlogs, > /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer, > khu, application_1549663278916_0249, > container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, > 8040, /grid/a/tmp/yarn] > {code} > So most likely a race condition between these two operations? > cc [~jhung] > Another potential theory is the cgroups creation for the container actually > failed but the error was swallowed silently.
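The manual check Zhankun suggests amounts to writing entries like "c 195:0 rwm" into the container's devices.deny file (195 is the conventional NVIDIA character-device major, and treating the GPU ordinal as the device minor is an assumption of this sketch). A minimal, illustrative builder for those entries — not the actual container-executor --module-gpu logic:

```java
import java.util.ArrayList;
import java.util.List;

// Builds devices.deny entries that hide excluded GPUs from a container.
// Assumes NVIDIA GPUs are character devices with major 195 and a minor
// equal to the GPU ordinal.
final class GpuDenyEntries {
  static final int NVIDIA_MAJOR = 195;

  static List<String> denyEntries(int... excludedGpuMinors) {
    List<String> entries = new ArrayList<>();
    for (int minor : excludedGpuMinors) {
      // "c <major>:<minor> rwm" revokes read, write and mknod access.
      entries.add("c " + NVIDIA_MAJOR + ":" + minor + " rwm");
    }
    return entries;
  }
}
```

Each entry would be echoed (as root) into /sys/fs/cgroup/devices/hadoop-yarn/<container>/devices.deny; after that, nvidia-smi run inside the cgroup should only see the remaining GPUs, which is exactly the property that intermittently fails here.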
[jira] [Commented] (YARN-9290) Invalid SchedulingRequest not rejected in Scheduler PlacementConstraintsHandler
[ https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765890#comment-16765890 ] Prabhu Joseph commented on YARN-9290: - [~cheersyang] Can you review the patch for this Jira when you get time? The placement processor rejects an invalid SchedulingRequest after the configured retry attempts and adds it to the AllocateResponse RejectedSchedulingRequest set. The scheduler processor does not do that; instead both the AM and RM keep trying to allocate & place the invalid request. > Invalid SchedulingRequest not rejected in Scheduler > PlacementConstraintsHandler > > > Key: YARN-9290 > URL: https://issues.apache.org/jira/browse/YARN-9290 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9290-001.patch, YARN-9290-002.patch, > YARN-9290-003.patch > > > A SchedulingRequest with an invalid namespace is not rejected in the Scheduler > PlacementConstraintsHandler. The RM keeps on trying allocateOnNode, logging the > exception each time. This is rejected in the case of the placement-processor > handler.
> {code} > 2019-02-08 16:51:27,548 WARN > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator: > Failed to query node cardinality: > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.InvalidAllocationTagsQueryException: > Invalid namespace prefix: notselfi, valid values are: > all,not-self,app-id,app-tag,self > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.fromString(TargetApplicationsNamespace.java:277) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.parse(TargetApplicationsNamespace.java:234) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.AllocationTags.createAllocationTags(AllocationTags.java:93) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraintExpression(PlacementConstraintsUtil.java:78) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraint(PlacementConstraintsUtil.java:240) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:321) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyAndConstraint(PlacementConstraintsUtil.java:272) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:324) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:365) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.checkCardinalityAndPending(SingleConstraintAppPlacementAllocator.java:355) > at > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.precheckNode(SingleConstraintAppPlacementAllocator.java:395) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.precheckNode(AppSchedulingInfo.java:779) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.preCheckForNodeCandidateSet(RegularContainerAllocator.java:145) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:890) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:977) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1173) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:795) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySch
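The placement-processor behavior Prabhu describes — retry a request a bounded number of times, then surface it as rejected instead of retrying forever — can be sketched as follows. The names here (BoundedPlacement, tryPlace) are illustrative, not the actual PlacementProcessor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Bounded-retry placement: after maxRetries failed attempts a request is
// reported back as rejected rather than re-queued indefinitely, which is
// the behavior the scheduler-side handler currently lacks.
final class BoundedPlacement<R> {
  private final int maxRetries;
  private final List<R> rejected = new ArrayList<>();

  BoundedPlacement(int maxRetries) { this.maxRetries = maxRetries; }

  // tryPlace stands in for an attempt to satisfy the request's
  // placement constraint on some node.
  boolean place(R request, Predicate<R> tryPlace) {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      if (tryPlace.test(request)) {
        return true;
      }
    }
    rejected.add(request);   // surfaced to the AM via the allocate response
    return false;
  }

  List<R> getRejected() { return rejected; }
}
```

A request whose namespace can never validate (as in the InvalidAllocationTagsQueryException above) fails every attempt and lands in the rejected list instead of looping forever.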
[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling
[ https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765877#comment-16765877 ] Bibin A Chundatt commented on YARN-9293: The trunk test failure is not related to the attached patch; the test cases pass locally. [~sunilg], would you like to review the latest one? > Optimize MockAMLauncher event handling > -- > > Key: YARN-9293 > URL: https://issues.apache.org/jira/browse/YARN-9293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Labels: simulator > Attachments: YARN-9293-branch-3.1.003.patch, YARN-9293.001.patch, > YARN-9293.002.patch, YARN-9293.003.patch > >
[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling
[ https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765874#comment-16765874 ] Hadoop QA commented on YARN-9293: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 34s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 37s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} hadoop-tools/hadoop-sls: The patch generated 0 new + 53 unchanged - 1 fixed = 53 total (was 54) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 12s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 13s{color} | {color:red} hadoop-sls in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 59s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.sls.TestSLSRunner | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 | | JIRA Issue | YARN-9293 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958358/YARN-9293-branch-3.1.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux c3053e92f31e 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / f3c1e45 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23382/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23382/testReport/ | | Max. process+thread count | 459 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls | | Console output | https://builds.apache.org/job/PreComm
[jira] [Commented] (YARN-9290) Invalid SchedulingRequest not rejected in Scheduler PlacementConstraintsHandler
[ https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765871#comment-16765871 ] Hadoop QA commented on YARN-9290: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 8 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 3s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 44s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 620 unchanged - 3 fixed = 622 total (was 623) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 23s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}152m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9290 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958353/YARN-9290-003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 879f5056cfdb 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d48e61d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23380/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23380/artifact/out/patch-unit-hadoop-yar
[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.018.patch > [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin > as an example > -- > > Key: YARN-9060 > URL: https://issues.apache.org/jira/browse/YARN-9060 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Zhankun Tang >Assignee: Zhankun Tang >Priority: Major > Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, > YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, > YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, > YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, > YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, > YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, > YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch, > YARN-9060-trunk.015.patch, YARN-9060-trunk.016.patch, > YARN-9060-trunk.017.patch, YARN-9060-trunk.018.patch > > > Due to the cgroups v1 implementation policy in the linux kernel, we cannot update > the value of the device cgroups controller unless we have root permission > ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]). > So we need to support this in container-executor for the Java layer to invoke. > This Jira will have three parts: > # native c-e module > # Java layer code to isolate devices for containers (docker and non-docker) > # A sample Nvidia GPU plugin
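Because only the setuid container-executor can write the device cgroup values, the Java layer's job reduces to composing a privileged invocation. A sketch of that composition, with flag names following the command logged in YARN-9294 above — illustrative only, not the actual PrivilegedOperation code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative: builds the argv the Java layer would hand to the native
// container-executor to apply GPU device isolation for one container.
final class GpuIsolationCommand {
  static List<String> build(String containerExecutorPath,
      String containerId, int... excludedGpus) {
    List<String> cmd = new ArrayList<>();
    cmd.add(containerExecutorPath);
    cmd.add("--module-gpu");
    cmd.add("--container_id");
    cmd.add(containerId);
    cmd.add("--excluded_gpus");
    // Excluded GPU ordinals are passed as one comma-separated value.
    StringBuilder gpus = new StringBuilder();
    for (int i = 0; i < excludedGpus.length; i++) {
      if (i > 0) gpus.append(',');
      gpus.append(excludedGpus[i]);
    }
    cmd.add(gpus.toString());
    return cmd;
  }
}
```

The native module then runs with root privilege and performs the actual writes to the device cgroup that the Java process cannot do itself.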
[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling
[ https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765844#comment-16765844 ] Hadoop QA commented on YARN-9293: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} hadoop-tools/hadoop-sls: The patch generated 0 new + 53 unchanged - 1 fixed = 53 total (was 54) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 12s{color} | {color:red} hadoop-sls in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.sls.TestSLSStreamAMSynth | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f | | JIRA Issue | YARN-9293 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958355/YARN-9293.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5a43a3c87739 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d48e61d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/23381/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23381/testReport/ | | Max. process+thread count | 470 (vs. ulimit of 1) | | modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23381/console | | Powered by |
[jira] [Commented] (YARN-9252) Allocation Tag Namespace support in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765799#comment-16765799 ] Weiwei Yang commented on YARN-9252: --- Just cherry-picked to branch-3.2 and updated the fix version. > Allocation Tag Namespace support in Distributed Shell > - > > Key: YARN-9252 > URL: https://issues.apache.org/jira/browse/YARN-9252 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Affects Versions: 3.1.1 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-9252-001.patch, YARN-9252-002.patch, > YARN-9252-003.patch, YARN-9252-004.patch > > > Distributed Shell supports placement constraints, but the allocation tag namespace > is not honored.
[jira] [Updated] (YARN-9252) Allocation Tag Namespace support in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-9252: -- Fix Version/s: 3.2.1 > Allocation Tag Namespace support in Distributed Shell > - > > Key: YARN-9252 > URL: https://issues.apache.org/jira/browse/YARN-9252 > Project: Hadoop YARN > Issue Type: Sub-task > Components: distributed-shell >Affects Versions: 3.1.1 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0, 3.2.1 > > Attachments: YARN-9252-001.patch, YARN-9252-002.patch, > YARN-9252-003.patch, YARN-9252-004.patch > > > Distributed Shell supports placement constraint but allocation Tag Namespace > is not honored.
[jira] [Updated] (YARN-9253) Add UT to verify Placement Constraint in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-9253: -- Fix Version/s: 3.1.3 3.2.1 > Add UT to verify Placement Constraint in Distributed Shell > -- > > Key: YARN-9253 > URL: https://issues.apache.org/jira/browse/YARN-9253 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9253-001.patch, YARN-9253-002.patch, > YARN-9253-003.patch, YARN-9253-004.patch, YARN-9253-005.patch > > > Add UT to verify Placement Constraint in Distributed Shell added as part of > YARN-7745
[jira] [Commented] (YARN-9253) Add UT to verify Placement Constraint in Distributed Shell
[ https://issues.apache.org/jira/browse/YARN-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765790#comment-16765790 ] Weiwei Yang commented on YARN-9253: --- Just cherry-picked to branch-3.2 and branch-3.1 as well. > Add UT to verify Placement Constraint in Distributed Shell > -- > > Key: YARN-9253 > URL: https://issues.apache.org/jira/browse/YARN-9253 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.1 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9253-001.patch, YARN-9253-002.patch, > YARN-9253-003.patch, YARN-9253-004.patch, YARN-9253-005.patch > > > Add UT to verify Placement Constraint in Distributed Shell added as part of > YARN-7745
[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling
[ https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765788#comment-16765788 ] Bibin A Chundatt commented on YARN-9293: Attached the branch-3.1 patch and updated the trunk patch. > Optimize MockAMLauncher event handling > -- > > Key: YARN-9293 > URL: https://issues.apache.org/jira/browse/YARN-9293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Major > Attachments: YARN-9293-branch-3.1.003.patch, YARN-9293.001.patch, > YARN-9293.002.patch, YARN-9293.003.patch > >
[jira] [Updated] (YARN-9191) Add cli option in DS to support enforceExecutionType in resource requests.
[ https://issues.apache.org/jira/browse/YARN-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated YARN-9191: -- Fix Version/s: 3.1.3 > Add cli option in DS to support enforceExecutionType in resource requests. > -- > > Key: YARN-9191 > URL: https://issues.apache.org/jira/browse/YARN-9191 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: YARN-9191.001.patch, YARN-9191.002.patch, > YARN-9191.003.patch, YARN-9191.004.patch, YARN-9191.005.patch, > YARN-9191.006.patch > > > This JIRA proposes to expose a CLI option to allow users to additionally > specify the enforceExecutionType flag (introduced in YARN-5180).