[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766900#comment-16766900
 ] 

Hadoop QA commented on YARN-9298:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 1 unchanged - 2 fixed = 1 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 47s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 14s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9298 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958511/YARN-9298.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c15f61ea6635 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7b11b40 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23393/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23393/testReport/ |
| Max. process+thread count | 953 (vs. ulimi

[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup

2019-02-12 Thread Keqiu Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766891#comment-16766891
 ] 

Keqiu Hu commented on YARN-9294:


Confirmed that this is a race condition between creating the cgroups and 
executing the command in the cgroup. We plan to go ahead with a safety check 
between these two privileged operations. Note that the same issue should apply 
to 3.1+ as well. cc [~wangda] [~tangzhankun]
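
A minimal, self-contained sketch of how such a safety check could be shaped 
(purely illustrative; the class, method names and timeout are assumptions, not 
actual NodeManager code). The idea is simply to verify that the GPU isolation is 
observed to be in effect before the second privileged operation launches the 
container:
{code:java}
import java.util.concurrent.Callable;

// Illustrative sketch only, not NodeManager code: insert a verification step
// between the two privileged operations so the container is launched only once
// the GPU cgroup settings are observed to be in effect. How the verification is
// performed (reading the cgroup files, running a probe, etc.) is left open.
public class IsolationSafetyCheckSketch {
  static void launchWithCheck(Runnable applyGpuIsolation,
                              Callable<Boolean> isolationVisible,
                              Runnable launchContainer) throws Exception {
    applyGpuIsolation.run();                       // privileged op #1 (--module-gpu)
    long deadline = System.currentTimeMillis() + 5_000;
    while (System.currentTimeMillis() < deadline) {
      if (isolationVisible.call()) {               // the proposed safety check
        launchContainer.run();                     // privileged op #2 (launch)
        return;
      }
      Thread.sleep(100);                           // not applied yet, retry briefly
    }
    throw new IllegalStateException("GPU cgroup isolation not visible in time");
  }

  public static void main(String[] args) throws Exception {
    launchWithCheck(
        () -> System.out.println("apply cgroup GPU exclusions"),
        () -> true,                                // stand-in for a real probe
        () -> System.out.println("launch container command"));
  }
}
{code}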

> Potential race condition in setting GPU cgroups & execute command in the 
> selected cgroup
> 
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0
>Reporter: Keqiu Hu
>Assignee: Keqiu Hu
>Priority: Critical
>
> Environment is latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with GPU requirement, at least 1 of the 
> allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I 
> could still have visibility to all GPUs on the same machine when running 
> nvidia-smi.
> The funny thing is that even though I have visibility to all GPUs at the moment 
> container-executor executes (say ordinals 0,1,2,3), cgroups later restricts the 
> process's access to only that single GPU. 
> The underlying process trying to access the GPU takes the initial 
> information as the source of truth and tries to access physical GPU 0, which is 
> not actually available to the process. This results in a 
> [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> Validated the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, 
> --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 
> 0,1,2,3]
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, 
> application_1549663278916_0249, 
> /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, 
> /grid/a/tmp/yarn, /grid/a/tmp/userlogs, 
> /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer,
>  khu, application_1549663278916_0249, 
> container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 
> 8040, /grid/a/tmp/yarn]
> {code}
> So most likely a race condition between these two operations? 
> cc [~jhung]
> Another potential theory is that the cgroups creation for the container 
> actually failed but the error was silently swallowed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766890#comment-16766890
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~Steven Rand], thanks for your reply. 

If one long-running task is preempted, its next attempt will similarly run for a 
long time. If that attempt is also preempted, the job will be difficult to 
finish.

Also, I don't think it is reasonable to confine long-running apps to specific 
queues; that approach is not generic. Maybe there is a better solution?

 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt the application itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers which have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766880#comment-16766880
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~wilfreds], you can see issue 
[YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may 
preempt itself in case of minshare preemption. In my opinion, even if this cannot 
happen, we should still add this sanity check. 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt the application itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers which have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766878#comment-16766878
 ] 

Hadoop QA commented on YARN-1655:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 213 unchanged - 0 fixed = 215 total (was 213) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 17s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}138m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-1655 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958509/YARN-1655.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c176f9f1c3c7 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7b11b40 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23392/artifact/out/diff-checkstyle-hadoop-yarn-pr

[jira] [Assigned] (YARN-8061) An application may preempt itself in case of minshare preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-8061:
-

Assignee: Zhaohui Xin

> An application may preempt itself in case of minshare preemption
> 
>
> Key: YARN-8061
> URL: https://issues.apache.org/jira/browse/YARN-8061
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Yufei Gu
>Assignee: Zhaohui Xin
>Priority: Major
>
> Assume a leaf queue A's minshare is 10G of memory and its fairshare is 12G. It 
> used 4G, so its minshare-starved resources are 6G, which will be distributed to 
> all its apps. Assume there are 4 apps a1, a2, a3, a4 inside, which demand 3G, 
> 2G, 1G, and 0.5G. a1 gets 3G of minshare-starved resources, a2 gets 2G, a3 gets 
> 1G; they are all considered starved apps except a4, which doesn't get any. 
> An app can preempt another under the same queue due to minshare starvation. 
> For example, a1 can preempt a4 if a4 uses more resources than its fair share, 
> which is 3G (12G/4). If a1 itself used more than 3G of memory, it would preempt 
> itself! I will create a unit test later. 
> The solution would be to check the application's fair share while distributing 
> minshare starvation; more details in 
> {{FSLeafQueue#updateStarvedAppsMinshare()}}.
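
A minimal, self-contained sketch of the proposed check (illustrative only, not 
the actual {{FSLeafQueue#updateStarvedAppsMinshare()}} code; the per-app usage 
figures are hypothetical and chosen so that a1 is over its fair share):
{code:java}
// Illustrative sketch: distribute the queue's minshare starvation, but skip any
// app that is already at or above its fair share, so it can never become a
// candidate that preempts itself. Figures mirror the example above: minshare
// 10G, fair share 12G, queue usage 4G, demands of 3G, 2G, 1G and 0.5G (in MB).
public class MinshareStarvationSketch {
  public static void main(String[] args) {
    long minShare = 10 * 1024, fairShare = 12 * 1024, queueUsage = 4 * 1024;
    long[] demand = {3 * 1024, 2 * 1024, 1 * 1024, 512};    // a1..a4 pending asks
    long[] usage  = {4 * 1024, 0, 0, 0};    // hypothetical: a1 already over 3G
    long appFairShare = fairShare / demand.length;          // 3G per app
    long pending = minShare - queueUsage;                   // 6G of starvation

    for (int i = 0; i < demand.length && pending > 0; i++) {
      if (usage[i] >= appFairShare) {
        continue;   // the proposed check: an app over its fair share is not starved
      }
      long starvation = Math.min(demand[i], pending);
      System.out.println("app a" + (i + 1) + " minshare starvation: "
          + starvation + " MB");
      pending -= starvation;
    }
  }
}
{code}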



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766880#comment-16766880
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 7:22 AM:


Hi, [~wilfreds], you can see issue YARN-8061: An application may preempt itself 
in case of minshare preemption. In my opinion, even if this will not happen, we 
should also add this as a sanity check. 


was (Author: uranus):
Hi, [~wilfreds], you can see issue 
[YARN-8061|https://issues.apache.org/jira/browse/YARN-8061]: An application may 
preempt itself in case of minshare preemption. In my opinion, even if this will 
not happen, we should also add this sanity check. 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt the application itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers which have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766879#comment-16766879
 ] 

Hadoop QA commented on YARN-9208:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
37s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9208 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958513/YARN-9208-004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux daccc8c33c2b 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 917ac9f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23394/testReport/ |
| Max. process+thread count | 642 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23394/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated

[jira] [Updated] (YARN-9286) Timeline Server sorting based on FinalStatus throws pop-up message

2019-02-12 Thread Nallasivan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nallasivan updated YARN-9286:
-
Summary: Timeline Server sorting based on FinalStatus throws pop-up message 
 (was: Timeline Server(1.5) GUI, sorting based on FinalStatus throws pop-up 
message)

> Timeline Server sorting based on FinalStatus throws pop-up message
> --
>
> Key: YARN-9286
> URL: https://issues.apache.org/jira/browse/YARN-9286
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Nallasivan
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-9286-001.patch
>
>
> In the Timeline Server GUI, if we try to sort the details based on FinalStatus, 
> a popup window is displayed. Furthermore, any operation that involves 
> refreshing the page results in the same popup window being displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9286) [Timeline Server] Sorting based on FinalStatus throws pop-up message

2019-02-12 Thread Nallasivan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nallasivan updated YARN-9286:
-
Summary: [Timeline Server] Sorting based on FinalStatus throws pop-up 
message  (was: Timeline Server sorting based on FinalStatus throws pop-up 
message)

> [Timeline Server] Sorting based on FinalStatus throws pop-up message
> 
>
> Key: YARN-9286
> URL: https://issues.apache.org/jira/browse/YARN-9286
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Nallasivan
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-9286-001.patch
>
>
> In the Timeline Server GUI, if we try to sort the details based on FinalStatus, 
> a popup window is displayed. Furthermore, any operation that involves 
> refreshing the page results in the same popup window being displayed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766813#comment-16766813
 ] 

Wilfred Spiegelenburg commented on YARN-8967:
-

After talking offline with a number of people, the request was to divide this 
change into two parts due to its size:
* _part 1_ for the new rules and changes to the existing PlacementRule code
* _part 2_ for the FS changes and integration

This is the only way the change can be split so that both parts compile 
separately. A new jira, YARN-9298, has been opened for _part 1_ and we'll keep 
this jira for _part 2_. Removing the Patch Available state until that one is 
checked in.

This will also allow work to start on enhancing the rules with filters etc., 
which have existing open jiras.

> Change FairScheduler to use PlacementRule interface
> ---
>
> Key: YARN-8967
> URL: https://issues.apache.org/jira/browse/YARN-8967
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-8967.001.patch, YARN-8967.002.patch, 
> YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch
>
>
> The PlacementRule interface was introduced to be used by all schedulers as 
> per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not 
> and is using its own rule definition.
> YARN-8948 cleans up the implementation and removes the CS references which 
> should allow this change to go through.
> This would be the first step in using one placement rule engine for both 
> schedulers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766811#comment-16766811
 ] 

Wilfred Spiegelenburg commented on YARN-9277:
-

I agree with [~Steven Rand]: sorting could be good, but setting a hard no-go 
could cause issues.

Can you also explain how we can pre-empt a container that is owned by the 
application itself? 
I thought that we would only allow containers to be pre-empted if the 
application is over its fair share, and even then only if pre-empting the 
container would not drop the application below its fair share. 
{{FSPreemptionThread.identifyContainersToPreemptOnNode()}} calls 
{{app.canContainerBePreempted()}}, which contains that check, and the container 
is not added. Since the app we are pre-empting for is under its fair share, any 
container of the app itself should be filtered out by that. Am I reading this 
all wrong, or have you found cases where we did pre-empt a container for its own 
app and it is not working as expected?
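
For reference, a minimal, self-contained sketch of the guard described above 
(illustrative only; a simplified stand-in for the check behind 
{{app.canContainerBePreempted()}}, with made-up numbers):
{code:java}
// Illustrative sketch, not the real FairScheduler code: a container is only
// preemptable if its owning app is over its fair share and removing the
// container would keep the app at or above its fair share.
public class PreemptionCheckSketch {
  static boolean canContainerBePreempted(long appUsageMb, long appFairShareMb,
                                         long containerMb) {
    if (appUsageMb <= appFairShareMb) {
      return false;                                    // app not over its fair share
    }
    return appUsageMb - containerMb >= appFairShareMb; // don't drop below fair share
  }

  public static void main(String[] args) {
    // The starved app asking for resources is under its fair share, so its own
    // containers fail the first check and are filtered out.
    System.out.println(canContainerBePreempted(2048, 3072, 1024));  // false
    System.out.println(canContainerBePreempted(5120, 3072, 1024));  // true
  }
}
{code}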

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt the application itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers which have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified

2019-02-12 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9208:

Attachment: YARN-9208-004.patch

> Distributed shell allow LocalResourceVisibility to be specified
> ---
>
> Key: YARN-9208
> URL: https://issues.apache.org/jira/browse/YARN-9208
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9208-001.patch, YARN-9208-002.patch, 
> YARN-9208-003.patch, YARN-9208-004.patch
>
>
> YARN-9008 added a feature to specify a list of files to be localized.
> It would be great to be able to specify the visibility type too; this allows 
> testing of the PRIVATE and PUBLIC types as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.001.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766797#comment-16766797
 ] 

Hadoop QA commented on YARN-999:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
25s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 8 new + 248 unchanged - 6 fixed = 256 total (was 254) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m  1s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
47s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}176m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-999 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958494/YARN-291.000.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8f91358e17e8 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3dc2523 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/Pre

[jira] [Updated] (YARN-9296) [Timeline Server] FinalStatus is displayed wrong for killed and failed applications

2019-02-12 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9296:
---
Summary: [Timeline Server] FinalStatus is displayed wrong for killed and 
failed applications  (was: Timeline Server FinalStatus is displayed wrong for 
killed and failed applications)

> [Timeline Server] FinalStatus is displayed wrong for killed and failed 
> applications
> ---
>
> Key: YARN-9296
> URL: https://issues.apache.org/jira/browse/YARN-9296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Nallasivan
>Priority: Minor
>
> In Timeline Server (1.5), the FinalStatus of applications which are killed or 
> failed is displayed as UNDEFINED in both the GUI and the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9296) Timeline Server FinalStatus is displayed wrong for killed and failed applications

2019-02-12 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-9296:
---
Summary: Timeline Server FinalStatus is displayed wrong for killed and 
failed applications  (was: In Timline Server(1.5), FinalStatus is displayed 
wrong for killed and failed applications)

> Timeline Server FinalStatus is displayed wrong for killed and failed 
> applications
> -
>
> Key: YARN-9296
> URL: https://issues.apache.org/jira/browse/YARN-9296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Nallasivan
>Priority: Minor
>
> In Timeline Server (1.5), the FinalStatus of applications which are killed or 
> failed is displayed as UNDEFINED in both the GUI and the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9296) In Timline Server(1.5), FinalStatus is displayed wrong for killed and failed applications

2019-02-12 Thread Nallasivan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nallasivan updated YARN-9296:
-
Description: Timline Server(1.5), FinalStatus of the applications which are 
killed and failed, is displayed as UNDEFINED in both GUI, REST API  (was: In 
Timline Server(1.5), FinalStatus of the applications which are killed and 
failed, is displayed as UNDEFINED in both GUI, REST API)

> In Timline Server(1.5), FinalStatus is displayed wrong for killed and failed 
> applications
> -
>
> Key: YARN-9296
> URL: https://issues.apache.org/jira/browse/YARN-9296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Nallasivan
>Priority: Minor
>
> In Timeline Server (1.5), the FinalStatus of applications which are killed or 
> failed is displayed as UNDEFINED in both the GUI and the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766783#comment-16766783
 ] 

Steven Rand commented on YARN-9277:
---

{code}
+// We should not preempt container which has been running for a long time.
+if ((System.currentTimeMillis() - container.getCreationTime()) >=
+getQueue().getFSContext().getPreemptionConfig()
+.getToBePreemptedContainerRuntimeThreshold()) {
+  logPreemptContainerPreCheckInfo(
+  "this container already run a long time!");
+  return false;
+}
+
{code}

I disagree with this because it allows for situations in which starved 
applications can't preempt applications that are over their fair shares. If 
application A is starved and application B is over its fair share, but happens 
to have all its containers running for more than the threshold, then 
application A is unable to preempt and will remain starved.

It might be reasonable to sort preemptable containers by runtime and preempt 
those that have started most recently. However, I worry that this unfairly 
biases the scheduler against applications with shorter-lived tasks.

If code can't be optimized, and really does require very long-running tasks, 
then these jobs can be run in a queue from which preemption isn't allowed via 
the {{allowPreemptionFrom}} property.
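
A minimal sketch of the sorting idea (illustrative only; the class and field 
names are hypothetical, not the actual scheduler types):
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: order preemption candidates so that the most recently
// started containers are preempted first; long-running containers are then
// preempted only as a last resort instead of being excluded outright.
public class PreemptionOrderSketch {
  static class Candidate {
    final String id;
    final long creationTimeMs;
    Candidate(String id, long creationTimeMs) {
      this.id = id;
      this.creationTimeMs = creationTimeMs;
    }
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>();
    candidates.add(new Candidate("container_01", 1_000L));    // oldest
    candidates.add(new Candidate("container_02", 50_000L));   // newest
    candidates.add(new Candidate("container_03", 20_000L));
    // Newest first: descending creation time.
    candidates.sort(
        Comparator.comparingLong((Candidate c) -> c.creationTimeMs).reversed());
    candidates.forEach(c -> System.out.println(c.id));
  }
}
{code}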

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt the application itself
>  * We should not preempt high-priority jobs
>  * We should not preempt containers which have been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-9298:
---

 Summary: Implement FS placement rules using PlacementRule interface
 Key: YARN-9298
 URL: https://issues.apache.org/jira/browse/YARN-9298
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


Implement existing placement rules of the FS using the PlacementRule interface.

Preparation for YARN-8967



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766771#comment-16766771
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

Updated the test to make it more robust. I ran all the new tests locally 250 
times and have not seen a failure.

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-1655:

Attachment: YARN-1655.003.patch

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766760#comment-16766760
 ] 

Zhankun Tang edited comment on YARN-9294 at 2/13/19 4:16 AM:
-

[~oliverhuh...@gmail.com] , Yeah, I agree with the plan. Finding stable 
reproduction steps seems like a good start to me. We can write a script to create 
sub device cgroups, use "container-executor" to set the parameters, and then 
attach the running processes to verify whether any of them sees the denied devices.


was (Author: tangzhankun):
[~oliverhuh...@gmail.com] , Yeah. Agree with the plan. And to find the stable 
reproducing steps seems a good start to me. We can write a script to create sub 
device cgroups and use "container-executor" to set parameter and then attach 
the processes repeatedly to verify if someone sees denied devices.

> Potential race condition in setting GPU cgroups & execute command in the 
> selected cgroup
> 
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0
>Reporter: Keqiu Hu
>Assignee: Keqiu Hu
>Priority: Critical
>
> Environment is latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with GPU requirement, at least 1 of the 
> allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I 
> could still have visibility to all GPUs on the same machine when running 
> nvidia-smi.
> The funny thing is that even though I have visibility to all GPUs at the moment 
> container-executor executes (say ordinals 0,1,2,3), cgroups later restricts the 
> process's access to only that single GPU. 
> The underlying process trying to access the GPU takes the initial 
> information as the source of truth and tries to access physical GPU 0, which is 
> not actually available to the process. This results in a 
> [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> Validated the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, 
> --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 
> 0,1,2,3]
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, 
> application_1549663278916_0249, 
> /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, 
> /grid/a/tmp/yarn, /grid/a/tmp/userlogs, 
> /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer,
>  khu, application_1549663278916_0249, 
> container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 
> 8040, /grid/a/tmp/yarn]
> {code}
> So most likely a race condition between these two operations? 
> cc [~jhung]
> Another potential theory is that the cgroups creation for the container 
> actually failed but the error was silently swallowed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766760#comment-16766760
 ] 

Zhankun Tang commented on YARN-9294:


[~oliverhuh...@gmail.com] , Yeah, I agree with the plan. Finding stable 
reproduction steps seems like a good start to me. We can write a script to create 
sub device cgroups, use "container-executor" to set the parameters, and then 
attach processes repeatedly to verify whether any of them sees the denied devices.

> Potential race condition in setting GPU cgroups & execute command in the 
> selected cgroup
> 
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0
>Reporter: Keqiu Hu
>Assignee: Keqiu Hu
>Priority: Critical
>
> Environment is latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with GPU requirement, at least 1 of the 
> allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I 
> could still have visibility to all GPUs on the same machine when running 
> nvidia-smi.
> The funny thing is that even though I have visibility to all GPUs at the moment 
> container-executor executes (say ordinals 0,1,2,3), cgroups later restricts the 
> process's access to only that single GPU. 
> The underlying process trying to access the GPU takes the initial 
> information as the source of truth and tries to access physical GPU 0, which is 
> not actually available to the process. This results in a 
> [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> Validated the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, 
> --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 
> 0,1,2,3]
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, 
> application_1549663278916_0249, 
> /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, 
> /grid/a/tmp/yarn, /grid/a/tmp/userlogs, 
> /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer,
>  khu, application_1549663278916_0249, 
> container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 
> 8040, /grid/a/tmp/yarn]
> {code}
> So most likely a race condition between these two operations? 
> cc [~jhung]
> Another potential theory is that the cgroups creation for the container 
> actually failed but the error was silently swallowed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766750#comment-16766750
 ] 

Zhankun Tang commented on YARN-9118:


[~snemeth], thanks for the patch! Please fix the checkstyle issues.

[~sunilg] , the latest patch looks good to me.

> Handle issues with parsing user defined GPU devices in GpuDiscoverer
> 
>
> Key: YARN-9118
> URL: https://issues.apache.org/jira/browse/YARN-9118
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9118.001.patch, YARN-9118.002.patch, 
> YARN-9118.003.patch, YARN-9118.004.patch, YARN-9118.005.patch, 
> YARN-9118.006.patch, YARN-9118.007.patch
>
>
> getGpusUsableByYarn has the following issues: 
> - Duplicate GPU device definitions are not rejected: this seems to be the 
> biggest issue, as it could increase the number of devices on the node if a 
> device ID is defined 2 or more times.
> - An empty string is accepted; it behaves as if the user did not want to use 
> auto-discovery and had not defined any GPU devices. This results in an 
> empty device list, but the empty-string check is never explicitly made in 
> the code, so this behavior is just coincidental.
> - Number validation does not happen on GPU device IDs (separated by commas).
> Many testcases are added as the coverage was already very low.
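
A minimal, self-contained sketch of the kind of validation described above 
(illustrative only; it assumes plain comma-separated integer IDs and is not the 
actual GpuDiscoverer code):
{code:java}
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch: reject an empty value, non-numeric IDs and duplicates
// when parsing a comma-separated list of user-defined GPU device IDs.
public class GpuDeviceListValidationSketch {
  static Set<Integer> parseGpuIds(String value) {
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("GPU device list must not be empty");
    }
    Set<Integer> ids = new LinkedHashSet<>();
    for (String token : value.split(",")) {
      final int id;
      try {
        id = Integer.parseInt(token.trim());
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Not a valid GPU device ID: " + token, e);
      }
      if (!ids.add(id)) {
        throw new IllegalArgumentException("Duplicate GPU device ID: " + id);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    System.out.println(parseGpuIds("0,1,2,3"));   // [0, 1, 2, 3]
    // parseGpuIds("") and parseGpuIds("0,0") would both throw.
  }
}
{code}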



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766737#comment-16766737
 ] 

Zhankun Tang commented on YARN-9265:


[~pbacsko] , Thanks for the patch. It looks good to me.

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> 
>
> Key: YARN-9265
> URL: https://issues.apache.org/jira/browse/YARN-9265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-9265-001.patch, YARN-9265-002.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> 
> Device Name:
> acl0
>  
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   StatusInformation
>  
> pac_a10_f20 PassedPAC Arria 10 Platform (pac_a10_f20)
>   PCIe 08:00.0
>   FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> 
>  
> Call "aocl diagnose " to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this and fails with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
>  Using FPGA vendor plugin: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
>  Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
>  Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
>  FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Failed to get major-minor number from reading /dev/pac_a10_f30
> 2019-01-25 06:46:03,252 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the 
> "Physical Dev Name", but this is wrong. For example, it expects the device 
> file to be {{/dev/pac_a10_f30}}, while the actual file is 
> {{/dev/intel-fpga-port.0}}.
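
One possible direction for Problem #2, sketched only as an illustration under the assumption that PAC devices show up as {{/dev/intel-fpga-port.N}} (as the log above suggests): enumerate the device files directly instead of deriving the path from the reported "Physical Dev Name". The class below is hypothetical, not the actual plugin code.
{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public final class FpgaDevicePathScanner {
  /** Lists FPGA device files found under /dev instead of deriving the
   *  path from the "Physical Dev Name" reported by aocl diagnose. */
  public static List<File> findIntelFpgaPorts() {
    List<File> found = new ArrayList<>();
    File[] entries = new File("/dev").listFiles();
    if (entries == null) {
      return found;                 // /dev not readable in this environment
    }
    for (File entry : entries) {
      if (entry.getName().startsWith("intel-fpga-port.")) {
        found.add(entry);
      }
    }
    return found;
  }
}
{code}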



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:17 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority in FairScheduler, so 
this restriction will only become valid after YARN-2098; I will remove this 
restriction soon.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
{quote} We should not preempt container which has been running for a long time.
{quote}
I think this is an important restriction, *because it is very costly to kill a 
task that has been running for dozens of hours.*
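
To make the running-time restriction concrete, here is a minimal sketch; the class name and threshold are hypothetical, and a real patch would read the limit from the FairScheduler configuration:
{code:java}
import java.util.concurrent.TimeUnit;

public final class PreemptionRuntimeRestriction {
  // Hypothetical threshold; a real patch would read this from the
  // fair scheduler configuration instead of hard-coding it.
  private static final long MAX_PREEMPTABLE_RUNTIME_MS = TimeUnit.HOURS.toMillis(1);

  /** Returns true when the container has run long enough that killing it
   *  would throw away too much completed work. */
  public static boolean tooOldToPreempt(long containerStartTimeMs, long nowMs) {
    return nowMs - containerStartTimeMs > MAX_PREEMPTABLE_RUNTIME_MS;
  }
}
{code}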


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority in FairScheduler, so 
this restriction is invalid in the community version; it will become valid 
after YARN-2098.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
{quote} We should not preempt container which has been running for a long time.
{quote}
I think this is an important restriction, *because it is very costly to kill a 
task that has been running for dozens of hours.*

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions to FairScheduler preemption:
>  * We should not preempt our own containers.
>  * We should not preempt a higher-priority job.
>  * We should not preempt a container that has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:14 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority in FairScheduler, so 
this restriction is invalid in the community version; it will become valid 
after YARN-2098.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
{quote} We should not preempt container which has been running for a long time.
{quote}
I think this is an important restriction, *because it is very costly to kill a 
task that has been running for dozens of hours.*


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority in FairScheduler, so 
this restriction is invalid in the community version; it will become valid 
after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098].
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions to FairScheduler preemption:
>  * We should not preempt our own containers.
>  * We should not preempt a higher-priority job.
>  * We should not preempt a container that has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 3:09 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority in FairScheduler, so 
this restriction is invalid in the community version; it will become valid 
after [YARN-2098|https://issues.apache.org/jira/browse/YARN-2098].
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority, so this restriction 
is invalid in the community version. BTW, we honor the app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community should 
make the same change, but that is a separate problem.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions to FairScheduler preemption:
>  * We should not preempt our own containers.
>  * We should not preempt a higher-priority job.
>  * We should not preempt a container that has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766696#comment-16766696
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 3:03 AM:
-

[~ebadger]

Just checked: if an image name is "repoA/userA/imageA", configuring either 
"repoA" or "repoA/userA" in "docker.trusted.registries" works. Likewise, 
configuring "repoA/userA/prefixA", "repoA/userA" or "repoA" all allow the 
image name "repoA/userA/prefixA/imageA".

So it seems we don't need explicit logic to allow such named images?
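
The behavior described above amounts to prefix matching on the '/'-separated image name. A minimal sketch of that idea, in Java purely for readability (the real check lives in the native container-executor, and the names below are hypothetical):
{code:java}
public final class TrustedPrefixCheck {
  /** Returns true when the image name starts with one of the configured
   *  entries followed by '/', so both "repoA" and "repoA/userA" match
   *  "repoA/userA/imageA". */
  public static boolean isTrusted(String imageName, String[] trustedEntries) {
    for (String entry : trustedEntries) {
      if (imageName.startsWith(entry + "/")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String[] trusted = {"repoA", "repoA/userA"};
    System.out.println(isTrusted("repoA/userA/imageA", trusted));  // true
    System.out.println(isTrusted("repoB/userA/imageA", trusted));  // false
  }
}
{code}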


was (Author: tangzhankun):
[~ebadger]

Just checked: if an image name is "repoA/userA/imageA", configuring either 
"repoA" or "repoA/userA" in "docker.trusted.registries" works. So it seems we 
don't need explicit logic to allow such named images?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.
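
For reference, the semantics proposed in this JIRA can be sketched roughly as follows; Java is used only for illustration, since the real check is implemented in the native container-executor, and the helper below is hypothetical. The idea: a top-level image such as "centos" has no registry segment and is trusted only when the special "library" entry is configured.
{code:java}
import java.util.Arrays;
import java.util.List;

public final class TopLevelImageCheck {
  /** A top-level image such as "centos" has no registry segment; it is
   *  trusted only when the special "library" entry is configured.  Images
   *  with a registry segment are trusted when that segment is listed. */
  public static boolean isTrusted(String image, List<String> trustedRegistries) {
    String name = image.contains(":")
        ? image.substring(0, image.indexOf(':')) : image;   // drop the tag
    if (!name.contains("/")) {
      return trustedRegistries.contains("library");
    }
    String registry = name.substring(0, name.indexOf('/'));
    return trustedRegistries.contains(registry);
  }

  public static void main(String[] args) {
    List<String> trusted = Arrays.asList("library", "tangzhankun");
    System.out.println(isTrusted("centos:latest", trusted));          // true
    System.out.println(isTrusted("tangzhankun/tensorflow", trusted)); // true
  }
}
{code}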



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766696#comment-16766696
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 2:52 AM:
-

[~ebadger]

Just checked: if an image name is "repoA/userA/imageA", configuring either 
"repoA" or "repoA/userA" in "docker.trusted.registries" works. So it seems we 
don't need explicit logic to allow such named images?


was (Author: tangzhankun):
[~ebadger]

Just checked: if an image name is "repoA/userA/imageA", configuring "repoA" and 
"repoA/userA" in "docker.trusted.registries" both work. So it seems we don't 
need explicit logic to allow such named images?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:58 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority, so this restriction 
is invalid in the community version. BTW, we honor the app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community should 
make the same change, but that is a separate problem.
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority, so this restriction 
is invalid in the community version. BTW, we honor the app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community should 
make the same change, but that is a separate problem.

 

 
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions to FairScheduler preemption:
>  * We should not preempt our own containers.
>  * We should not preempt a higher-priority job.
>  * We should not preempt a container that has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694
 ] 

Zhaohui Xin edited comment on YARN-9277 at 2/13/19 2:57 AM:


Hi, [~yufeigu]. Thanks for your reply. 
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}
You are right. YARN jobs currently have the same priority, so this restriction 
is invalid in the community version. BTW, we honor the app's priority from 
_ApplicationSubmissionContext_ in our cluster. I think the community should 
make the same change, but that is a separate problem.

 

 
{code:java}
public Priority getPriority() {
  // Right now per-app priorities are not passed to scheduler,
  // so everyone has the same priority.
  return appPriority;
 }{code}
 


was (Author: uranus):
Hi, [~yufeigu]. Thanks for your reply.
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions to FairScheduler preemption:
>  * We should not preempt our own containers.
>  * We should not preempt a higher-priority job.
>  * We should not preempt a container that has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766696#comment-16766696
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 2:52 AM:
-

[~ebadger]

Just checked: if an image name is "repoA/userA/imageA", configuring "repoA" and 
"repoA/userA" in "docker.trusted.registries" both work. So it seems we don't 
need explicit logic to allow such named images?


was (Author: tangzhankun):
Just checked: if an image name is "repoA/userA/imageA", configuring "repoA" and 
"repoA/userA" in "docker.trusted.registries" both work. So it seems we don't 
need explicit logic to allow such named images?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766696#comment-16766696
 ] 

Zhankun Tang commented on YARN-8927:


Just checked: if an image name is "repoA/userA/imageA", configuring "repoA" and 
"repoA/userA" in "docker.trusted.registries" both work. So it seems we don't 
need explicit logic to allow such named images?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766694#comment-16766694
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~yufeigu]. Thanks for your reply.
{quote}Correct me if I am wrong, there are no priority between Yarn jobs. 
Priority has been applied to tasks inside one job, which was there before the 
FS preemption overhaul. We need only priorities between mappers and reducers or 
other customized priorities since AM containers are always the first priority 
and have been taken care.
{quote}

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions to FairScheduler preemption:
>  * We should not preempt our own containers.
>  * We should not preempt a higher-priority job.
>  * We should not preempt a container that has been running for a long time.
>  * ...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766687#comment-16766687
 ] 

Íñigo Goiri commented on YARN-999:
--

Just to make sure we are on the same page, I added [^YARN-291.000.patch] with a 
WIP.
It basically contains a unit test that verifies we get the expected behavior.
On the scheduler side, I used a very hacky approach to preemption, just for the 
unit test.
I am still trying to figure out the best way to do the preemption, probably by 
following events.

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
> Attachments: YARN-291.000.patch
>
>
> In the current design and implementation, when we decrease a node's resources 
> to less than the resource consumption of its currently running tasks, those 
> tasks keep running until they finish; no new task gets assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is fine for most cases, but for a long 
> running task it can be too slow for the resource setting to actually take 
> effect, so preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766686#comment-16766686
 ] 

Junping Du commented on YARN-999:
-

bq. I am not sure how exactly the reduction of node resources is implemented, 
but for the opportunistic containers, you can kill stuff locally at the NMs. So 
if you need to free up resources due to resource reduction, you can go over the 
opportunistic containers running and kill the long-running ones.
So far, the reduction of node resources doesn't kill any containers; it waits 
until the containers finish - quite old behavior, since there was no 
long-running service support when the feature was first implemented.

I think we need a generic policy here that can pick containers to balloon out 
resources according to some cost - opportunistic vs. guaranteed could be one 
dimension, but it could also consider others such as container size, running 
time, etc.
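
One way to express such a generic cost-based policy is to rank running containers by how cheap they are to preempt and release them in that order until the node fits its reduced resources. The sketch below is purely illustrative and all names are hypothetical:
{code:java}
import java.util.Comparator;
import java.util.List;

public final class NodeResourceReductionPolicy {
  /** Hypothetical view of a running container, only for cost ranking. */
  public static final class RunningContainer {
    final boolean opportunistic;
    final long memoryMb;
    final long runningTimeMs;

    RunningContainer(boolean opportunistic, long memoryMb, long runningTimeMs) {
      this.opportunistic = opportunistic;
      this.memoryMb = memoryMb;
      this.runningTimeMs = runningTimeMs;
    }
  }

  /** Orders containers from cheapest to most expensive to preempt:
   *  opportunistic before guaranteed, then smaller, then shorter-running. */
  public static void sortByPreemptionCost(List<RunningContainer> containers) {
    containers.sort(Comparator
        .comparing((RunningContainer c) -> !c.opportunistic)
        .thenComparingLong(c -> c.memoryMb)
        .thenComparingLong(c -> c.runningTimeMs));
  }
}
{code}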

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
> Attachments: YARN-291.000.patch
>
>
> In the current design and implementation, when we decrease a node's resources 
> to less than the resource consumption of its currently running tasks, those 
> tasks keep running until they finish; no new task gets assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is fine for most cases, but for a long 
> running task it can be too slow for the resource setting to actually take 
> effect, so preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-999:
-
Attachment: YARN-291.000.patch

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
> Attachments: YARN-291.000.patch
>
>
> In the current design and implementation, when we decrease a node's resources 
> to less than the resource consumption of its currently running tasks, those 
> tasks keep running until they finish; no new task gets assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is fine for most cases, but for a long 
> running task it can be too slow for the resource setting to actually take 
> effect, so preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 1:39 AM:
-

[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "repoA/userA/imageA" to be 
allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? 
I will try whether this works and get back to you.

One thing worth noting is that if YARN allows an image name, Docker will check 
whether it exists locally and prefer running the local copy over pulling from a 
hub. YARN's check here looks like duplicated work, because if Docker can pull 
and run the image, we can hardly say that "repoA/userA/imageA" is a real local 
image.


was (Author: tangzhankun):
[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "repoA/userA/imageA" to be 
allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? 
I will try whether this works and get back to you.

One thing worth noting is that if YARN allows an image name, Docker will check 
whether it exists locally and prefer running the local copy.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 1:31 AM:
-

[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "repoA/userA/imageA" to be 
allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? 
I will try whether this works and get back to you.


was (Author: tangzhankun):
[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "userA/imageA" to be allowed, 
he/she should configure "userA" in "docker.trusted.registries"?

And after "userA" is configured, Docker will check whether that image is local. 
If it is local, the container runs; if not, the container fails unless the 
image is available on Docker Hub.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 1:28 AM:
-

[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "userA/imageA" to be allowed, 
he/she should configure "userA" in "docker.trusted.registries"?

And after "userA" is configured, Docker will check whether that image is local. 
If it is local, the container runs; if not, the container fails unless the 
image is available on Docker Hub.


was (Author: tangzhankun):
[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "userA/imageA" to be allowed, 
he/she should configure "userA" in "docker.trusted.registries"?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 1:35 AM:
-

[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "repoA/userA/imageA" to be 
allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? 
I will try whether this works and get back to you.

One thing worth noting is that if YARN allows an image name, Docker will check 
whether it exists locally and prefer running the local copy.


was (Author: tangzhankun):
[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "repoA/userA/imageA" to be 
allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? 
I will try whether this works and get back to you.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636
 ] 

Zhankun Tang edited comment on YARN-8927 at 2/13/19 1:25 AM:
-

[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "userA/imageA" to be allowed, 
he/she should configure "userA" in "docker.trusted.registries"?


was (Author: tangzhankun):
[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "userA/imageA" to be allowed, 
he/she should configure "user1" in "docker.trusted.registries"?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766636#comment-16766636
 ] 

Zhankun Tang commented on YARN-8927:


[~eyang], [~ebadger] Thanks for the review!

If a local image name contains "/", it may not be considered a "top-level" 
image.

It seems that if a user wants the local image "userA/imageA" to be allowed, 
he/she should configure "user1" in "docker.trusted.registries"?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works when running DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message is similar to:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Aihua Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved YARN-9297.

Resolution: Duplicate

> Renaming RM could cause application to crash
> 
>
> Key: YARN-9297
> URL: https://issues.apache.org/jira/browse/YARN-9297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Aihua Xu
>Priority: Major
>
> In this line, we throw UnknownHostException when any RM host cannot be 
> resolved to an IP address: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448
> In some cases one RM needs to be renamed or mapped to a different IP address, 
> and this crashes the application even though the other RMs are running 
> fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766618#comment-16766618
 ] 

Aihua Xu commented on YARN-9297:


Yes. You are right. I will resolve as dup. Thanks [~jojochuang]

> Renaming RM could cause application to crash
> 
>
> Key: YARN-9297
> URL: https://issues.apache.org/jira/browse/YARN-9297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Aihua Xu
>Priority: Major
>
> In this line, we throw UnknownHostException when any RM host cannot be 
> resolved to an IP address: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448
> In some cases one RM needs to be renamed or mapped to a different IP address, 
> and this crashes the application even though the other RMs are running 
> fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9273) Flexing a component of YARN service does not work as documented when using relative number

2019-02-12 Thread Masahiro Tanaka (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766603#comment-16766603
 ] 

Masahiro Tanaka commented on YARN-9273:
---

Hi [~eyang], could you review this?

> Flexing a component of YARN service does not work as documented when using 
> relative number
> --
>
> Key: YARN-9273
> URL: https://issues.apache.org/jira/browse/YARN-9273
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Attachments: YARN-9273.001.patch, YARN-9273.002.patch, 
> YARN-9273.003.patch, YARN-9273.004.patch
>
>
> [The 
> documentation|https://hadoop.apache.org/docs/r3.2.0/hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html]
>  says "Relative changes are also supported for the ${NUMBER_OF_CONTAINERS} in 
> the flex command, such as +2 or -2." when flexing a component of a YARN 
> service.
> I expected {{yarn app -flex sleeper-service -component sleeper +1}} to 
> increment the number of containers, but it actually sets the number of 
> containers to just one.
> I guess ApiServiceClient#actionFlex handles the flex operation for {{yarn 
> app -flex}}, and it simply uses {{Long.parseLong}} to convert an argument 
> like {{+1}}, which does not account for relative numbers.
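
A minimal sketch of the distinction the fix needs to make (hypothetical helper, not the actual ApiServiceClient code): keep the '+'/'-' sign as a marker for a relative change instead of handing the raw parsed value to the scheduler.
{code:java}
public final class FlexCountParser {
  /** Treats "+2"/"-2" as deltas on the current count and a bare number as
   *  an absolute target.  Long.parseLong alone cannot make this distinction
   *  because it accepts a leading '+' and simply returns the value. */
  public static long resolve(String arg, long currentCount) {
    long value = Long.parseLong(arg);
    if (arg.startsWith("+") || arg.startsWith("-")) {
      return currentCount + value;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(resolve("+1", 3));  // 4 (relative)
    System.out.println(resolve("2", 3));   // 2 (absolute)
  }
}
{code}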



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766578#comment-16766578
 ] 

Wei-Chiu Chuang commented on YARN-9297:
---

Similar to HADOOP-15864?

> Renaming RM could cause application to crash
> 
>
> Key: YARN-9297
> URL: https://issues.apache.org/jira/browse/YARN-9297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.6.0
>Reporter: Aihua Xu
>Priority: Major
>
> In this line, we throw UnknownHostException when any RM host cannot be 
> resolved to an IP address: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448
> In some cases one RM needs to be renamed or mapped to a different IP address, 
> and this crashes the application even though the other RMs are running 
> fine.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9297) Renaming RM could cause application to crash

2019-02-12 Thread Aihua Xu (JIRA)
Aihua Xu created YARN-9297:
--

 Summary: Renaming RM could cause application to crash
 Key: YARN-9297
 URL: https://issues.apache.org/jira/browse/YARN-9297
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Aihua Xu


In this line, we throw UnknownHostException when any RM host cannot be resolved 
to an IP address: 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L448

In some cases one RM needs to be renamed or mapped to a different IP address, 
and this crashes the application even though the other RMs are running fine.
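
One possible direction, sketched as a hypothetical helper rather than as a change to SecurityUtil itself: resolve each configured RM independently and fail only when none of them can be resolved.
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public final class RmAddressResolution {
  /** Resolves as many RM hosts as possible and fails only when none of the
   *  configured hosts can be resolved, instead of failing on the first one. */
  public static List<InetAddress> resolveUsable(List<String> rmHosts)
      throws UnknownHostException {
    List<InetAddress> usable = new ArrayList<>();
    for (String host : rmHosts) {
      try {
        usable.add(InetAddress.getByName(host));
      } catch (UnknownHostException e) {
        // Skip this RM; the remaining RMs may still resolve and be usable.
      }
    }
    if (usable.isEmpty()) {
      throw new UnknownHostException("None of the RM hosts could be resolved");
    }
    return usable;
  }
}
{code}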



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766542#comment-16766542
 ] 

Hadoop QA commented on YARN-7129:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
10s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 36s{color} | {color:orange} root: The patch generated 9 new + 4 unchanged - 
0 fixed = 13 total (was 4) {color} |
| {color:green}+1{color} | {color:green} hadolint {color} | {color:green}  0m  
1s{color} | {color:green} There were no new hadolint issues. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 14m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange}  
0m 15s{color} | {color:orange} The patch generated 160 new + 104 unchanged - 0 
fixed = 264 total (was 104) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
14s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
41s{color} | {color:green} the patch passed {color} |
|| 

[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766493#comment-16766493
 ] 

Eric Yang commented on YARN-8927:
-

{quote}If we see library/ in container-executor.cfg then we trust all local 
images.{quote}

I am not sure how to identify whether an image is local if the image name contains a 
'/' character.  I think patch 002 will break [~ebadger]'s environment, since his local 
image names have a '/' character in them.  [~tangzhankun], any idea on how to fix 
this?
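
For illustration only (the real check lives in the native container-executor, which is 
written in C): a minimal Java sketch of the registry-prefix check under discussion. It 
assumes a comma-separated trusted list like the one in container-executor.cfg and simply 
compares the text before the first '/'; as noted above, such a check cannot tell a local 
"user/image" apart from a remote "registry/image", which is exactly the ambiguity in question.

{code:java}
import java.util.Arrays;
import java.util.List;

public class TrustedRegistryCheck {
  // Hypothetical helper mirroring the discussion, not the actual container-executor logic.
  static boolean isTrusted(String image, String trustedRegistries) {
    List<String> trusted = Arrays.asList(trustedRegistries.split(","));
    int slash = image.indexOf('/');
    if (slash < 0) {
      // Top-level image such as "centos": trusted only if "library" is whitelisted.
      return trusted.contains("library");
    }
    // Treat everything before the first '/' as the registry/namespace.
    // A local image named "user/image" is indistinguishable from a remote one here.
    String prefix = image.substring(0, slash);
    return trusted.contains(prefix);
  }

  public static void main(String[] args) {
    String cfg = "tangzhankun,ubuntu,centos,library";
    System.out.println(isTrusted("centos", cfg));                 // true via "library"
    System.out.println(isTrusted("tangzhankun/tensorflow", cfg)); // true
    System.out.println(isTrusted("someuser/localimage", cfg));    // false, even if it only exists locally
  }
}
{code}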

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if we run DistributedShell with "tangzhankun/tensorflow":
> {code:java}
> "yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu", 
> or "ubuntu[:tagName]" fails:
> The error message is like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Commented] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally

2019-02-12 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766484#comment-16766484
 ] 

Hudson commented on YARN-9184:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15941 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/15941/])
YARN-9184. Add a system flag to allow update to latest docker images.
(eyang: rev 3dc252326693170ac1b31bf2914bae72ca73d31a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java


> Docker run doesn't pull down latest image if the image exists locally 
> --
>
> Key: YARN-9184
> URL: https://issues.apache.org/jira/browse/YARN-9184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9184.001.patch, YARN-9184.002.patch, 
> YARN-9184.003.patch, YARN-9184.004.patch, YARN-9184.005.patch
>
>
> See [docker run doesn't pull down latest image if the image exists 
> locally|https://github.com/moby/moby/issues/13331].
> So, I think we should pull the image before running it so that the image is always the latest.
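
For readers following along, the committed change touches YarnConfiguration.java and 
yarn-default.xml, so the pull-before-run behaviour is controlled by a boolean NM property. 
A hedged sketch of the decision logic is below; the property key used here is a placeholder 
for illustration, not necessarily the exact key added by the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ImageUpdateSketch {
  // Placeholder key for illustration; the real key is defined by the committed patch.
  static final String IMAGE_UPDATE_KEY =
      "yarn.nodemanager.runtime.linux.docker.image-update";

  // Decide whether "docker pull" should run even if the image already exists locally.
  static boolean shouldPullBeforeRun(Configuration conf) {
    return conf.getBoolean(IMAGE_UPDATE_KEY, false);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setBoolean(IMAGE_UPDATE_KEY, true);
    System.out.println(shouldPullBeforeRun(conf)); // true: always refresh the image before run
  }
}
{code}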






[jira] [Resolved] (YARN-8955) Add a flag to use local docker image instead of getting latest from registry

2019-02-12 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-8955.
-
   Resolution: Duplicate
Fix Version/s: 3.3.0

This issue is a duplicate of YARN-9184 and YARN-9292 combined.  Closing as 
duplicate.

> Add a flag to use local docker image instead of getting latest from registry
> 
>
> Key: YARN-8955
> URL: https://issues.apache.org/jira/browse/YARN-8955
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
> Fix For: 3.3.0
>
>
> Some companies have a security policy to use local docker images instead of 
> getting the latest images from the internet.  When a docker image is pulled in 
> the localization phase, there are two possible outcomes: the image is the latest 
> one from a trusted registry, or the image is a static local copy.  
> This task is to add a configuration flag that gives priority to the local image 
> over the trusted registry image. 
>  If an image already exists locally, the node manager does not trigger a docker 
> pull to get the latest image from the trusted registries. 






[jira] [Commented] (YARN-9184) Docker run doesn't pull down latest image if the image exists locally

2019-02-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766470#comment-16766470
 ] 

Eric Yang commented on YARN-9184:
-

+1 Patch 5 looks good to me.  Committing shortly.

> Docker run doesn't pull down latest image if the image exists locally 
> --
>
> Key: YARN-9184
> URL: https://issues.apache.org/jira/browse/YARN-9184
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9184.001.patch, YARN-9184.002.patch, 
> YARN-9184.003.patch, YARN-9184.004.patch, YARN-9184.005.patch
>
>
> See [docker run doesn't pull down latest image if the image exists 
> locally|https://github.com/moby/moby/issues/13331].
> So, I think we should pull the image before running it so that the image is always the latest.






[jira] [Commented] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766481#comment-16766481
 ] 

Hadoop QA commented on YARN-9208:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 46s{color} 
| {color:red} hadoop-yarn-applications-distributedshell in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShell |
|   | hadoop.yarn.applications.distributedshell.TestDSWithMultipleNodeManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9208 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958444/YARN-9208-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3696b268c2e6 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7806403 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23390/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-distributedshell.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23390/testReport/ |
| Max. process+thread count | 583 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/

[jira] [Updated] (YARN-9208) Distributed shell allow LocalResourceVisibility to be specified

2019-02-12 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9208:

Attachment: YARN-9208-003.patch

> Distributed shell allow LocalResourceVisibility to be specified
> ---
>
> Key: YARN-9208
> URL: https://issues.apache.org/jira/browse/YARN-9208
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9208-001.patch, YARN-9208-002.patch, 
> YARN-9208-003.patch
>
>
> YARN-9008 added a feature to specify a list of files to be localized.
> It would be great to have the visibility type too, which allows testing of the 
> PRIVATE and PUBLIC types as well.
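
As background for the visibility discussion, here is a small hedged sketch of how a 
localized resource's visibility is expressed through the YARN records API; this is 
illustrative client-side code, not the distributed shell patch itself, and the file path 
is made up.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.api.records.URL;

public class LocalResourceVisibilityExample {
  // Build a LocalResource entry with an explicit visibility; the patch discussed here
  // would let the distributed shell client choose PUBLIC, PRIVATE or APPLICATION.
  static LocalResource toLocalResource(FileSystem fs, Path file,
      LocalResourceVisibility visibility) throws IOException {
    FileStatus status = fs.getFileStatus(file);
    return LocalResource.newInstance(
        URL.fromPath(file),            // where to fetch the file from
        LocalResourceType.FILE,        // a plain file (not ARCHIVE/PATTERN)
        visibility,                    // the visibility under test
        status.getLen(),               // size used for localization bookkeeping
        status.getModificationTime()); // timestamp used for cache validation
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    LocalResource lr = toLocalResource(fs,
        new Path("/tmp/shellscript.sh"), LocalResourceVisibility.PRIVATE);
    System.out.println(lr);
  }
}
{code}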






[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766325#comment-16766325
 ] 

Konstantinos Karanasos commented on YARN-999:
-

Take a look at YARN-7934 – we refactored some of the preemption code for the 
federation work (for the global queues in particular). The umbrella JIRA is not 
finished, but I think it will point you to some useful classes.

I am not sure how exactly the reduction of node resources is implemented, but 
for opportunistic containers you can kill things locally at the NMs. So if 
you need to free up resources due to a resource reduction, you can go over the 
running opportunistic containers and kill the long-running ones.

As far as I remember, the regular preemption code in the RM will not touch 
opportunistic containers.

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> In current design and implementation, when we decrease resource on node to 
> less than resource consumption of current running tasks, tasks can still be 
> running until the end. But just no new task get assigned on this node 
> (because AvailableResource < 0) until some tasks are finished and 
> AvailableResource > 0 again. This is good for most cases but in case of long 
> running task, it could be too slow for resource setting to actually work so 
> preemption could be hired here.






[jira] [Comment Edited] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766325#comment-16766325
 ] 

Konstantinos Karanasos edited comment on YARN-999 at 2/12/19 6:53 PM:
--

Take a look at YARN-7934 – we refactored some of the preemption code for the 
federation work (for the global queues in particular). The umbrella JIRA is not 
finished, but I think it will point you to some useful classes.

I am not sure how exactly the reduction of node resources is implemented, but 
for opportunistic containers you can kill things locally at the NMs. So if 
you need to free up resources due to a resource reduction, you can go over the 
running opportunistic containers and kill the long-running ones.

As far as I remember, the regular preemption code in the RM will not touch 
opportunistic containers.


was (Author: kkaranasos):
Give a look at YARN-7934 – we had refactored some stuff in preemption for the 
federation code (for the glabal queues in particular). The umbrella Jira is not 
finished, but I think this Jira will point you to some useful classes.

I am not sure how exactly the reduction of node resources is implemented, but 
for the opportunistic containers, you can kill stuff locally at the NMs. So if 
you need to free up resources due to resource reduction, you can go over the 
opportunistic containers running and kill the long-running ones).

As far as I remember, the regular preemption code in the RM will not touch 
opportunistic containers.

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> In current design and implementation, when we decrease resource on node to 
> less than resource consumption of current running tasks, tasks can still be 
> running until the end. But just no new task get assigned on this node 
> (because AvailableResource < 0) until some tasks are finished and 
> AvailableResource > 0 again. This is good for most cases but in case of long 
> running task, it could be too slow for resource setting to actually work so 
> preemption could be hired here.






[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766317#comment-16766317
 ] 

Íñigo Goiri commented on YARN-999:
--

{quote}
If it is an opportunistic container, it will already be killed fast, so I think 
you don't need a distinction between guaranteed/opportunistic (you will do 
preemption only in the guaranteed after the timeout).
{quote}

Right now there is no plumbing at all, so I need to build the whole preemption 
from scratch.
Is there a function in the RM that I can call to adjust the containers to the 
new resources?
Otherwise, I will need to go over the containers and select which ones to 
kill; in that case I need to make the distinction between guaranteed and 
opportunistic.
I don't think the NM is doing anything here.
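
One way to picture the selection step described above is sketched below: iterate over the 
node's running containers, take opportunistic ones first and the most recently started ones 
before long-running ones, and stop once the remaining usage fits the reduced capacity. The 
types are simplified stand-ins for illustration, not the actual RM/NM classes.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ShrinkNodeSketch {
  enum ExecType { GUARANTEED, OPPORTUNISTIC }

  // Simplified stand-in for a running container; not the real RMContainer/Container API.
  static class RunningContainer {
    final String id;
    final ExecType type;
    final long memMb;
    final long startTime;
    RunningContainer(String id, ExecType type, long memMb, long startTime) {
      this.id = id; this.type = type; this.memMb = memMb; this.startTime = startTime;
    }
  }

  // Pick containers to preempt until used memory fits the new node capacity.
  // Opportunistic containers go first; within each class, newer containers go first,
  // which matches the "avoid killing long-running work if possible" idea above.
  static List<RunningContainer> selectVictims(List<RunningContainer> running,
      long usedMb, long newCapacityMb) {
    List<RunningContainer> ordered = new ArrayList<>(running);
    ordered.sort(Comparator
        .comparing((RunningContainer c) -> c.type == ExecType.GUARANTEED)  // opportunistic first
        .thenComparing(Comparator.comparingLong(
            (RunningContainer c) -> c.startTime).reversed()));             // newest first
    List<RunningContainer> victims = new ArrayList<>();
    long remaining = usedMb;
    for (RunningContainer c : ordered) {
      if (remaining <= newCapacityMb) {
        break;
      }
      victims.add(c);
      remaining -= c.memMb;
    }
    return victims;
  }

  public static void main(String[] args) {
    List<RunningContainer> running = Arrays.asList(
        new RunningContainer("c1", ExecType.GUARANTEED, 4096, 1_000L),
        new RunningContainer("c2", ExecType.OPPORTUNISTIC, 2048, 5_000L),
        new RunningContainer("c3", ExecType.OPPORTUNISTIC, 2048, 9_000L));
    // Node shrank so that only 4 GB fits while 8 GB is in use: expect c3, then c2.
    selectVictims(running, 8192, 4096).forEach(c -> System.out.println(c.id));
  }
}
{code}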

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> In current design and implementation, when we decrease resource on node to 
> less than resource consumption of current running tasks, tasks can still be 
> running until the end. But just no new task get assigned on this node 
> (because AvailableResource < 0) until some tasks are finished and 
> AvailableResource > 0 again. This is good for most cases but in case of long 
> running task, it could be too slow for resource setting to actually work so 
> preemption could be hired here.






[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766305#comment-16766305
 ] 

Konstantinos Karanasos commented on YARN-999:
-

Hi [~elgoiri], if it is an opportunistic container, it will already be killed 
fast, so I think you don't need a distinction between guaranteed/opportunistic 
(you will do preemption only in the guaranteed after the timeout).

I think [~asuresh] might have worked on something related to preemption 
recently, but not sure.

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> In current design and implementation, when we decrease resource on node to 
> less than resource consumption of current running tasks, tasks can still be 
> running until the end. But just no new task get assigned on this node 
> (because AvailableResource < 0) until some tasks are finished and 
> AvailableResource > 0 again. This is good for most cases but in case of long 
> running task, it could be too slow for resource setting to actually work so 
> preemption could be hired here.






[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766287#comment-16766287
 ] 

Yufei Gu commented on YARN-9277:


Hi [~uranus], some general comments; I haven't looked at the code yet.
bq. We should not preempt self
+1
bq. We should not preempt high priority job. 
Correct me if I am wrong, but there is no priority between YARN jobs. Priority 
is applied to tasks inside one job, and that was there before the FS preemption 
overhaul. We only need priorities between mappers and reducers, or other 
customized priorities, since AM containers are always the first priority and 
are already taken care of.
bq. We should not preempt container which has been running for a long time.
Makes sense if all other conditions are exactly the same.
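
To make the proposed restrictions concrete, here is a small hedged sketch of the kind of 
candidate filter being discussed; the types and the threshold are illustrative assumptions, 
not the actual FairScheduler preemption code or the patch under review.

{code:java}
public class PreemptionFilterSketch {
  // Simplified container view for illustration; not the real RMContainer API.
  static class CandidateContainer {
    String appId;         // application that owns the container
    boolean amContainer;  // AM containers are already protected by the scheduler
    long startTimeMs;     // used to skip long-running work
  }

  // Assumption for illustration: treat anything older than 30 minutes as long-running.
  static final long LONG_RUNNING_THRESHOLD_MS = 30 * 60 * 1000L;

  // Returns true if the container may be considered for preemption on behalf of starvedAppId.
  static boolean mayPreempt(CandidateContainer c, String starvedAppId, long nowMs) {
    if (c.appId.equals(starvedAppId)) {
      return false;  // "we should not preempt self"
    }
    if (c.amContainer) {
      return false;  // AM containers stay out of the candidate set
    }
    if (nowMs - c.startTimeMs > LONG_RUNNING_THRESHOLD_MS) {
      return false;  // "we should not preempt a container which has been running for a long time"
    }
    return true;
  }
}
{code}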

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt high priority job
>  * We should not preempt container which has been running for a long time.
>  * ...






[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766277#comment-16766277
 ] 

Hadoop QA commented on YARN-9277:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 36s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 6 new + 48 unchanged - 0 fixed = 54 total (was 48) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m  8s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
29s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}156m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9277 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958402/YARN-9277.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6c682c5b8d16 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 20b92cd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23387/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apac

[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-12 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766248#comment-16766248
 ] 

Íñigo Goiri commented on YARN-999:
--

Thanks [~djp] for referring to YARN-2489.
I'll start working on a generic one and then we decide where to post it.

I think the idea would be for the RM to track the moment it got the change in 
resources and, once the timeout passes, send a {{ContainerPreemptEvent}}.
I see this was added in YARN-569 and is used in a few places.

[~asuresh], [~kkaranasos], I remember you recently worked on some preemption.
Do you know what would be a good JIRA to use as a reference for this?
Hopefully something that distinguishes OPPORTUNISTIC containers from the 
others.
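
A minimal sketch of the timeout idea above, using plain JDK scheduling and a hypothetical 
dispatcher interface in place of the real RM event plumbing (in the RM this would end up 
sending a {{ContainerPreemptEvent}} as introduced by YARN-569):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NodeShrinkTimeoutSketch {
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

  // Hypothetical hook: in the real RM this would dispatch a ContainerPreemptEvent
  // for each container selected on the shrunk node.
  interface PreemptionDispatcher {
    void markForPreemption(String nodeId);
  }

  // Record the moment the node resource was reduced and, once the grace period elapses,
  // ask the dispatcher to preempt whatever still does not fit on the node.
  void onNodeResourceReduced(String nodeId, long gracePeriodMs, PreemptionDispatcher dispatcher) {
    timer.schedule(() -> dispatcher.markForPreemption(nodeId), gracePeriodMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    NodeShrinkTimeoutSketch sketch = new NodeShrinkTimeoutSketch();
    sketch.onNodeResourceReduced("node-1:8041", 1000,
        nodeId -> System.out.println("would send ContainerPreemptEvent(s) for " + nodeId));
    Thread.sleep(1500);
    sketch.timer.shutdown();
  }
}
{code}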

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Priority: Major
>
> In current design and implementation, when we decrease resource on node to 
> less than resource consumption of current running tasks, tasks can still be 
> running until the end. But just no new task get assigned on this node 
> (because AvailableResource < 0) until some tasks are finished and 
> AvailableResource > 0 again. This is good for most cases but in case of long 
> running task, it could be too slow for resource setting to actually work so 
> preemption could be hired here.






[jira] [Updated] (YARN-7129) Application Catalog for YARN applications

2019-02-12 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7129:

Attachment: YARN-7129.024.patch

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch
>
>
> YARN native services provide a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN 
> for managing the life cycle of applications.  






[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-02-12 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766225#comment-16766225
 ] 

Eric Yang commented on YARN-7129:
-

Patch 24 fixes the hadolint issue.

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch
>
>
> YARN native services provide a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN 
> for managing the life cycle of applications.  






[jira] [Commented] (YARN-9268) Various fixes are needed in FpgaDevice

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766218#comment-16766218
 ] 

Hadoop QA commented on YARN-9268:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 134 unchanged - 13 fixed = 134 total (was 147) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
12s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9268 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958404/YARN-9268-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2c7e20e10578 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 20b92cd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23388/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23388/console |
| Powered by

[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup

2019-02-12 Thread Keqiu Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766200#comment-16766200
 ] 

Keqiu Hu commented on YARN-9294:


Yes, cgget is the old API in RHEL 6 (the libcgroup toolkit) to get the allocation. 
It works for memory & CPU, but not for devices. I don't think it is a bug; 
pulling device information just doesn't seem to be supported. I did that 
manually on my desktop and it works.

My current suspicion is that this _echo values to the cgroup_ step might be flaky 
sometimes (file I/O errors, cgroup glitches, race conditions, etc.). The plan is 
to add a check that the cgroup manipulation actually worked before moving on to 
starting the process in that cgroup.
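
A hedged sketch of what such a verification step could look like: after the device cgroup 
has been written, re-read it and confirm the expected entry is present before launching the 
workload. The cgroup path layout and the device numbers are assumptions for illustration; 
the NM derives the real paths from its cgroup mount configuration.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class CgroupSanityCheck {
  // Re-read the devices cgroup for the container and check that the expected entry is there.
  static boolean deviceCgroupLooksRight(String containerId, String expectedEntry)
      throws IOException {
    // Assumed path layout; the real path comes from the NM's cgroups configuration.
    Path list = Paths.get("/sys/fs/cgroup/devices/hadoop-yarn", containerId, "devices.list");
    List<String> lines = Files.readAllLines(list);
    return lines.contains(expectedEntry);
  }

  public static void main(String[] args) throws IOException {
    // Assumption for illustration: the container was granted only GPU 0
    // (nvidia character devices conventionally use major number 195), so its entry should be listed.
    boolean ok = deviceCgroupLooksRight(
        "container_e22_1549663278916_0249_01_01", "c 195:0 rwm");
    System.out.println(ok ? "cgroup as expected" : "retry or fail before starting the process");
  }
}
{code}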

> Potential race condition in setting GPU cgroups & execute command in the 
> selected cgroup
> 
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0
>Reporter: Keqiu Hu
>Assignee: Keqiu Hu
>Priority: Critical
>
> Environment is latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with GPU requirement, at least 1 of the 
> allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I 
> could still have visibility to all GPUs on the same machine when running 
> nvidia-smi.
> The funny thing is that even though I have visibility to all GPUs at the 
> moment of executing container-executor (say ordinals 0,1,2,3), cgroups jailed the 
> process's access down to only that single GPU after some time. 
> The underlying process trying to access the GPU takes the initial 
> information as the source of truth and tries to access physical GPU 0, which is 
> not really available to the process. This results in a 
> [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> Validated the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, 
> --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 
> 0,1,2,3]
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, 
> application_1549663278916_0249, 
> /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, 
> /grid/a/tmp/yarn, /grid/a/tmp/userlogs, 
> /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer,
>  khu, application_1549663278916_0249, 
> container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 
> 8040, /grid/a/tmp/yarn]
> {code}
> So most likely a race condition between these two operations? 
> cc [~jhung]
> Another potential theory is that the cgroup creation for the container actually 
> failed but the error was swallowed silently.






[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-02-12 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766185#comment-16766185
 ] 

Adam Antal commented on YARN-9270:
--

Thanks for the patch, [~pbacsko]! Some thoughts of mine:

 - If we touch {{IntelFpgaOpenclPlugin.java}}, could we remove the wildcard 
import {{import java.util.*}}? If I'm not mistaken, we only use HashMap, 
LinkedList, List and Map in that file (similarly for TestFpgaDiscoverer.java).
 - Removing that dirty hack from {{TestFpgaDiscoverer.java}} is a great plus of 
this patch, thank you for that! I found the exact same piece of code on SO by 
searching for the keywords (which is a red flag for me), and it looks really 
messy, so I am happy we can get it removed.
 - I don't see why the constructor of Configuration is called with false, but I 
can accept that. Also, the 5th test case 
(testLinuxFpgaResourceDiscoverPluginWithSdkRootSet) used another Configuration 
object in the original test when calling {{discoverer.initialize(conf)}} (one 
initialized with a true parameter), so you are modifying the behaviour of the 
test case. It doesn't make the test fail, but is it intentional?
 - We request the FpgaDiscoverer instance 5 times and then call 
setResourceHanderPlugin on it with the same parameter (openclPlugin). Can we 
move this into a helper function to avoid the minor code duplication?
 - We could also move the setting of the YarnConfiguration.NM_FPGA_PATH_TO_EXEC 
config into that function, as long as it doesn't change the 1st test's behaviour.
 - Also, could you move the previous comments/descriptions of the test cases to 
the new tests' javadoc?
 - As I see it, there aren't any logs defined in this test class. It is beyond 
the scope of the issue, but it would be nice to have some debug-level logging. 
For a start, it would be nice to have logs for the new tests you just split out.

I was happy to review it, good work overall!

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - it is too complicated. We can introduce a 
> {{Function}} in the plugin class, like {{Function envProvider 
> = System::getenv}}, plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.
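
For reference, the {{envProvider}} idea could look roughly like the sketch below; the names 
are illustrative rather than the actual plugin fields, and it assumes the plugin locates its 
tooling via an SDK-root environment variable.

{code:java}
import java.util.function.Function;

public class EnvProviderSketch {
  // Production code reads the real environment; tests can swap in a fake function
  // instead of using reflection to mutate the process environment.
  private Function<String, String> envProvider = System::getenv;

  // Setter used only by tests.
  void setEnvProvider(Function<String, String> provider) {
    this.envProvider = provider;
  }

  String getAoclPath() {
    // Illustrative lookup: assume the SDK root is published via an environment variable.
    String sdkRoot = envProvider.apply("ALTERAOCLSDKROOT");
    return sdkRoot == null ? "aocl" : sdkRoot + "/bin/aocl";
  }

  public static void main(String[] args) {
    EnvProviderSketch plugin = new EnvProviderSketch();
    // In a test, inject a fake environment rather than hacking the process env.
    plugin.setEnvProvider(name -> "ALTERAOCLSDKROOT".equals(name) ? "/opt/intelFPGA" : null);
    System.out.println(plugin.getAoclPath()); // /opt/intelFPGA/bin/aocl
  }
}
{code}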






[jira] [Updated] (YARN-9268) Various fixes are needed in FpgaDevice

2019-02-12 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9268:
---
Attachment: YARN-9268-003.patch

> Various fixes are needed in FpgaDevice
> --
>
> Key: YARN-9268
> URL: https://issues.apache.org/jira/browse/YARN-9268
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9268-001.patch, YARN-9268-002.patch, 
> YARN-9268-003.patch
>
>
> Need to fix the following in the class {{FpgaDevice}}:
>  * It implements {{Comparable}}, but returns 0 in every case. There is no 
> natural ordering among FPGA devices; perhaps "acl0" comes before "acl1", but 
> this seems too forced and unnecessary. We think this class should not 
> implement {{Comparable}} at all, at least not like that.
>  * It stores unnecessary fields: devName, busNum, temperature, power usage. For 
> one, these are never needed in the code. Secondly, temperature and power usage 
> change constantly. It's pointless to store these in this POJO.
>  * {{serialVersionUID}} is 1L - let's generate a proper number for it
>  * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
> uniquely identify the card, then let's demand them in the constructor and 
> don't store Integers that can be null.
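
A hedged sketch of what the slimmed-down class could look like after the fixes listed above 
(no {{Comparable}}, primitive major/minor required by the constructor, no volatile fields, a 
generated {{serialVersionUID}}); this is an illustration, not the committed code.

{code:java}
import java.io.Serializable;
import java.util.Objects;

// Illustrative shape only, not the actual patched class.
public class FpgaDevice implements Serializable {
  private static final long serialVersionUID = -4678487141824092751L; // generated, not 1L

  private final String type;          // e.g. "IntelOpenCL"
  private final int major;            // primitives: major/minor must always be supplied
  private final int minor;
  private final String aliasDevName;  // e.g. "acl0"

  public FpgaDevice(String type, int major, int minor, String aliasDevName) {
    this.type = type;
    this.major = major;
    this.minor = minor;
    this.aliasDevName = aliasDevName;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof FpgaDevice)) {
      return false;
    }
    FpgaDevice other = (FpgaDevice) o;
    // major/minor uniquely identify the card, so they drive equality.
    return major == other.major && minor == other.minor;
  }

  @Override
  public int hashCode() {
    return Objects.hash(major, minor);
  }
}
{code}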






[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766143#comment-16766143
 ] 

Zhaohui Xin commented on YARN-9277:
---

Hi, [~yufeigu]. Can you help me review this patch? :D

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt high priority job
>  * We should not preempt container which has been running for a long time.
>  * ...






[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766109#comment-16766109
 ] 

Zhankun Tang commented on YARN-9060:


[~sunilg], the failed test cases seem unrelated. Please review. Thanks.

> [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin 
> as an example
> --
>
> Key: YARN-9060
> URL: https://issues.apache.org/jira/browse/YARN-9060
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, 
> YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, 
> YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, 
> YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, 
> YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, 
> YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, 
> YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch, 
> YARN-9060-trunk.015.patch, YARN-9060-trunk.016.patch, 
> YARN-9060-trunk.017.patch, YARN-9060-trunk.018.patch
>
>
> Due to the cgroups v1 implementation policy in the linux kernel, we cannot update 
> the value of the devices cgroup controller unless we have root permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for the Java layer to invoke.
> This Jira will have three parts:
>  # native c-e module
>  # Java layer code to isolate devices for container (docker and non-docker)
>  # A sample Nvidia GPU plugin
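
To make the root-permission point concrete, below is an illustrative sketch of the writes 
the privileged container-executor module would ultimately perform for a container that is 
allowed to see only one GPU. The cgroup path and device numbers are assumptions for 
illustration (nvidia character devices conventionally use major 195); this is not the 
committed native code.

{code:java}
import java.util.Arrays;
import java.util.List;

public class DeviceCgroupSketch {
  // Compose the device-cgroup updates for a container that may only access
  // /dev/nvidia0 (c 195:0) and /dev/nvidiactl (c 195:255). The Java layer cannot
  // apply these itself because the devices controller requires root, hence the
  // native container-executor module described in this JIRA.
  static List<String> buildDeviceCgroupWrites(String cgroupPath) {
    return Arrays.asList(
        "echo 'a *:* rwm' > " + cgroupPath + "/devices.deny",     // start from deny-all
        "echo 'c 195:0 rwm' > " + cgroupPath + "/devices.allow",  // the granted GPU
        "echo 'c 195:255 rwm' > " + cgroupPath + "/devices.allow");
  }

  public static void main(String[] args) {
    buildDeviceCgroupWrites("/sys/fs/cgroup/devices/hadoop-yarn/container_X")
        .forEach(System.out::println);
  }
}
{code}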






[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Attachment: YARN-9277.002.patch

> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch, YARN-9277.002.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt high priority job
>  * We should not preempt container which has been running for a long time.
>  * ...






[jira] [Updated] (YARN-9277) Add more restrictions In FairScheduler Preemption

2019-02-12 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9277:
--
Description: 
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt self
 * We should not preempt high priority job
 * We should not preempt container which has been running for a long time.
 * ...

  was:
 

I think we should add more restrictions in fair scheduler preemption. 
 * We should not preempt AM container
 * We should not preempt high priority job
 * We should not preempt container which has been running for a long time.
 * ...


> Add more restrictions In FairScheduler Preemption 
> --
>
> Key: YARN-9277
> URL: https://issues.apache.org/jira/browse/YARN-9277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9277.001.patch
>
>
>  
> I think we should add more restrictions in fair scheduler preemption. 
>  * We should not preempt self
>  * We should not preempt high priority job
>  * We should not preempt container which has been running for a long time.
>  * ...






[jira] [Commented] (YARN-9266) Various fixes are needed in IntelFpgaOpenclPlugin

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766045#comment-16766045
 ] 

Hadoop QA commented on YARN-9266:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 54 unchanged - 94 fixed = 54 total (was 148) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
35s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9266 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958382/YARN-9266-004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 441a9eb09e39 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 20b92cd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23386/testReport/ |
| Max. process+thread count | 413 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23386/console |
| Powered by |

[jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766040#comment-16766040
 ] 

Hadoop QA commented on YARN-9265:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  8m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  4m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 264 unchanged - 6 fixed = 264 total (was 270) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 20s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
34s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
39s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}112m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9265 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958378/YARN-9265-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux 49c58552fd88 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86

[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766012#comment-16766012
 ] 

Hadoop QA commented on YARN-9060:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 48s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 15s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}226m 28s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
|   | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hado

[jira] [Comment Edited] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766008#comment-16766008
 ] 

Zhaohui Xin edited comment on YARN-8655 at 2/12/19 1:09 PM:


[~wilfreds], I accidentally discovered this problem in our production cluster 
a few months ago. *I think satisfying fair share starvation is enough, so in the 
end I removed min share starvation to fix this problem.* 

I just learned that the community will also abolish min share in the future. 
After YARN-9066, this issue will no longer be needed.

Thanks for your reply. :D


was (Author: uranus):
[~wilfreds], I accidentally discovered this problem in our production cluster 
about a few months ago. *I think it's enough to satisfy fair share starvation, 
so I removed min share starvation to fix this problem finally.* 

I just learned that the community will also abolish this in the future. After 
[YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no 
longer be needed.

Thanks for your reply. :D

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it is added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 becomes *starved by min share*, so 
> the app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766008#comment-16766008
 ] 

Zhaohui Xin commented on YARN-8655:
---

[~wilfreds], I accidentally discovered this problem in our production cluster 
a few months ago. *I think satisfying fair share starvation is enough, so in the 
end I removed min share starvation to fix this problem.* 

I just learned that the community will also abolish this in the future. After 
[YARN-9066|https://issues.apache.org/jira/browse/YARN-9066], this issue will no 
longer be needed.

Thanks for your reply. :D

> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it is added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 becomes *starved by min share*, so 
> the app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9066) Deprecate Fair Scheduler min share

2019-02-12 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766003#comment-16766003
 ] 

Zhaohui Xin commented on YARN-9066:
---

[~wilfreds], [~haibochen], I agree with you. Min share starvation is very 
complicated to understand. After we remove min share starvation, 
[YARN-8655|https://issues.apache.org/jira/browse/YARN-8655] will no longer be 
needed.

> Deprecate Fair Scheduler min share
> --
>
> Key: YARN-9066
> URL: https://issues.apache.org/jira/browse/YARN-9066
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Haibo Chen
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: Proposal_Deprecate_FS_Min_Share.pdf
>
>
> See the attached docs for details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9266) Various fixes are needed in IntelFpgaOpenclPlugin

2019-02-12 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9266:
---
Attachment: YARN-9266-004.patch

> Various fixes are needed in IntelFpgaOpenclPlugin
> -
>
> Key: YARN-9266
> URL: https://issues.apache.org/jira/browse/YARN-9266
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9266-001.patch, YARN-9266-002.patch, 
> YARN-9266-003.patch, YARN-9266-004.patch
>
>
> Problems identified in this class:
>  * {{InnerShellExecutor}} ignores the timeout parameter
>  * {{configureIP()}} uses printStackTrace() instead of logging
>  * {{configureIP()}} does not log the output of aocl if the exit code != 0
>  * {{parseDiagnoseInfo()}} is too heavyweight – it should be in its own class 
> for better testability
>  * {{downloadIP()}} uses {{contains()}} for the file name check – this can really 
> surprise users in some cases (e.g. you want to use hello.aocx but hello2.aocx 
> also matches)
>  * the method name {{downloadIP()}} is misleading – it actually tries to find 
> the file. Everything is downloaded (localized) at this point.
>  * {{@VisibleForTesting}} methods should be package private
>  * {{aliasMap}} is not needed - store the acl number in the {{FpgaDevice}} 
> class



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9286) Timeline Server(1.5) GUI, sorting based on FinalStatus throws pop-up message

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765968#comment-16765968
 ] 

Hadoop QA commented on YARN-9286:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
33s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 42s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9286 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958208/YARN-9286-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 65269be42858 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 20b92cd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23384/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23384/console |
| Powered by | Apache Yetus 0.8.0   http://yetus

[jira] [Created] (YARN-9296) In Timeline Server (1.5), FinalStatus is displayed wrongly for killed and failed applications

2019-02-12 Thread Nallasivan (JIRA)
Nallasivan created YARN-9296:


 Summary: In Timeline Server (1.5), FinalStatus is displayed wrongly 
for killed and failed applications
 Key: YARN-9296
 URL: https://issues.apache.org/jira/browse/YARN-9296
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Nallasivan


In Timeline Server (1.5), the FinalStatus of applications that are killed or 
failed is displayed as UNDEFINED in both the GUI and the REST API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9295) Fix 'Decomissioned' label typo in Cluster Overview page

2019-02-12 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765964#comment-16765964
 ] 

Bibin A Chundatt commented on YARN-9295:


Thank you [~charanh] for the patch.

Looks good to me. Will get this in today.

> Fix 'Decomissioned' label typo in Cluster Overview page
> ---
>
> Key: YARN-9295
> URL: https://issues.apache.org/jira/browse/YARN-9295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Charan Hebri
>Assignee: Charan Hebri
>Priority: Trivial
> Attachments: Decommissioned-typo.png, YARN-9295.001.patch
>
>
> Change label text from 'Decomissioned' to 'Decommissioned' in Node Managers 
> section of the Cluster Overview page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-12 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9265:
---
Attachment: YARN-9265-002.patch

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> 
>
> Key: YARN-9265
> URL: https://issues.apache.org/jira/browse/YARN-9265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-9265-001.patch, YARN-9265-002.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> 
> Device Name:
> acl0
>  
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   StatusInformation
>  
> pac_a10_f20 PassedPAC Arria 10 Platform (pac_a10_f20)
>   PCIe 08:00.0
>   FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> 
>  
> Call "aocl diagnose " to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this and fails with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
>  Using FPGA vendor plugin: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
>  Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
>  Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
>  FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Failed to get major-minor number from reading /dev/pac_a10_f30
> 2019-01-25 06:46:03,252 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the 
> "Physical Dev Name", but this is wrong. For example, it thinks that the 
> device file is {{/dev/pac_a10_f30}}, which is not the case; the actual 
> file is {{/dev/intel-fpga-port.0}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe

2019-02-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765949#comment-16765949
 ] 

Wilfred Spiegelenburg commented on YARN-8655:
-

Hi [~uranus], I am not saying that what we do now is 100% correct. I am only 
doubting how often this occurs and what the impact on the application and 
scheduling activities is. Based on the analysis I did, I think we need a 
solution for this case that has far less impact. Do we know any of the 
following:
How badly does it affect the running applications? Do we pre-empt double what 
we should? 
Does not handling this correctly slow down pre-emption? 
Is there another impact of not handling the edge case?

Pre-emption currently runs almost continually and is gated by the {{take()}}: 
when there is a pre-emption waiting, we handle it. The patch changes this into 
one pre-emption per second. It effectively throttles pre-emption down from 
processing applications as they arrive to a slow, scheduled trickle.
When I look at how we calculate and decide whether an application is marked as 
minimum share starved, the cases should be limited. Even if the application is 
fair share starved and the queue is min share starved, we do not automatically 
mark the application as min share starved. We thus only have this edge case for 
a small number of applications.
Fixing that edge case by slowing down all pre-emption handling is, I think, not 
the right approach.
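
Just to illustrate the direction I have in mind, a rough sketch (not the 
attached patch; the {{doneProcessing}} hook is made up) of closing the 
duplicate-enqueue window with a companion set instead of throttling:

{code:java}
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt;

class StarvedAppsSketch {
  private final BlockingQueue<FSAppAttempt> appsToProcess =
      new LinkedBlockingQueue<>();
  // Every app that is currently queued or being processed.
  private final Set<FSAppAttempt> trackedApps = ConcurrentHashMap.newKeySet();

  void addStarvedApp(FSAppAttempt app) {
    // add() is atomic and returns false when the app is already queued or in
    // flight, so the same app can never be enqueued twice.
    if (trackedApps.add(app)) {
      appsToProcess.add(app);
    }
  }

  FSAppAttempt take() throws InterruptedException {
    // Blocking call to fetch the next starved application.
    return appsToProcess.take();
  }

  void doneProcessing(FSAppAttempt app) {
    // Called by the consumer after pre-emption handling for this app finished,
    // so the app can be enqueued again if it starves later.
    trackedApps.remove(app);
  }
}
{code}

This keeps pre-emption processing applications as they arrive; it only removes 
the window between {{take()}} returning and {{appBeingProcessed}} being set.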


> FairScheduler: FSStarvedApps is not thread safe
> ---
>
> Key: YARN-8655
> URL: https://issues.apache.org/jira/browse/YARN-8655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-8655.002.patch, YARN-8655.patch
>
>
> *FSStarvedApps is not thread safe; this may cause one starved app to be 
> processed twice in a row.*
> For example, when app1 is *fair share starved*, it is added to 
> appsToProcess. After that, app1 is taken, but appBeingProcessed has not yet 
> been updated to app1. At that moment, app1 becomes *starved by min share*, so 
> the app is added to appsToProcess again, because appBeingProcessed is null and 
> appsToProcess no longer contains it. 
> {code:java}
> void addStarvedApp(FSAppAttempt app) {
> if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
> appsToProcess.add(app);
> }
> }
> FSAppAttempt take() throws InterruptedException {
>   // Reset appBeingProcessed before the blocking call
>   appBeingProcessed = null;
>   // Blocking call to fetch the next starved application
>   FSAppAttempt app = appsToProcess.take();
>   appBeingProcessed = app;
>   return app;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9123) Clean up and split testcases in TestNMWebServices for GPU support

2019-02-12 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765911#comment-16765911
 ] 

Adam Antal commented on YARN-9123:
--

Hi [~snemeth]. Thanks for the patch! The cleanup looks good; I have some minor 
comments on it.
 - The following piece of code is replicated, as you have split the test into 3 
parts:

{code:java}
assertEquals("MediaType of the response is not the expected!",
  MediaType.APPLICATION_JSON + "; " + JettyUtils.UTF_8,
  response.getType().toString());
json = response.getEntity(JSONObject.class);
Assert.assertEquals(1000, json.get("a"));

JSONObject json = response.getEntity(JSONObject.class);
assertEquals("Unexpected value in the json response!",
  0, json.length());  
{code}
Consider extracting it into a separate helper function and calling that in 
order to avoid the minor code duplication (see the sketch after this list).
 - Though I'm not an expert on this area, it seems strange that there is no 
logging at all (there is not even a logger in the class). Though it would go 
beyond the scope of this jira, I recommend adding a log object and a few debug 
log statements: e.g. mocks set up successfully, response received successfully. 
(Only for the tests you marked for refactoring).
 - The test name testGetNMResourceInfoFailBecauseOfUnknownPlugin is a bit 
lengthy: 47 characters. Though that is completely allowed, we could give it a 
slightly shorter name for better readability. There aren't any javadocs in the 
nearby testcases, but you could also move some of the information about the 
test into a javadoc.
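
A minimal sketch of the shared helper I mean for the first point (the name is 
only an example, use whatever fits the class; it relies on the types the test 
already imports – ClientResponse, MediaType, JettyUtils, JSONObject, 
assertEquals):

{code:java}
// Verifies the media type once and hands back the JSON entity, so each of the
// three split tests only keeps its own, test-specific assertions.
private JSONObject verifyAndGetJsonResponse(ClientResponse response) {
  assertEquals("MediaType of the response is not the expected!",
      MediaType.APPLICATION_JSON + "; " + JettyUtils.UTF_8,
      response.getType().toString());
  return response.getEntity(JSONObject.class);
}
{code}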

> Clean up and split testcases in TestNMWebServices for GPU support
> -
>
> Key: YARN-9123
> URL: https://issues.apache.org/jira/browse/YARN-9123
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9123.001.patch, YARN-9123.002.patch, 
> YARN-9123.003.patch, YARN-9123.004.patch
>
>
> The following testcases can be cleaned up a bit: 
> TestNMWebServices#testGetNMResourceInfo - Can be split up to 3 different cases
> TestNMWebServices#testGetYarnGpuResourceInfo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup

2019-02-12 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765895#comment-16765895
 ] 

Zhankun Tang commented on YARN-9294:


[~oliverhuh...@gmail.com],

Have you tried to do a manual cgroup isolation test without YARN to 
reproduce it?

For example, create a directory under /sys/fs/cgroup/devices/hadoop-yarn, echo 
the values into the cgroup device deny file, and verify repeatedly that the 
process is isolated as expected.

I used to verify cgroup parameters with cgget and cgdelete. The tools can be 
installed with:
{code:java}
yum install libcgroup
yum install libcgroup-tools
cgget -r memory.limit_in_bytes -g memory:hadoop-yarn/container_1542945107795_0003_01_02{code}
 

But I just verified in my Ubuntu VM that "cgget" cannot show the denied devices 
even though the isolation is working. Maybe "cgget" also has this bug on RHEL.

From your description, it seems we should dig into reproducing the flaky GPU 
isolation first? Or try a different OS kernel version?
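
If it helps, here is a rough sketch of checking the device whitelist straight 
from the cgroup filesystem instead of going through cgget (the hierarchy root 
matches the path above; the container id is only a placeholder):

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeviceCgroupCheck {
  public static void main(String[] args) throws Exception {
    // Pass the container directory name under hadoop-yarn as the first argument.
    String containerId = args[0];
    Path devicesList = Paths.get("/sys/fs/cgroup/devices/hadoop-yarn",
        containerId, "devices.list");
    // "a *:* rwm" means every device is still allowed (no isolation); after
    // echoing into devices.deny the whitelist should shrink accordingly.
    for (String line : Files.readAllLines(devicesList)) {
      System.out.println(line);
    }
  }
}
{code}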

 

> Potential race condition in setting GPU cgroups & execute command in the 
> selected cgroup
> 
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0
>Reporter: Keqiu Hu
>Assignee: Keqiu Hu
>Priority: Critical
>
> Environment is latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with GPU requirement, at least 1 of the 
> allocated containers would lose GPU isolation. Even if I asked for 1 GPU, I 
> could still have visibility to all GPUs on the same machine when running 
> nvidia-smi.
> The funny thing is that even though I have visibility to all GPUs at the 
> moment container-executor runs (say ordinals 0,1,2,3), cgroups jails the 
> process's access to only that single GPU after some time.
> The underlying process trying to access the GPU takes the initial 
> information as the source of truth and tries to access physical GPU 0, which 
> is not actually available to the process. This results in a 
> [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> Validated the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, 
> --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 
> 0,1,2,3]
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, 
> application_1549663278916_0249, 
> /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, 
> /grid/a/tmp/yarn, /grid/a/tmp/userlogs, 
> /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer,
>  khu, application_1549663278916_0249, 
> container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 
> 8040, /grid/a/tmp/yarn]
> {code}
> So most likely a race condition between these two operations? 
> cc [~jhung]
> Another potential theory is the cgroups creation for the container actually 
> failed but the error was swallowed silently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9290) Invalid SchedulingRequest not rejected in Scheduler PlacementConstraintsHandler

2019-02-12 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765890#comment-16765890
 ] 

Prabhu Joseph commented on YARN-9290:
-

[~cheersyang] Can you review the patch for this Jira when you get time?

The Placement Processor rejects an invalid SchedulingRequest after the 
configured retry attempts and adds it to the AllocateResponse 
RejectedSchedulingRequest set. The Scheduler Processor does not do that; 
instead, both the AM and the RM keep trying to allocate and place the invalid 
request.
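
As a rough sketch only (allocateResponse and LOG are assumed to be in scope, 
and the resubmission step is just indicative), this is what the AM side can do 
once the scheduler handler also reports rejections the way the placement 
processor does, via the RejectedSchedulingRequest entries:

{code:java}
for (RejectedSchedulingRequest rejected :
    allocateResponse.getRejectedSchedulingRequests()) {
  LOG.warn("SchedulingRequest {} rejected, reason: {}",
      rejected.getRequest().getAllocationRequestId(), rejected.getReason());
  // Fix the placement constraint / target namespace and resubmit, instead of
  // letting the RM retry the invalid request forever.
}
{code}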


> Invalid SchedulingRequest not rejected in Scheduler 
> PlacementConstraintsHandler 
> 
>
> Key: YARN-9290
> URL: https://issues.apache.org/jira/browse/YARN-9290
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9290-001.patch, YARN-9290-002.patch, 
> YARN-9290-003.patch
>
>
> A SchedulingRequest with an invalid namespace is not rejected in the Scheduler 
> PlacementConstraintsHandler. The RM keeps trying to allocateOnNode, logging 
> the exception each time. Such a request is rejected by the placement-processor 
> handler.
> {code}
> 2019-02-08 16:51:27,548 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator:
>  Failed to query node cardinality:
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.InvalidAllocationTagsQueryException:
>  Invalid namespace prefix: notselfi, valid values are: 
> all,not-self,app-id,app-tag,self
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.fromString(TargetApplicationsNamespace.java:277)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TargetApplicationsNamespace.parse(TargetApplicationsNamespace.java:234)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.AllocationTags.createAllocationTags(AllocationTags.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraintExpression(PlacementConstraintsUtil.java:78)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfySingleConstraint(PlacementConstraintsUtil.java:240)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:321)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyAndConstraint(PlacementConstraintsUtil.java:272)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:324)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.PlacementConstraintsUtil.canSatisfyConstraints(PlacementConstraintsUtil.java:365)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.checkCardinalityAndPending(SingleConstraintAppPlacementAllocator.java:355)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.SingleConstraintAppPlacementAllocator.precheckNode(SingleConstraintAppPlacementAllocator.java:395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.precheckNode(AppSchedulingInfo.java:779)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.preCheckForNodeCandidateSet(RegularContainerAllocator.java:145)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.allocate(RegularContainerAllocator.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator.assignContainers(RegularContainerAllocator.java:890)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.ContainerAllocator.assignContainers(ContainerAllocator.java:54)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.assignContainers(FiCaSchedulerApp.java:977)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:1173)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:795)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySch

[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling

2019-02-12 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765877#comment-16765877
 ] 

Bibin A Chundatt commented on YARN-9293:


The trunk test failure is not related to the attached patch. The test cases 
pass locally.

[~sunilg], would you like to review the latest one?

> Optimize MockAMLauncher event handling
> --
>
> Key: YARN-9293
> URL: https://issues.apache.org/jira/browse/YARN-9293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
>  Labels: simulator
> Attachments: YARN-9293-branch-3.1.003.patch, YARN-9293.001.patch, 
> YARN-9293.002.patch, YARN-9293.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765874#comment-16765874
 ] 

Hadoop QA commented on YARN-9293:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
34s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} hadoop-tools/hadoop-sls: The patch generated 0 new + 
53 unchanged - 1 fixed = 53 total (was 54) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 13s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSRunner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 |
| JIRA Issue | YARN-9293 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958358/YARN-9293-branch-3.1.003.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c3053e92f31e 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / f3c1e45 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23382/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23382/testReport/ |
| Max. process+thread count | 459 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreComm

[jira] [Commented] (YARN-9290) Invalid SchedulingRequest not rejected in Scheduler PlacementConstraintsHandler

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765871#comment-16765871
 ] 

Hadoop QA commented on YARN-9290:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 44s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 2 new + 620 unchanged - 3 fixed = 622 total (was 623) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 23s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958353/YARN-9290-003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 879f5056cfdb 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d48e61d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23380/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23380/artifact/out/patch-unit-hadoop-yar

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

2019-02-12 Thread Zhankun Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-9060:
---
Attachment: YARN-9060-trunk.018.patch

> [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin 
> as an example
> --
>
> Key: YARN-9060
> URL: https://issues.apache.org/jira/browse/YARN-9060
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, 
> YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, 
> YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, 
> YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, 
> YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, 
> YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, 
> YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch, 
> YARN-9060-trunk.015.patch, YARN-9060-trunk.016.patch, 
> YARN-9060-trunk.017.patch, YARN-9060-trunk.018.patch
>
>
> Due to the cgroups v1 implementation policy in linux kernel, we cannot update 
> the value of the device cgroups controller unless we have the root permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for Java layer to invoke.
> This Jira will have three parts:
>  # native c-e module
>  # Java layer code to isolate devices for container (docker and non-docker)
>  # A sample Nvidia GPU plugin



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling

2019-02-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765844#comment-16765844
 ] 

Hadoop QA commented on YARN-9293:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} hadoop-tools/hadoop-sls: The patch generated 0 new + 
53 unchanged - 1 fixed = 53 total (was 54) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  3s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 12s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSStreamAMSynth |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9293 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958355/YARN-9293.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5a43a3c87739 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d48e61d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23381/artifact/out/patch-unit-hadoop-tools_hadoop-sls.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23381/testReport/ |
| Max. process+thread count | 470 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-sls U: hadoop-tools/hadoop-sls |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23381/console |
| Powered by | 

[jira] [Commented] (YARN-9252) Allocation Tag Namespace support in Distributed Shell

2019-02-12 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765799#comment-16765799
 ] 

Weiwei Yang commented on YARN-9252:
---

Just cherry-picked to branch-3.2 and updated the fix version.

> Allocation Tag Namespace support in Distributed Shell
> -
>
> Key: YARN-9252
> URL: https://issues.apache.org/jira/browse/YARN-9252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9252-001.patch, YARN-9252-002.patch, 
> YARN-9252-003.patch, YARN-9252-004.patch
>
>
> Distributed Shell supports placement constraints, but the allocation tag 
> namespace is not honored.






[jira] [Updated] (YARN-9252) Allocation Tag Namespace support in Distributed Shell

2019-02-12 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9252:
--
Fix Version/s: 3.2.1

> Allocation Tag Namespace support in Distributed Shell
> -
>
> Key: YARN-9252
> URL: https://issues.apache.org/jira/browse/YARN-9252
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: YARN-9252-001.patch, YARN-9252-002.patch, 
> YARN-9252-003.patch, YARN-9252-004.patch
>
>
> Distributed Shell supports placement constraints, but the allocation tag 
> namespace is not honored.






[jira] [Updated] (YARN-9253) Add UT to verify Placement Constraint in Distributed Shell

2019-02-12 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9253:
--
Fix Version/s: 3.1.3
   3.2.1

> Add UT to verify Placement Constraint in Distributed Shell
> --
>
> Key: YARN-9253
> URL: https://issues.apache.org/jira/browse/YARN-9253
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9253-001.patch, YARN-9253-002.patch, 
> YARN-9253-003.patch, YARN-9253-004.patch, YARN-9253-005.patch
>
>
> Add a unit test to verify the Placement Constraint support added to 
> Distributed Shell as part of YARN-7745.






[jira] [Commented] (YARN-9253) Add UT to verify Placement Constraint in Distributed Shell

2019-02-12 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765790#comment-16765790
 ] 

Weiwei Yang commented on YARN-9253:
---

Just cherry-picked to branch-3.2 and branch-3.1 as well.

> Add UT to verify Placement Constraint in Distributed Shell
> --
>
> Key: YARN-9253
> URL: https://issues.apache.org/jira/browse/YARN-9253
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9253-001.patch, YARN-9253-002.patch, 
> YARN-9253-003.patch, YARN-9253-004.patch, YARN-9253-005.patch
>
>
> Add a unit test to verify the Placement Constraint support added to 
> Distributed Shell as part of YARN-7745.






[jira] [Commented] (YARN-9293) Optimize MockAMLauncher event handling

2019-02-12 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765788#comment-16765788
 ] 

Bibin A Chundatt commented on YARN-9293:


Attached the branch-3.1 patch and updated the trunk patch.

> Optimize MockAMLauncher event handling
> --
>
> Key: YARN-9293
> URL: https://issues.apache.org/jira/browse/YARN-9293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Attachments: YARN-9293-branch-3.1.003.patch, YARN-9293.001.patch, 
> YARN-9293.002.patch, YARN-9293.003.patch
>
>







[jira] [Updated] (YARN-9191) Add cli option in DS to support enforceExecutionType in resource requests.

2019-02-12 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-9191:
--
Fix Version/s: 3.1.3

> Add cli option in DS to support enforceExecutionType in resource requests.
> --
>
> Key: YARN-9191
> URL: https://issues.apache.org/jira/browse/YARN-9191
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9191.001.patch, YARN-9191.002.patch, 
> YARN-9191.003.patch, YARN-9191.004.patch, YARN-9191.005.patch, 
> YARN-9191.006.patch
>
>
> This JIRA proposes to expose a CLI option that allows users to additionally 
> specify the enforceExecutionType flag (introduced in YARN-5180).
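For context, a minimal sketch (not the patch itself) of what the enforceExecutionType flag looks like at the API level, using the public YARN records; the resource sizes and priority below are placeholder assumptions:

{code:java}
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class EnforceExecutionTypeSketch {
  public static void main(String[] args) {
    // enforceExecutionType=true asks the scheduler not to change the
    // requested execution type (flag introduced in YARN-5180).
    ExecutionTypeRequest execTypeReq =
        ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true);

    // A plain resource request; sizes and priority are placeholders.
    ResourceRequest ask = ResourceRequest.newInstance(
        Priority.newInstance(0), ResourceRequest.ANY,
        Resource.newInstance(1024, 1), 1);
    ask.setExecutionTypeRequest(execTypeReq);

    System.out.println("enforceExecutionType = "
        + ask.getExecutionTypeRequest().getEnforceExecutionType());
  }
}
{code}

The CLI option proposed in this JIRA would simply let a Distributed Shell user set that flag from the command line instead of accepting the default.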





