[jira] [Commented] (YARN-8505) AMLimit and userAMLimit check should be skipped for unmanaged AM

2018-07-11 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541157#comment-16541157
 ] 

Bibin A Chundatt commented on YARN-8505:


[~leftnoteasy]

+1 for adding a new configuration limiting the number of RUNNING applications 
per queue. I had the same idea too.

> AMLimit and userAMLimit check should be skipped for unmanaged AM
> 
>
> Key: YARN-8505
> URL: https://issues.apache.org/jira/browse/YARN-8505
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8505.001.patch
>
>
> The AMLimit and userAMLimit checks in LeafQueue#activateApplications should be 
> skipped for unmanaged AMs, whose resources are not taken from the YARN cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541156#comment-16541156
 ] 

Wangda Tan commented on YARN-8480:
--

[~cheersyang], 

What [~templedf] / [~snemeth] proposed is to make the resource behave like a label. 
For example, a node can report that it has resources: 
memory=2048,vcore=3,has_java=true, and an AM can request a resource with 
...has_java=true. Allocating a container on the node will not update the node's 
available resource; as long as the node has enough memory and vcores, it can 
allocate >1 containers with a has_java=true resource request. This is an 
alternative way to represent node labels using resource types.
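
As a rough sketch of those semantics (the helper and its parameters are hypothetical, not the YARN-3926 code): a boolean resource only gates placement and is never deducted from the node's available resource.

{code}
// Hedged sketch: a boolean "resource" acts as a placement gate and is never
// subtracted from the node, unlike countable resources.
boolean canAllocate(Resource nodeAvailable, Map<String, Boolean> nodeFlags,
    Resource request, Map<String, Boolean> requestedFlags) {
  for (Map.Entry<String, Boolean> flag : requestedFlags.entrySet()) {
    // e.g. the AM asked for has_java=true but the node did not report it
    if (!flag.getValue().equals(nodeFlags.get(flag.getKey()))) {
      return false;
    }
  }
  // Only the countable resources (memory, vcores) limit how many
  // has_java=true containers can fit on the node.
  return Resources.fitsIn(request, nodeAvailable);
}
{code}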

> Add boolean option for resources
> 
>
> Key: YARN-8480
> URL: https://issues.apache.org/jira/browse/YARN-8480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8480.001.patch, YARN-8480.002.patch
>
>
> Make it possible to define a resource with a boolean value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8505) AMLimit and userAMLimit check should be skipped for unmanaged AM

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541155#comment-16541155
 ] 

Wangda Tan commented on YARN-8505:
--

[~bibinchundatt]  / [~Tao Yang] / [~cheersyang], 

I would prefer to add a new limit to control 
#maximum-concurrently-activated-apps within a queue, and to skip updating 
AMLimit when an unmanaged AM is launched.
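
For illustration, a hedged sketch of what that skip could look like inside {{LeafQueue#activateApplications}} (the accessor and method names are assumptions, not the actual YARN-8505 patch):

{code}
// Sketch only: unmanaged AMs take no AM-container resource from the cluster,
// so activation could bypass the AM-limit checks for them.
for (FiCaSchedulerApp app : getPendingApps()) {   // hypothetical iteration
  if (app.isUnmanagedAM()) {       // hypothetical accessor on the attempt
    activate(app);                 // skip amLimit / userAMLimit checks
    continue;                      // and do not count it into amUsed
  }
  // existing path: enforce the queue AMLimit and per-user userAMLimit ...
}
{code}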

> AMLimit and userAMLimit check should be skipped for unmanaged AM
> 
>
> Key: YARN-8505
> URL: https://issues.apache.org/jira/browse/YARN-8505
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8505.001.patch
>
>
> The AMLimit and userAMLimit checks in LeafQueue#activateApplications should be 
> skipped for unmanaged AMs, whose resources are not taken from the YARN cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541153#comment-16541153
 ] 

Wangda Tan commented on YARN-8511:
--

[~cheersyang], 

Thanks for reporting and working on this issue; it is a valid issue, and we have 
seen it in other places as well.

For example, with exclusively used resource types like GPU, we could allocate a 
new container to a node before the previous container has completed. Memory has 
the same issue.

I'm not sure your patch works, since {{SchedulerNode#releaseContainer}} could be 
invoked in scenarios such as an AM releasing a container through the allocate 
call, or an app attempt finishing. The scheduler could still place a new 
container on a node before the old container is terminated by the NM.

Instead, I think we should have a hook to handle such events inside 
{{AbstractYarnScheduler#nodeUpdate}}.

However, we still have two issues: 
1) If we deduct resources only after the container actually finishes, it is 
possible that the scheduler application attempt has already finished. In that 
case, the scheduler is not able to deduct resources (the scheduler relies on 
SchedulerApplicationAttempt to locate the RMContainer). I'm not sure whether 
this impacts allocation tags or not.
2) It is also possible that the NM spends too much time terminating containers; 
in our docker-in-docker setup, we observed the OS taking several minutes to 
terminate a container. The NM could then report the container as DONE before it 
is actually terminated (another bug here). YARN-8508 is caused by this issue.
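
As one possible shape for such a hook (an illustrative sketch; the wiring and the source of the completed-container list are assumptions, though AllocationTagsManager is the real tag store):

{code}
// Sketch inside AbstractYarnScheduler#nodeUpdate: release the allocation tags
// only once the NM heartbeat reports the container as complete.
for (ContainerStatus completed : completedContainersFromNM) {  // assumption
  RMContainer rmContainer = getRMContainer(completed.getContainerId());
  if (rmContainer != null) {
    rmContext.getAllocationTagsManager().removeContainer(
        nm.getNodeID(), rmContainer.getContainerId(),
        rmContainer.getAllocationTags());
  }
}
{code}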
 
 

> When AM releases a container, RM removes allocation tags before it is 
> released by NM
> 
>
> Key: YARN-8511
> URL: https://issues.apache.org/jira/browse/YARN-8511
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8511.001.patch, YARN-8511.002.patch
>
>
> Users leverage PC with allocation tags to avoid port conflicts between apps, 
> but we found they sometimes still get port conflicts. This is a similar issue 
> to YARN-4148: the RM immediately removes allocation tags once AM#allocate 
> asks to release a container, but the container on the NM has some delay 
> before it actually gets killed and releases the port. We should let the RM 
> remove allocation tags AFTER the NM confirms the containers are released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7064) Use cgroup to get container resource utilization

2018-07-11 Thread yewei.huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541092#comment-16541092
 ] 

yewei.huang commented on YARN-7064:
---

{{Thanks [~miklos.szeg...@cloudera.com] for such a nice feature!}}

{{As a newbie to YARN, I'm a little confused about why we chose to add up the 
(user + sys) time from cpuacct.stat rather than use cpuacct.usage when trying 
to get the total CPU usage.}}

{{From the [kernel 
doc|https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt]:}}

{{cpuacct.usage}} gives the CPU time (in nanoseconds) obtained by this group, 
which is essentially the CPU time obtained by all the tasks in the system.

{{cpuacct.stat}} lists a few statistics which further divide the CPU time (in 
USER_HZ units) obtained by the cgroup into user and system times.

The doc also mentions that the cpuacct controller uses the percpu_counter 
interface to collect user and system times. This has two side effects:
 * It is theoretically possible to see wrong values for user and system times, 
because percpu_counter_read() on 32-bit systems isn't safe against concurrent 
writes.
 * It is possible to see slightly outdated values for user and system times due 
to the batch-processing nature of percpu_counter.

It seems much safer to use cpuacct.usage?
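
For reference, a self-contained sketch of the two readouts the kernel doc describes (the cgroup v1 mount path and USER_HZ=100 are assumptions; USER_HZ should really come from sysconf(_SC_CLK_TCK)):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CpuAcctProbe {
  private static final long USER_HZ = 100;           // assumed, typical default
  private static final String BASE = "/sys/fs/cgroup/cpuacct"; // assumed mount

  // Total CPU time in nanoseconds, directly from cpuacct.usage.
  static long usageNanos(String cgroup) throws IOException {
    return Long.parseLong(
        Files.readAllLines(Paths.get(BASE, cgroup, "cpuacct.usage"))
            .get(0).trim());
  }

  // user + system time from cpuacct.stat, converted from USER_HZ ticks to ns.
  static long statNanos(String cgroup) throws IOException {
    long ticks = 0;
    List<String> lines =
        Files.readAllLines(Paths.get(BASE, cgroup, "cpuacct.stat"));
    for (String line : lines) {            // lines: "user NNN", "system NNN"
      ticks += Long.parseLong(line.split("\\s+")[1]);
    }
    return ticks * (1_000_000_000L / USER_HZ);
  }
}
{code}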

 

> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, 
> YARN-7064.009.patch, YARN-7064.010.patch, YARN-7064.011.patch, 
> YARN-7064.012.patch, YARN-7064.013.patch, YARN-7064.014.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7064) Use cgroup to get container resource utilization

2018-07-11 Thread yewei.huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541092#comment-16541092
 ] 

yewei.huang edited comment on YARN-7064 at 7/12/18 3:56 AM:


{{Thanks [~miklos.szeg...@cloudera.com] for such a nice feature!}}

{{As a newbie to YARN, I'm a little confused about why we chose to add up the 
(user + sys) time from cpuacct.stat rather than use cpuacct.usage when trying 
to get the total CPU usage.}}

{{From the [kernel 
doc|https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt]:}}

{{cpuacct.usage}} gives the CPU time (in nanoseconds) obtained by this group, 
which is essentially the CPU time obtained by all the tasks in the system.

{{cpuacct.stat}} lists a few statistics which further divide the CPU time (in 
USER_HZ units) obtained by the cgroup into user and system times.

The doc also mentions that the cpuacct controller uses the percpu_counter 
interface to collect user and system times. This has two side effects:
 * It is theoretically possible to see wrong values for user and system times, 
because percpu_counter_read() on 32-bit systems isn't safe against concurrent 
writes.
 * It is possible to see slightly outdated values for user and system times due 
to the batch-processing nature of percpu_counter.

It seems much safer to use cpuacct.usage?

 



 

> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, 
> YARN-7064.009.patch, YARN-7064.010.patch, YARN-7064.011.patch, 
> YARN-7064.012.patch, YARN-7064.013.patch, YARN-7064.014.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4104) dryrun of schedule for diagnostic and tenant's complain

2018-07-11 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541087#comment-16541087
 ] 

Weiwei Yang commented on YARN-4104:
---

I like this idea too, but it seems this one has become obsolete :(

 

> dryrun of schedule for diagnostic and tenant's complain
> ---
>
> Key: YARN-4104
> URL: https://issues.apache.org/jira/browse/YARN-4104
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
>
> We have more than a thousand queues and several hundred tenants in a busy 
> cluster. We get a lot of complaints/questions from owners/operators of queues 
> like "Why can't my queue/app get resources for a long while?"
> It's really hard to answer such questions.
> So we added a diagnostic REST endpoint 
> "/ws/v1/cluster/schedule/dryrun/{parentQueueName}" which returns the sorted 
> list of its children according to its SchedulingPolicy.getComparator().  
> All scheduling parameters of the children are also displayed, such as 
> minShare, usage, demand, weight, priority, etc.
> Usually we just call "/ws/v1/cluster/schedule/dryrun/root", and the result 
> self-explains the questions.
> I feel it's really useful for multi-tenant clusters, and hope it could be 
> merged into the mainline.
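
For illustration, a hedged sketch of what such an endpoint could look like on top of the Fair Scheduler internals (the JAX-RS wiring and the string output format are assumptions; the actual patch was not posted here):

{code}
// Sketch only: dump the children of a parent queue in scheduler order.
@GET
@Path("/schedule/dryrun/{parentQueueName}")
@Produces(MediaType.APPLICATION_JSON)
public List<String> dryrun(@PathParam("parentQueueName") String name) {
  FSParentQueue parent =
      scheduler.getQueueManager().getParentQueue(name, false);
  List<FSQueue> children = new ArrayList<>(parent.getChildQueues());
  // Sort exactly as the scheduler would when offering resources.
  children.sort(parent.getPolicy().getComparator());
  List<String> result = new ArrayList<>();
  for (FSQueue child : children) {
    // Expose the parameters the comparator consulted.
    result.add(String.format("%s minShare=%s usage=%s demand=%s weight=%s",
        child.getName(), child.getMinShare(), child.getResourceUsage(),
        child.getDemand(), child.getWeight()));
  }
  return result;
}
{code}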



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-4192) Add YARN metric logging periodically to a seperate file

2018-07-11 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang resolved YARN-4192.
---
Resolution: Won't Fix

I am closing this now; as [~aw] mentioned, this should already be handled by 
metrics2. Please reopen it if you think otherwise, [~nijel]. Thanks.

> Add YARN metric logging periodically to a seperate file
> ---
>
> Key: YARN-4192
> URL: https://issues.apache.org/jira/browse/YARN-4192
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: nijel
>Priority: Minor
>
> HDFS-8880 added a framework for logging metrics at a given interval.
> This can be added to YARN as well.
> Any thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM

2018-07-11 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541083#comment-16541083
 ] 

Weiwei Yang commented on YARN-8511:
---

[~leftnoteasy], [~asuresh], [~kkaranasos], could you please help review this? 
Thanks.

> When AM releases a container, RM removes allocation tags before it is 
> released by NM
> 
>
> Key: YARN-8511
> URL: https://issues.apache.org/jira/browse/YARN-8511
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8511.001.patch, YARN-8511.002.patch
>
>
> Users leverage PC with allocation tags to avoid port conflicts between apps, 
> but we found they sometimes still get port conflicts. This is a similar issue 
> to YARN-4148: the RM immediately removes allocation tags once AM#allocate 
> asks to release a container, but the container on the NM has some delay 
> before it actually gets killed and releases the port. We should let the RM 
> remove allocation tags AFTER the NM confirms the containers are released.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8505) AMLimit and userAMLimit check should be skipped for unmanaged AM

2018-07-11 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541078#comment-16541078
 ] 

Tao Yang commented on YARN-8505:


[~bibinchundatt], thanks for your reply.
Yes, this issue will change the old behavior. We just encountered a case where 
only specific partitions (a/b) have NMs in a cluster (none in the default 
partition). Our users want to submit an unmanaged AM which will request 
resources from partition a or b, but it won't run because of the resource 
limitation in the default partition. Users think that an unmanaged AM's 
resource isn't taken from YARN, so it should not be limited by resource, and 
they feel puzzled by this limitation. Is the current limitation reasonable? I 
think we need a discussion about that. cc: [~leftnoteasy], [~cheersyang], [~sunilg]

> AMLimit and userAMLimit check should be skipped for unmanaged AM
> 
>
> Key: YARN-8505
> URL: https://issues.apache.org/jira/browse/YARN-8505
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 2.9.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8505.001.patch
>
>
> The AMLimit and userAMLimit checks in LeafQueue#activateApplications should be 
> skipped for unmanaged AMs, whose resources are not taken from the YARN cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-07-11 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541039#comment-16541039
 ] 

Weiwei Yang commented on YARN-8480:
---

Hi [~templedf]/[~snemeth]

Could you please elaborate on the use case where you would use this 
boolean-type resource?
 What is the difference from a countable resource whose value is only 0 or 1?

Thanks

> Add boolean option for resources
> 
>
> Key: YARN-8480
> URL: https://issues.apache.org/jira/browse/YARN-8480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8480.001.patch, YARN-8480.002.patch
>
>
> Make it possible to define a resource with a boolean value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540973#comment-16540973
 ] 

genericqa commented on YARN-7639:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}125m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7639 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931237/YARN-7639.2.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a450989a6731 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21218/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21218/testReport/ |
| Max. process+thread count | 884 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-07-11 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540967#comment-16540967
 ] 

Vrushali C commented on YARN-8270:
--

A couple more questions:
- Perhaps a bit dumb, but what is the 10 in all the initializers for 
registry.newQuantiles?
- Does asyncPutEntitiesTotalCount.incr(incCount); need to be synchronized? 
Similarly for asyncPutEntitiesLatency.add(durationMs)? Same question for 
the sync entities calls.
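
For context, a hedged sketch of the metrics2 calls in question (assuming the standard MetricsRegistry API, where the last argument to newQuantiles is the quantile estimation/rollover interval in seconds; thread-safety of the mutable metrics is my reading, not a statement about the patch):

{code}
// Sketch of the metric declarations, following the names in the comment.
MutableQuantiles asyncPutEntitiesLatency = registry.newQuantiles(
    "asyncPutEntitiesLatency", "latency of async putEntities",
    "ops", "latency", 10);   // 10 = rollover interval in seconds (assumption)
MutableCounterLong asyncPutEntitiesTotalCount = registry.newCounter(
    "asyncPutEntitiesTotalCount", "total async putEntities calls", 0L);

// The mutable metric classes are designed for concurrent callers, so these
// call sites should not need extra synchronization:
asyncPutEntitiesTotalCount.incr(incCount);
asyncPutEntitiesLatency.add(durationMs);
{code}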

> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Attachments: YARN-8270.001.patch
>
>
> This Jira is for emitting JMX metrics for the ATS v2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector it tries to capture the success, 
> failure and latency of *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*, and the success, failure and latency of all 
> the APIs for fetching TimelineEntities from *TimelineReaderWebServices*. 
> This would actually help in monitoring and measuring the performance of 
> ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8270) Adding JMX Metrics for Timeline Collector and Reader

2018-07-11 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540965#comment-16540965
 ] 

Vrushali C commented on YARN-8270:
--

Hi [~haibochen],
Gentle ping for any further thoughts on Sushil's patch.

[~Sushil-K-S], I had a couple of things:
- I see Time.monotonicNow() in some places and System.nanoTime() in others. For 
example, TimelineReaderWebServices uses one for the start time and the other 
for the end time. Not sure if that was intentional. Perhaps we can use only one 
in all places; I think Time.monotonicNow may be better.
- In TimelineCollectorWebService#putEntities, I believe we are looking to 
measure the entire function call time? Or are we looking to measure just the 
putEntitiesAsync call at line 177, 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java#L177
, or the putEntities call at 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorWebService.java#L180
?
- Also, calculating the end time inside getElapsedTimeMs may not be an accurate 
measurement of elapsed time. The end time should be obtained in the finally 
block and passed in as an argument to getElapsedTimeMs.
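
i.e., something along these lines (a sketch of the suggested pattern; the metrics hook name is hypothetical):

{code}
// Take the end time in finally and pass it in, rather than calling "now"
// inside getElapsedTimeMs itself.
long startMs = Time.monotonicNow();
try {
  return timelineCollector.putEntities(entities, callerUgi);
} finally {
  long endMs = Time.monotonicNow();
  metrics.addPutEntitiesLatency(getElapsedTimeMs(startMs, endMs));
}
{code}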




> Adding JMX Metrics for Timeline Collector and Reader
> 
>
> Key: YARN-8270
> URL: https://issues.apache.org/jira/browse/YARN-8270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineserver
>Reporter: Sushil Ks
>Assignee: Sushil Ks
>Priority: Major
> Attachments: YARN-8270.001.patch
>
>
> This Jira is for emitting JMX metrics for the ATS v2 Timeline Collector and 
> Timeline Reader. For the Timeline Collector it tries to capture the success, 
> failure and latency of *putEntities* and *putEntitiesAsync* from 
> *TimelineCollectorWebService*, and the success, failure and latency of all 
> the APIs for fetching TimelineEntities from *TimelineReaderWebServices*. 
> This would actually help in monitoring and measuring the performance of 
> ATSv2 at scale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540963#comment-16540963
 ] 

genericqa commented on YARN-7129:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications {color} 
|
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
24s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red}  0m  
0s{color} | {color:red} The patch generated 8 new + 0 unchanged - 0 fixed = 8 
total (was 0) {color} |
| {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange}  
0m 34s{color} | {color:orange} The patch generated 158 new + 400 unchanged - 0 
fixed = 558 total (was 400) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
14s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 52s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
27s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 29m 13s{color} 
| {color:red} 

[jira] [Created] (YARN-8520) Document best practice for user management

2018-07-11 Thread Eric Yang (JIRA)
Eric Yang created YARN-8520:
---

 Summary: Document best practice for user management
 Key: YARN-8520
 URL: https://issues.apache.org/jira/browse/YARN-8520
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation, yarn
Reporter: Eric Yang
Assignee: Eric Yang


Docker containers must have usernames and groups consistent with the host 
operating system when external mount points are exposed to the container.  This 
prevents malicious or unauthorized impersonation.  This task is to document the 
best practice to ensure user and group membership are consistent across docker 
containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540899#comment-16540899
 ] 

Chen Qingcha commented on YARN-7481:


There are some projects that depend on the GPU support on 2.7.2 and are going 
to move to 2.9.0, so I need to do some bug fixing and porting.

For the long term, when these projects plan to move to 3.1+, we do need to 
merge this into the community codebase.

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPU as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
> For example, a 2-GPU job running on GPUs {0,1} could be faster than one 
> running on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 
> 0 and 7 are not.
> We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU 
> usage and locality information on a node (up to 64 GPUs per node): '1' means 
> available and '0' otherwise in the corresponding bit position.
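
To make the bitmap scheme concrete, a small illustrative sketch of how a scheduler could prefer GPUs under one PCI-E switch (the switch masks and GPU layout are invented for the example; the patch's actual data structures may differ):

{code}
// Illustrative only: '1' bits mark free GPUs. Assume GPUs 0-3 share one
// PCI-E switch and GPUs 4-7 another.
long freeGpus = 0b1000_0011L;            // GPUs 0, 1 and 7 are free
long[] switchMasks = { 0x0FL, 0xF0L };   // example topology masks

for (long mask : switchMasks) {
  long candidates = freeGpus & mask;     // free GPUs under this switch
  if (Long.bitCount(candidates) >= 2) {  // a 2-GPU job fits under one switch
    long g1 = Long.lowestOneBit(candidates);
    long g2 = Long.lowestOneBit(candidates & ~g1);
    freeGpus &= ~(g1 | g2);              // flip allocated bits to '0'
    break;                               // picks {0,1} rather than {0,7}
  }
}
{code}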



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540897#comment-16540897
 ] 

Wangda Tan commented on YARN-8480:
--

btw, [~templedf], I know you ran into some trouble supporting the node 
partition concept in FS (YARN-2497). Node attributes should be much easier to 
support because there's no resource sharing required under a node attribute; 
everything comes with FCFS. Basically, you should only need to add an if check 
before deciding to allocate container X on node Y.

> Add boolean option for resources
> 
>
> Key: YARN-8480
> URL: https://issues.apache.org/jira/browse/YARN-8480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8480.001.patch, YARN-8480.002.patch
>
>
> Make it possible to define a resource with a boolean value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540895#comment-16540895
 ] 

Wangda Tan commented on YARN-8480:
--

[~templedf], 

If this only changed the Fair Scheduler, I'd be fine with that. However, this 
touches the 
Resource/ResourceInformation/RMAdminCLI/ResourceCalculator/CapacityScheduler 
implementation, etc.

If you could take a closer look at the YARN-3409 APIs, it should not be hard at 
all. It should definitely be cheaper than adding a new resource type.

> Add boolean option for resources
> 
>
> Key: YARN-8480
> URL: https://issues.apache.org/jira/browse/YARN-8480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8480.001.patch, YARN-8480.002.patch
>
>
> Make it possible to define a resource with a boolean value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-07-11 Thread Daniel Templeton (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540885#comment-16540885
 ] 

Daniel Templeton commented on YARN-8480:


I don't disagree that in terms of general use cases there's overlap between 
boolean resources and node attributes.  From the perspective of fair scheduler, 
however, there's a big difference.  Boolean resources are a simple mechanism 
for a pure label resource that is a minor extension of existing capabilities.  
Node labels or node attributes will be a major integration effort.

While much of this work touches the common code, what we're proposing here is 
not intended as a general solution.  Capacity scheduler is where the work for 
node attributes is happening, and we don't propose to change that or muddy 
those waters.  This JIRA is intended only to provide a pure label resource for 
fair scheduler that isn't as heavy.

> Add boolean option for resources
> 
>
> Key: YARN-8480
> URL: https://issues.apache.org/jira/browse/YARN-8480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8480.001.patch, YARN-8480.002.patch
>
>
> Make it possible to define a resource with a boolean value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540879#comment-16540879
 ] 

Wangda Tan commented on YARN-7974:
--

[~oliverhuh...@gmail.com], [~jhung],  

Thanks for updating the patch; in general, the patch looks good. 

Several minor comments: 

1) 

{code}
public abstract void updateTrackingUrl(String trackingUrl); 
{code}

This should have a default (maybe empty) implementation. Given that 
AMRMClient/Async are all public/stable APIs, we don't want to break builds if 
any app extends these classes. 

2) The updateTrackingUrl method should be marked as public/unstable. 

3) Should we explicitly compare the content of the new tracking url with the 
old tracking url? Now we only check != null, which may not be enough. 

{code} 
1824  // Update tracking url if changed and save it to state store
1825  String newTrackingUrl = statusUpdateEvent.getTrackingUrl();
1826  if (newTrackingUrl != null) {
{code}
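
i.e., roughly (a sketch of the suggested check, not the committed code; the attempt accessors are assumptions):

{code}
// Only persist when the URL actually changed.
String newTrackingUrl = statusUpdateEvent.getTrackingUrl();
if (newTrackingUrl != null
    && !newTrackingUrl.equals(attempt.getOriginalTrackingUrl())) {
  attempt.setOriginalTrackingUrl(newTrackingUrl);
  // ... save to the RM state store only on a real change
}
{code}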


> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-7974.001.patch, YARN-7974.002.patch, 
> YARN-7974.003.patch, YARN-7974.004.patch, YARN-7974.005.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> The approach is for the AM to update the tracking url on heartbeat to the RM, 
> and to add a related API in AMRMClient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically

2018-07-11 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7639:
---
Attachment: YARN-7639.2.patch

> Queue Management scheduling edit policy class needs to be configured 
> dynamically
> 
>
> Key: YARN-7639
> URL: https://issues.apache.org/jira/browse/YARN-7639
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7639.1.patch, YARN-7639.2.patch
>
>
> This needs to be configured dynamically, i.e., added to the list of current 
> policies configured under
> yarn.resourcemanager.scheduler.monitor.policies
> whenever auto leaf queue creation is enabled for a parent queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically

2018-07-11 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540867#comment-16540867
 ] 

Suma Shivaprasad commented on YARN-7639:


Fixed UT failure

> Queue Management scheduling edit policy class needs to be configured 
> dynamically
> 
>
> Key: YARN-7639
> URL: https://issues.apache.org/jira/browse/YARN-7639
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7639.1.patch, YARN-7639.2.patch
>
>
> This needs to be configured dynamically, i.e., added to the list of current 
> policies configured under
> yarn.resourcemanager.scheduler.monitor.policies
> whenever auto leaf queue creation is enabled for a parent queue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7481:
-
Fix Version/s: (was: 2.7.2)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPU as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
> For example, a 2-GPU job running on GPUs {0,1} could be faster than one 
> running on GPUs {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 
> 0 and 7 are not.
> We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU 
> usage and locality information on a node (up to 64 GPUs per node): '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8480) Add boolean option for resources

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540810#comment-16540810
 ] 

Wangda Tan commented on YARN-8480:
--

[~snemeth] / [~templedf],

To me, we should move this to node attributes: we already have most things 
ready under the YARN-3409 branch. I don't think we should duplicate the same 
things in this JIRA. 

Stuff added to the Resource class does not necessarily have to be countable 
(definition of countable according to the design doc of YARN-3926):
{code} 
When we speak of countable resources, we refer to resource types where the 
allocation and release of resources is a simple subtraction and addition 
operation.
{code}

I'm supportive of adding new resource types like set, range, or hierarchical 
(like the disk resource isolation mentioned by [~cheersyang]). A pure label 
resource type should go to node attributes / node partitions.

> Add boolean option for resources
> 
>
> Key: YARN-8480
> URL: https://issues.apache.org/jira/browse/YARN-8480
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Daniel Templeton
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8480.001.patch, YARN-8480.002.patch
>
>
> Make it possible to define a resource with a boolean value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7129) Application Catalog for YARN applications

2018-07-11 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7129:

Attachment: YARN-7129.005.patch

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch
>
>
> YARN native services provide a web services API to improve the usability of 
> application deployment on Hadoop using collections of docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN 
> for managing the life cycle of applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-11 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540759#comment-16540759
 ] 

Suma Shivaprasad commented on YARN-8501:


[~snemeth] Patch LGTM. Can you add some UTs testing the various query filter 
params to getApps?

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2018-07-11 Thread Shane Kumpf (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540749#comment-16540749
 ] 

Shane Kumpf commented on YARN-3611:
---

+1000 :) Great effort everyone. I'm excited for what has been achieved and 
where this support is going.

> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through :
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8403) Nodemanager logs failed to download file with INFO level

2018-07-11 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8403:

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-8472

> Nodemanager logs failed to download file with INFO level
> 
>
> Key: YARN-8403
> URL: https://issues.apache.org/jira/browse/YARN-8403
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-8403.001.patch, YARN-8403.002.patch, 
> YARN-8403.003.patch, YARN-8403.png
>
>
> Some of the container-execution-related stack traces are printed at INFO or 
> WARN level. 
> {code}
> 2018-06-06 03:10:40,077 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:writeCredentials(1312)) - Writing 
> credentials to the nmPrivate file 
> /grid/0/hadoop/yarn/local/nmPrivate/container_e02_1528246317583_0048_01_01.tokens
> 2018-06-06 03:10:40,087 INFO  localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:run(975)) - Failed to download resource { { 
> hdfs://mycluster.example.com:8020/user/hrt_qa/Streaming/InputDir, 
> 1528254452720, FILE, null 
> },pending,[(container_e02_1528246317583_0048_01_01)],6074418082915225,DOWNLOADING}
> org.apache.hadoop.yarn.exceptions.YarnException: Download and unpack failed
> at 
> org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:306)
> at 
> org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:283)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:409)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:66)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: 
> /grid/0/hadoop/yarn/local/filecache/28_tmp/InputDir/input1.txt (Permission 
> denied)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.(FileOutputStream.java:213)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:236)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:219)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:318)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:307)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:338)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:401)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:464)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1169)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1149)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1038)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:408)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:399)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:381)
> at 
> org.apache.hadoop.yarn.util.FSDownload.downloadAndUnpack(FSDownload.java:298)
> ... 9 more
> {code}
> {code}
> 2018-06-06 03:10:41,547 WARN  privileged.PrivilegedOperationExecutor 
> (PrivilegedOperationExecutor.java:executePrivilegedOperation(182)) - 
> IOException executing command:
> java.io.InterruptedIOException: java.lang.InterruptedException
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1012)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
> at 
> 

[jira] [Resolved] (YARN-6576) Improve Diagnostic by moving Error stack trace from NM to slider AM

2018-07-11 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved YARN-6576.
-
Resolution: Duplicate
  Assignee: Eric Yang

This is duplicate of YARN-8403.

> Improve Diagnostic by moving Error stack trace from NM to slider AM
> ---
>
> Key: YARN-6576
> URL: https://issues.apache.org/jira/browse/YARN-6576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Yesha Vora
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
>
> Slider Master diagnostics should be improved to show the root cause of app 
> failures for issues like a missing docker image.
> Currently, the Slider Master log does not show a proper error message to debug 
> such failures. Users have to access Nodemanager logs to find the root cause of 
> issues where the container failed to start. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2018-07-11 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540740#comment-16540740
 ] 

Eric Yang commented on YARN-3611:
-

We have completed 88 out of 105 tasks in the first milestone, and moved only 17 
tasks to phase 2 for incremental improvement.
This was a great community effort.  Thank you to everyone who contributed to 
this JIRA.  :)

> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through:
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-3611) Support Docker Containers In LinuxContainerExecutor

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger resolved YARN-3611.
---
Resolution: Fixed

We have closed out Phase 1 of Docker container support and are now moving onto 
Phase 2. Phase 2 will be tracked by YARN-8472, so please file any new JIRAs 
under that umbrella (with the 'Docker' label). 

> Support Docker Containers In LinuxContainerExecutor
> ---
>
> Key: YARN-3611
> URL: https://issues.apache.org/jira/browse/YARN-3611
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Major
>  Labels: Docker
>
> Support Docker Containers In LinuxContainerExecutor
> LinuxContainerExecutor provides useful functionality today with respect to 
> localization, cgroups based resource management and isolation for CPU, 
> network, disk etc. as well as security with a well-defined mechanism to 
> execute privileged operations using the container-executor utility.  Bringing 
> docker support to LinuxContainerExecutor lets us use all of this 
> functionality when running docker containers under YARN, while not requiring 
> users and admins to configure and use a different ContainerExecutor. 
> There are several aspects here that need to be worked through:
> * Mechanism(s) to let clients request docker-specific functionality - we 
> could initially implement this via environment variables without impacting 
> the client API.
> * Security - both docker daemon as well as application
> * Docker image localization
> * Running a docker container via container-executor as a specified user
> * “Isolate” the docker container in terms of CPU/network/disk/etc
> * Communicating with and/or signaling the running container (ensure correct 
> pid handling)
> * Figure out workarounds for certain performance-sensitive scenarios like 
> HDFS short-circuit reads 
> * All of these need to be achieved without changing the current behavior of 
> LinuxContainerExecutor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8472) YARN Container Phase 2

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8472:
--
Description: 
In YARN-3611, we have implemented basic Docker container support for YARN.  
This story is the next phase to improve container usability.

Several areas for improvement are:
 # Software defined network support
 # Interactive shell to container
 # User management sss/nscd integration
 # Runc/containerd support
 # Metrics/Logs integration with Timeline service v2 
 # Docker container profiles
 # Docker cgroup management

  was:
In YARN-3611, we have implemented basic Docker container support for YARN.  
This story is the next phase to improve container usability.

Several areas for improvement are:
 # Software defined network support
 # Interactive shell to container
 # User management sss/nscd integration
 # Runc/containerd support
 # Metrics/Logs integration with Timeline service v2


> YARN Container Phase 2
> --
>
> Key: YARN-8472
> URL: https://issues.apache.org/jira/browse/YARN-8472
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Yang
>Priority: Major
>
> In YARN-3611, we have implemented basic Docker container support for YARN.  
> This story is the next phase to improve container usability.
> Several areas for improvement are:
>  # Software defined network support
>  # Interactive shell to container
>  # User management sss/nscd integration
>  # Runc/containerd support
>  # Metrics/Logs integration with Timeline service v2 
>  # Docker container profiles
>  # Docker cgroup management



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-11 Thread Jim Brennan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan reassigned YARN-8518:
-

Assignee: Jim Brennan

> test-container-executor test_is_empty() is broken
> -
>
> Key: YARN-8518
> URL: https://issues.apache.org/jira/browse/YARN-8518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>
> A new test was recently added to test-container-executor.c that has some 
> problems.
> It is attempting to mkdir() a hard-coded path: 
> /tmp/2938rf2983hcqnw8ud/emptydir
> This fails because the base directory is not there.  These directories are 
> not being cleaned up either.
> It should be using TEST_ROOT.
> I don't know what Jira this change was made under - the git commit from July 
> 9 2018 does not reference a Jira.
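
To make the intended fix concrete, a minimal sketch in shell terms (the real 
fix belongs in the C test; the names here are illustrative):
{code}
# Sketch: derive the path from a per-run test root instead of a hard-coded
# /tmp path, and clean it up afterwards.
TEST_ROOT=$(mktemp -d)          # stands in for the TEST_ROOT the C test should use
mkdir -p "$TEST_ROOT/emptydir"  # -p creates missing parents, unlike a bare mkdir()
# ... run the is_empty() checks against "$TEST_ROOT/emptydir" ...
rm -rf "$TEST_ROOT"             # clean up so nothing leaks between runs
{code}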



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8241) MRAppMaster fails when using UID:GID pair within docker container

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger resolved YARN-8241.
---
Resolution: Won't Fix

We do not plan to support running containers in which a user lookup cannot be 
performed; this is considered a base requirement for a container to run. Some 
sort of user lookup mechanism, such as nscd, sssd, /etc/passwd, or something 
else, needs to be in place so that the UID:GID pair can be linked to a user 
and group.
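
For illustration, a minimal sketch of the /etc/passwd option named above 
(image name and UID:GID values are placeholders, not a supported recipe):
{code}
# Sketch: make the submitting user resolvable inside the container by
# bind-mounting the host's passwd/group files read-only.
docker run --rm \
  --user 1001:1001 \
  -v /etc/passwd:/etc/passwd:ro \
  -v /etc/group:/etc/group:ro \
  centos:7 id   # 'id' should now resolve a user/group name, not just raw ids
{code}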

> MRAppMaster fails when using UID:GID pair within docker container
> -
>
> Key: YARN-8241
> URL: https://issues.apache.org/jira/browse/YARN-8241
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> As mentioned in [this 
> comment|https://issues.apache.org/jira/browse/YARN-4266?focusedCommentId=16063931=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16063931],
>  the MRAppMaster fails for docker containers if there is no additional user 
> lookup strategy (e.g. bind-mounting /var/run/nscd or /etc/passwd). We need a 
> better solution so that users can still run even if they are not known inside 
> of the container by name



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6495) check docker container's exit code when writing to cgroup task files

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540722#comment-16540722
 ] 

genericqa commented on YARN-6495:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-6495 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6495 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12918965/YARN-6495.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21216/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> check docker container's exit code when writing to cgroup task files
> 
>
> Key: YARN-6495
> URL: https://issues.apache.org/jira/browse/YARN-6495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jaeboo Jeong
>Assignee: Jaeboo Jeong
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command like date in a docker container, the 
> application fails to complete successfully.
> For example:
> {code}
> $ yarn  jar 
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
>  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar 
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
>  -num_containers 1 -timeout 360
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished 
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
> loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to 
> complete successfully
> {code}
> The error log looks like the following:
> {code}
> ...
> Failed to write pid to file 
> /cgroup_parent/cpu/hadoop-yarn/container_/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn’t 
> check the docker container’s status.
> If the container finishes very quickly, we can’t write the pid to the cgroup 
> tasks file, and that by itself is not a problem.
> So container-executor needs to check the docker container’s exit code when 
> writing the pid to the cgroup tasks file.
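
A sketch of the proposed check in plain docker CLI terms (illustrative only, 
not the container-executor C code; the container name is a placeholder and 
the cgroup path follows the error log above):
{code}
# Sketch: consult the container's state before writing its pid into the
# cgroup tasks file, and surface the exit code if it already finished.
CID=container_example   # hypothetical container name
if [ "$(docker inspect -f '{{.State.Running}}' "$CID")" = "true" ]; then
  docker inspect -f '{{.State.Pid}}' "$CID" > /cgroup_parent/cpu/hadoop-yarn/"$CID"/tasks
else
  echo "container already exited with code $(docker inspect -f '{{.State.ExitCode}}' "$CID")"
fi
{code}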



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7904) Privileged, trusted containers need all of their bind-mounted directories to be read-only

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7904:
--
Parent Issue: YARN-8472  (was: YARN-3611)

> Privileged, trusted containers need all of their bind-mounted directories to 
> be read-only
> -
>
> Key: YARN-7904
> URL: https://issues.apache.org/jira/browse/YARN-7904
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> Since they will be running as some user other than themselves, the NM likely 
> won't be able to clean up after them because of permission issues. So, to 
> prevent this, we should make these directories read-only.
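
For reference, a hedged sketch of the read-only mount this asks for, using 
docker's {{:ro}} suffix (the path and image are placeholders):
{code}
# Sketch: bind-mount the localized app directory read-only so a privileged
# container cannot leave behind files the NM's user cannot delete.
docker run --rm --privileged \
  -v /grid/0/hadoop/yarn/local/usercache/user/appcache/application_1:/app:ro \
  trusted.example.com/image:latest ls /app
{code}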



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5168) Add port mapping handling when docker container use bridge network

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5168:
--
Parent Issue: YARN-8472  (was: YARN-3611)

> Add port mapping handling when docker container use bridge network
> --
>
> Key: YARN-5168
> URL: https://issues.apache.org/jira/browse/YARN-5168
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Priority: Major
>  Labels: Docker
>
> YARN-4007 addresses different network setups when launching the docker 
> container. We need support port mapping when docker container uses bridge 
> network.
> The following problems are what we faced:
> 1. Add "-P" to map the docker container's exposed ports automatically.
> 2. Add "-p" to let users specify specific ports to map.
> 3. Add service registry support for the bridge network case, so that apps can 
> find each other. It could be done outside of YARN; however, it might be more 
> convenient to support it natively in YARN.
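
For reference, the plain-docker behavior of the flags mentioned above (image, 
ports, and container id are placeholders):
{code}
docker run -d --network=bridge -P image:latest          # 1. publish every EXPOSEd port to a random host port
docker run -d --network=bridge -p 8080:80 image:latest  # 2. map host port 8080 to container port 80 explicitly
docker port "$CID"                                      # show the resulting host:container mappings
{code}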



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7197) Add support for a volume blacklist for docker containers

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger resolved YARN-7197.
---
Resolution: Won't Fix

There are currently no plans to attempt to implement this feature. Feel free to 
re-open if you would like to work on this

> Add support for a volume blacklist for docker containers
> 
>
> Key: YARN-7197
> URL: https://issues.apache.org/jira/browse/YARN-7197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-7197.001.patch, YARN-7197.002.patch, 
> YARN-7197.003.patch, YARN-7197.004.patch, YARN-7197.005.patch
>
>
> Docker supports bind mounting host directories into containers. Work is 
> underway to allow admins to configure a whitelist of volume mounts. While 
> this is a much-needed and useful feature, it opens the door for 
> misconfiguration that may lead to users being able to compromise or crash the 
> system. 
> One example would be allowing users to mount /run from a host running 
> systemd, and then running systemd in that container, rendering the host 
> mostly unusable.
> This issue is to add support for a default blacklist. The default blacklist 
> would be where we put files and directories that if mounted into a container, 
> are likely to have negative consequences. Users are encouraged not to remove 
> items from the default blacklist, but may do so if necessary.
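
A minimal sketch of the kind of check being proposed, assuming a 
space-separated list of requested mount sources (the names are illustrative, 
not the patch):
{code}
# Sketch: reject any requested mount whose source is on the default blacklist.
BLACKLIST="/run /sys /proc"
for src in $REQUESTED_MOUNT_SOURCES; do
  for bad in $BLACKLIST; do
    if [ "$src" = "$bad" ]; then
      echo "ERROR: bind-mounting $src is blacklisted" >&2
      exit 1
    fi
  done
done
{code}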



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540717#comment-16540717
 ] 

genericqa commented on YARN-7639:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}129m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7639 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931210/YARN-7639.1.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cb9c6aee71b8 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21215/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21215/testReport/ |
| Max. process+thread count | 942 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-5670) Add support for Docker image clean up

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-5670:
--
Parent: YARN-8472  (was: YARN-3611)

> Add support for Docker image clean up
> -
>
> Key: YARN-5670
> URL: https://issues.apache.org/jira/browse/YARN-5670
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
>
> Regarding Docker image localization, we also need a way to clean up 
> old/stale Docker images to save storage space. We may extend the deletion 
> service to utilize "docker rm" to do this.
> This is related to YARN-3854 and may depend on its implementation. Please 
> refer to YARN-3854 for Docker image localization details.
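
Note that {{docker rm}} removes containers; for images the CLI analogues are 
{{docker rmi}} and {{docker image prune}}. A hedged sketch of what a 
deletion-service hook might invoke (image name and retention window are 
placeholders):
{code}
docker rmi registry.example.com/image:old-tag   # remove one stale image
docker image prune -a -f --filter "until=168h"  # remove all unused images older than a week
{code}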



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6495) check docker container's exit code when writing to cgroup task files

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6495:
--
Parent: YARN-8472  (was: YARN-3611)

> check docker container's exit code when writing to cgroup task files
> 
>
> Key: YARN-6495
> URL: https://issues.apache.org/jira/browse/YARN-6495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Jaeboo Jeong
>Assignee: Jaeboo Jeong
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6495.001.patch, YARN-6495.002.patch
>
>
> If I execute a simple command like date in a docker container, the 
> application fails to complete successfully.
> For example:
> {code}
> $ yarn  jar 
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
>  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker -shell_command "date" -jar 
> $HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar
>  -num_containers 1 -timeout 360
> …
> 17/04/12 00:16:40 INFO distributedshell.Client: Application did finished 
> unsuccessfully. YarnState=FINISHED, DSFinalStatus=FAILED. Breaking monitoring 
> loop
> 17/04/12 00:16:40 ERROR distributedshell.Client: Application failed to 
> complete successfully
> {code}
> The error log looks like the following:
> {code}
> ...
> Failed to write pid to file 
> /cgroup_parent/cpu/hadoop-yarn/container_/tasks - No such process
> ...
> {code}
> When writing the pid to the cgroup tasks file, container-executor doesn’t 
> check the docker container’s status.
> If the container finishes very quickly, we can’t write the pid to the cgroup 
> tasks file, and that by itself is not a problem.
> So container-executor needs to check the docker container’s exit code when 
> writing the pid to the cgroup tasks file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8376) Separate white list for docker.trusted.registries and docker.privileged-container.registries

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8376:
--
Parent: YARN-8472  (was: YARN-3611)

> Separate white list for docker.trusted.registries and 
> docker.privileged-container.registries
> 
>
> Key: YARN-8376
> URL: https://issues.apache.org/jira/browse/YARN-8376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Priority: Major
>  Labels: docker
>
> In an ideal world, it would be possible to have separate whitelists for 
> docker registries depending on the security requirement for each type of 
> docker image:
> 1. Registries from which we can run non-privileged containers without mounts
> 2. Registries from which we can run non-privileged containers with mounts
> 3. Registries from which we can run privileged or non-privileged containers 
> with mounts
> In the current implementation, there is only type 1 and a combined type 2/3. 
> It would be nice to define a separate whitelist to differentiate between 2 
> and 3.
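
A sketch of the proposed split, in container-executor.cfg terms 
({{docker.trusted.registries}} exists today; the second property is the 
hypothetical addition this issue asks for):
{code}
# Illustrative fragment only; written as a heredoc to show the intended shape.
cat >> /etc/hadoop/container-executor.cfg <<'EOF'
[docker]
  docker.trusted.registries=library,registry.example.com
  docker.privileged-container.registries=registry.example.com
EOF
{code}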



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3854) Add localization support for docker images

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-3854:
--
Parent: YARN-8472  (was: YARN-3611)

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Attachments: YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v3.pdf
>
>
> We need the ability to localize docker images when those images aren't 
> already available locally. There are various approaches that could be used 
> here with different trade-offs/issues: image archives on HDFS + docker load, 
> docker pull during the localization phase, or (automatic) docker pull during 
> the run/launch phase.
> We also need the ability to clean up old/stale, unused images. 
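
The approaches listed above, expressed as plain CLI steps (paths and image 
names are placeholders):
{code}
# (a) image archive on HDFS + docker load
hdfs dfs -get /images/myimage.tar /tmp/myimage.tar
docker load -i /tmp/myimage.tar
# (b) docker pull during the localization or launch phase
docker pull registry.example.com/myimage:latest
{code}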



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6456) Allow administrators to set a single ContainerRuntime for all containers

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6456:
--
Parent: YARN-8472  (was: YARN-3611)

> Allow administrators to set a single ContainerRuntime for all containers
> 
>
> Key: YARN-6456
> URL: https://issues.apache.org/jira/browse/YARN-6456
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Priority: Major
>  Labels: Docker
>
>  
> With LCE, there are multiple ContainerRuntimes available for handling 
> different types of containers; default, docker, java sandbox. Admins should 
> have the ability to override the user decision and set a single global 
> ContainerRuntime to be used for all containers.
> Original Description:
> {quote}One reason to use Docker containers is to be able to isolate different 
> workloads, even if they run as the same user.
> I have noticed some issues in the current design:
>  1. DockerLinuxContainerRuntime mounts containerLocalDirs 
> {{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
> userLocalDirs {{nm-local-dir/usercache/user/}}, so that a container can see 
> and modify the files of another container. I think the application file cache 
> directory should be enough for the container to run in most cases.
>  2. The whole cgroups directory is mounted. Would the container directory be 
> enough?
>  3. There is no way to enforce exclusive use of Docker for all containers. 
> There should be an option so that it is the admin, not the user, who requires 
> the use of Docker.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7848) Force removal of docker containers that do not get removed on first try

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-7848:
--
Parent: YARN-8472  (was: YARN-3611)

> Force removal of docker containers that do not get removed on first try
> ---
>
> Key: YARN-7848
> URL: https://issues.apache.org/jira/browse/YARN-7848
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Priority: Major
>  Labels: Docker
>
> After the addition of YARN-5366, containers will get removed after a certain 
> debug delay. However, this is a one-time effort. If the removal fails for 
> whatever reason, the container will persist. We need to add a mechanism for a 
> forced removal of those containers.
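
In CLI terms, the requested fallback is docker's force flag (the container id 
is a placeholder):
{code}
docker rm "$CID" || docker rm -f "$CID"   # retry with forced removal if the first attempt fails
{code}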



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8263:
--
Parent: YARN-8472  (was: YARN-3611)

> DockerClient still touches hadoop.tmp.dir
> -
>
> Key: YARN-8263
> URL: https://issues.apache.org/jira/browse/YARN-8263
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Jason Lowe
>Priority: Minor
>  Labels: Docker
>
> The DockerClient constructor fails if hadoop.tmp.dir is not set, and 
> otherwise proceeds to create a directory there. After YARN-8064 there's no 
> longer a need to touch the temporary directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540716#comment-16540716
 ] 

genericqa commented on YARN-8501:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
52s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
6s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 19s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8501 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931209/YARN-8501.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2f92ee28d778 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 632aca5 |
| maven 

[jira] [Updated] (YARN-8287) Update documentation and yarn-default related to the Docker runtime

2018-07-11 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8287:
--
Parent: YARN-8472  (was: YARN-3611)

> Update documentation and yarn-default related to the Docker runtime
> ---
>
> Key: YARN-8287
> URL: https://issues.apache.org/jira/browse/YARN-8287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Priority: Minor
>  Labels: Docker
>
> There are a few typos and omissions in the documentation and yarn-default wrt 
> running Docker containers on YARN. Below is what I noticed, but a more 
> thorough review is still needed:
>  * docker.allowed.volume-drivers is not documented
>  * None of the GPU or FPGA related items are in the Docker docs.
>  * "To run without any capabilites," - typo in yarn-default.xml
>  * remove    from yarn-default.xml
>  * yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed missing from 
> docs
>  * yarn.nodemanager.runtime.linux.docker.stop.grace-period missing from docs
>  * The user remapping features are missing from the docs, we should 
> explicitly call this out.
>  * The privileged container section could use a bit of rework to outline the 
> risks of the feature.
>  * Is it time to remove the security warnings? The community has made many 
> improvements since that warning was added. 
>  * "path within the contatiner" typo



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-11 Thread Robert Kanter (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540604#comment-16540604
 ] 

Robert Kanter commented on YARN-8518:
-

That would be great [~Jim_Brennan], thanks!

> test-container-executor test_is_empty() is broken
> -
>
> Key: YARN-8518
> URL: https://issues.apache.org/jira/browse/YARN-8518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Priority: Major
>
> A new test was recently added to test-container-executor.c that has some 
> problems.
> It is attempting to mkdir() a hard-coded path: 
> /tmp/2938rf2983hcqnw8ud/emptydir
> This fails because the base directory is not there.  These directories are 
> not being cleaned up either.
> It should be using TEST_ROOT.
> I don't know what Jira this change was made under - the git commit from July 
> 9 2018 does not reference a Jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8519) Yarn UI2: Changes to depict Auto Created leaf Queues/Managed Queues differently from other queues

2018-07-11 Thread Suma Shivaprasad (JIRA)
Suma Shivaprasad created YARN-8519:
--

 Summary: Yarn UI2: Changes to depict Auto Created leaf 
Queues/Managed Queues differently from other queues
 Key: YARN-8519
 URL: https://issues.apache.org/jira/browse/YARN-8519
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Suma Shivaprasad


YARN-7420 covers changes to depict auto-created leaf queues in a separate color 
notation, but this was done in the old YARN UI. Similar changes need to be 
incorporated in the new YARN UI to depict Managed Parent queues/Auto-Created 
leaf queues separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-11 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540597#comment-16540597
 ] 

Jim Brennan commented on YARN-8518:
---

[~rkanter], [~szegedim], let me know if you would like me to put up a patch for 
this.

 

> test-container-executor test_is_empty() is broken
> -
>
> Key: YARN-8518
> URL: https://issues.apache.org/jira/browse/YARN-8518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Priority: Major
>
> A new test was recently added to test-container-executor.c that has some 
> problems.
> It is attempting to mkdir() a hard-coded path: 
> /tmp/2938rf2983hcqnw8ud/emptydir
> This fails because the base directory is not there.  These directories are 
> not being cleaned up either.
> It should be using TEST_ROOT.
> I don't know what Jira this change was made under - the git commit from July 
> 9 2018 does not reference a Jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-11 Thread Jim Brennan (JIRA)
Jim Brennan created YARN-8518:
-

 Summary: test-container-executor test_is_empty() is broken
 Key: YARN-8518
 URL: https://issues.apache.org/jira/browse/YARN-8518
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jim Brennan


A new test was recently added to test-container-executor.c that has some 
problems.

It is attempting to mkdir() a hard-coded path: /tmp/2938rf2983hcqnw8ud/emptydir

This fails because the base directory is not there.  These directories are not 
being cleaned up either.

It should be using TEST_ROOT.

I don't know what Jira this change was made under - the git commit from July 9 
2018 does not reference a Jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7639) Queue Management scheduling edit policy class needs to be configured dynamically

2018-07-11 Thread Suma Shivaprasad (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7639:
---
Attachment: YARN-7639.1.patch

> Queue Management scheduling edit policy class needs to be configured 
> dynamically
> 
>
> Key: YARN-7639
> URL: https://issues.apache.org/jira/browse/YARN-7639
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-7639.1.patch
>
>
> This needs to be configured dynamically i.e added to the list of current 
> policies configured under
> yarn.resourcemanager.scheduler.monitor.policies
> whenever auto leaf queue creation is enabled for a parent queue.
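
For illustration, appending a policy to that property could look like the 
fragment below (the queue-management policy class name is assumed here, not 
confirmed in this thread):
{code}
# Illustrative yarn-site.xml fragment, written as a heredoc for shape only.
cat >> etc/hadoop/yarn-site.xml <<'EOF'
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy,org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.QueueManagementDynamicEditPolicy</value>
</property>
EOF
{code}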



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8501:
-
Attachment: YARN-8501.002.patch

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch, YARN-8501.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-11 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach reassigned YARN-8517:


Assignee: Antal Bálint Steinbach

> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Antal Bálint Steinbach
>Priority: Major
>  Labels: newbie, newbie++
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.
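
For reference, the two endpoints can be exercised like this (host and ids are 
placeholders; the paths are relative to the RM's {{ws/v1/cluster}} REST root):
{code}
curl "http://rm-host:8088/ws/v1/cluster/apps/$APPID/appattempts/$ATTEMPTID/containers"
curl "http://rm-host:8088/ws/v1/cluster/apps/$APPID/appattempts/$ATTEMPTID/containers/$CONTAINERID"
{code}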



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540506#comment-16540506
 ] 

Íñigo Goiri commented on YARN-8434:
---

The parts that were hard to figure out were:
* How to set up the scheduler. We needed to set up the regular scheduler 
address for the RMs, but then everybody else had to point to the AMRMProxy 
(see the sketch below). Not sure if there's a point in clarifying this.
* Having to set up HADOOP_CLIENT_CONF. Right now, this doesn't work without 
it. Something needs to be said about this. Is there a JIRA?
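
A hedged sketch of the NM-side fragment being described (both property names 
exist; the 8049 port is the conventional AMRMProxy default and is an 
assumption here):
{code}
# Illustrative yarn-site.xml fragment for a NodeManager: enable the local
# AMRMProxy and point AMs' scheduler address at it instead of the RM.
cat >> etc/hadoop/yarn-site.xml <<'EOF'
<property>
  <name>yarn.nodemanager.amrmproxy.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8049</value>
</property>
EOF
{code}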

> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch, 
> YARN-8434.003.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540501#comment-16540501
 ] 

Wangda Tan commented on YARN-7481:
--

[~qinc...@microsoft.com], I saw you have kept updating patches over the last 
several months. Given that the proposed approach conflicts with the community's 
existing solution, are there any plans to merge this with the community 
solutions?

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPU as a 
> countable resource. 
> However, GPU placement is also very important to deep learning jobs for 
> better efficiency.
>  For example, a 2-GPU job running on gpus {0,1} could be faster than one 
> running on gpus {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 
> 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the yarn Resource, which indicates both GPU 
> usage and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.
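
As a worked illustration of the bitmap encoding (plain shell arithmetic, not 
the patch's API):
{code}
printf 'mask for GPUs {0,1}: 0x%x\n' $(( (1 << 0) | (1 << 1) ))   # 0x3  - adjacent GPUs
printf 'mask for GPUs {0,7}: 0x%x\n' $(( (1 << 0) | (1 << 7) ))   # 0x81 - GPUs on different switches
{code}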



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8517:
-
Description: 
Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/\{appid\}/appattempts/\{appattemptid\}/containers
- /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
I suppose they are not intentionally undocumented.

  was:
Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/\{appid\}/appattempts/{appattemptid}/containers
- /apps/{appid}/appattempts/{appattemptid}/containers/{containerid}
I suppose they are not intentionally undocumented.


> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie, newbie++
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8517:
-
Description: 
Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/\{appid\}/appattempts/\{appattemptid\}/containers
- /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}

I suppose they are not intentionally undocumented.

  was:
Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/\{appid\}/appattempts/\{appattemptid\}/containers
- /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
I suppose they are not intentionally undocumented.


> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie, newbie++
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers
> - /apps/\{appid\}/appattempts/\{appattemptid\}/containers/\{containerid\}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8517:
-
Description: 
Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/\{appid\}/appattempts/{appattemptid}/containers
- /apps/{appid}/appattempts/{appattemptid}/containers/{containerid}
I suppose they are not intentionally undocumented.

  was:
Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/{appid}/appattempts/{appattemptid}/containers
- /apps/{appid}/appattempts/{appattemptid}/containers/{containerid}
I suppose they are not intentionally undocumented.


> getContainer and getContainers ResourceManager REST API methods are not 
> documented
> --
>
> Key: YARN-8517
> URL: https://issues.apache.org/jira/browse/YARN-8517
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie, newbie++
>
> Looking at the documentation here: 
> https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
> I cannot find documentation for 2 RM REST endpoints: 
> - /apps/\{appid\}/appattempts/{appattemptid}/containers
> - /apps/{appid}/appattempts/{appattemptid}/containers/{containerid}
> I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-11 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8517:


 Summary: getContainer and getContainers ResourceManager REST API 
methods are not documented
 Key: YARN-8517
 URL: https://issues.apache.org/jira/browse/YARN-8517
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth


Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/{appid}/appattempts/{appattemptid}/containers
- /apps/{appid}/appattempts/{appattemptid}/containers/{containerid}
I suppose they are not intentionally undocumented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540485#comment-16540485
 ] 

genericqa commented on YARN-8501:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
13s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 55s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}143m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsCustomResourceTypes
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8501 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931182/YARN-8501.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  

[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread Subru Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540428#comment-16540428
 ] 

Subru Krishnan commented on YARN-8434:
--

Thanks [~bibinchundatt] for understanding/verifying! +1 from my side on the 
latest patch (v3).

[~elgoiri], do you have any other documentation fixes before this goes in?

> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch, 
> YARN-8434.003.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8504) Incorrect sTarget column causing DataTable warning on RM application and scheduler web page and application history server webpage

2018-07-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540342#comment-16540342
 ] 

Íñigo Goiri commented on YARN-8504:
---

Yes, same thing for NodesPage; it's very easy to break.
Ideally, we should even check that the type is correct; for example, if the 
table defines column X as sortable, we should check that it actually is.
However, the latter is a little too complex.
At least a basic check that the numbers match should be there.
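
A minimal sketch of such a basic check (self-contained Java; it assumes the init string uses DataTables-style 'aTargets' indices, and the names here are illustrative, not the actual web UI test API):

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check: the highest column index referenced in a DataTables
// init string must stay below the number of columns the page renders.
public final class TableInitCheck {
  static int maxTargetIndex(String tableInitJs) {
    Matcher m =
        Pattern.compile("'aTargets':\\s*\\[(\\d+)\\]").matcher(tableInitJs);
    int max = -1;
    while (m.find()) {
      max = Math.max(max, Integer.parseInt(m.group(1)));
    }
    return max;
  }

  public static void main(String[] args) {
    String init = "{'aTargets': [9]}, {'aTargets': [3]}"; // sample fragment
    int renderedColumns = 9; // columns 0..8 actually emitted by the page
    if (maxTargetIndex(init) >= renderedColumns) {
      // This is exactly the mismatch behind the "unknown parameter '9'" warning.
      throw new AssertionError("init references a column the page does not render");
    }
  }
}
{code}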

> Incorrect sTarget column causing DataTable warning on RM application and 
> scheduler web page and application history server webpage
> --
>
> Key: YARN-8504
> URL: https://issues.apache.org/jira/browse/YARN-8504
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Attachments: YARN-8504.001.patch, image-2018-07-09-12-09-21-401.png, 
> image-2018-07-09-12-12-18-131.png
>
>
>  On a cluster built from the latest trunk, clicking 'State' on the application 
> history webpage gives the following warning:
>  *DataTables warning (table id = 'apps'): Requested unknown parameter '9' 
> from the data source for row 0*
> and clicking 'Cluster' on the RM applications page gives the following warning:
>   !image-2018-07-09-12-09-21-401.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8501) Reduce complexity of RMWebServices' getApps method

2018-07-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8501:
-
Attachment: YARN-8501.001.patch

> Reduce complexity of RMWebServices' getApps method
> --
>
> Key: YARN-8501
> URL: https://issues.apache.org/jira/browse/YARN-8501
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: restapi
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8501.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540236#comment-16540236
 ] 

genericqa commented on YARN-8434:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
35m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8434 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931174/YARN-8434.003.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 85e173e58a38 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2ae13d4 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 397 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21212/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch, 
> YARN-8434.003.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540162#comment-16540162
 ] 

Bibin A Chundatt commented on YARN-8434:


Updated the JIRA description and added a patch removing the 
yarn.client.failover-proxy-provider configuration from the NMs section.

> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch, 
> YARN-8434.003.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8434:
---
Attachment: YARN-8434.003.patch

> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch, 
> YARN-8434.003.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8434:
---
Summary: Update federation documentation of Nodemanager configurations  
(was: Nodemanager not registering to active RM in federation)

> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8434) Update federation documentation of Nodemanager configurations

2018-07-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8434:
---
Priority: Minor  (was: Blocker)

> Update federation documentation of Nodemanager configurations
> -
>
> Key: YARN-8434
> URL: https://issues.apache.org/jira/browse/YARN-8434
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-8434.001.patch, YARN-8434.002.patch
>
>
> FederationRMFailoverProxyProvider doesn't handle connecting to active RM. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8491) TestServiceCLI#testEnableFastLaunch fail when umask is 077

2018-07-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539925#comment-16539925
 ] 

Hudson commented on YARN-8491:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14556 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14556/])
YARN-8491. TestServiceCLI#testEnableFastLaunch fail when umask is 077. 
(bibinchundatt: rev 52e1bc8539ce769f47743d8b2d318a54c3887ba0)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/client/TestServiceCLI.java


> TestServiceCLI#testEnableFastLaunch fail when umask is 077
> --
>
> Key: YARN-8491
> URL: https://issues.apache.org/jira/browse/YARN-8491
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: K G Bakthavachalam
>Assignee: K G Bakthavachalam
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8491.001.patch
>
>
> UT failure in TestServiceCLI#testEnableFastLaunch due to a permission issue, 
> when permission is given only to the owner and not to group and 
> others (global).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7031) Support distributed node attributes

2018-07-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-7031.

   Resolution: Implemented
Fix Version/s: YARN-3409

Closing this since it's already handled.

> Support distributed node attributes
> ---
>
> Key: YARN-7031
> URL: https://issues.apache.org/jira/browse/YARN-7031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: YARN-3409
>
> Attachments: Distributed node attributes v1.pdf, 
> YARN-7031-YARN-3409.001.patch
>
>
> Allow nodemanagers to push their attributes to the RM



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8491) Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077

2018-07-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8491:
---
Summary: Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077  
(was: UT failure in TestServiceCLI#testEnableFastLaunch when umask is set to 
077)

> Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077
> --
>
> Key: YARN-8491
> URL: https://issues.apache.org/jira/browse/YARN-8491
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: bakthavachalam
>Assignee: bakthavachalam
>Priority: Major
> Attachments: YARN-8491.001.patch
>
>
> UT failure in TestServiceCLI#testEnableFastLaunch due to a permission issue, 
> when permission is given only to the owner and not to group and 
> others (global).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8491) TestServiceCLI#testEnableFastLaunch fail when umask is 077

2018-07-11 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8491:
---
Summary: TestServiceCLI#testEnableFastLaunch fail when umask is 077  (was: 
Fix TestServiceCLI#testEnableFastLaunch fail when umask is 077)

> TestServiceCLI#testEnableFastLaunch fail when umask is 077
> --
>
> Key: YARN-8491
> URL: https://issues.apache.org/jira/browse/YARN-8491
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: bakthavachalam
>Assignee: bakthavachalam
>Priority: Major
> Attachments: YARN-8491.001.patch
>
>
> UT failure in TestServiceCLI#testEnableFastLaunch due to a permission issue, 
> when permission is given only to the owner and not to group and 
> others (global).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8504) Incorrect sTarget column causing DataTable warning on RM application and scheduler web page and application history server webpage

2018-07-11 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539783#comment-16539783
 ] 

tianjuan commented on YARN-8504:


This is caused by YARN-7088, which adds a new column. There are some other 
places where the digits are manually counted, for example NodesPage.java. Maybe 
unified UTs are needed.

> Incorrect sTarget column causing DataTable warning on RM application and 
> scheduler web page and application history server webpage
> --
>
> Key: YARN-8504
> URL: https://issues.apache.org/jira/browse/YARN-8504
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
> Attachments: YARN-8504.001.patch, image-2018-07-09-12-09-21-401.png, 
> image-2018-07-09-12-12-18-131.png
>
>
>  On a cluster built from the latest trunk, clicking 'State' on the application 
> history webpage gives the following warning:
>  *DataTables warning (table id = 'apps'): Requested unknown parameter '9' 
> from the data source for row 0*
> and clicking 'Cluster' on the RM applications page gives the following warning:
>   !image-2018-07-09-12-09-21-401.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8511) When AM releases a container, RM removes allocation tags before it is released by NM

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539774#comment-16539774
 ] 

genericqa commented on YARN-8511:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
45s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 
41s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 22s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}199m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.sls.TestSLSRunner |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8511 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931119/YARN-8511.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7d5d1c13287c 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a47ec5d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539748#comment-16539748
 ] 

Szilard Nemeth commented on YARN-8468:
--

Hey [~bsteinbach]!
Thanks for the updated patch.
LGTM, +1 (non-binding)

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
>  Labels: patch
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps, and wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb provides the default maximum 
> container size for all queues, while the per-queue maximum is set with the 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root we would override the scheduler setting, so we 
> should not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler (see the sketch after this list).
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue.
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc. for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
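
A minimal, self-contained sketch of the clamping rule behind the bullets above (plain Java, no Hadoop types; the memory-only view and all names are assumptions for brevity):

{code:java}
// Illustrative sketch only: a per-queue maximum allocation can tighten,
// but never exceed, the scheduler-wide maximum allocation.
public final class MaxAllocationCap {

  // Scheduler-wide cap, i.e. yarn.scheduler.maximum-allocation-mb.
  static final long SCHEDULER_MAX_MB = 8192;

  static long effectiveQueueMax(long queueMaxMb) {
    // Clamp a misconfigured queue cap down to the scheduler cap.
    return Math.min(queueMaxMb, SCHEDULER_MAX_MB);
  }

  public static void main(String[] args) {
    System.out.println(effectiveQueueMax(2048));  // ad hoc queue -> 2048
    System.out.println(effectiveQueueMax(16384)); // above the cap -> 8192
  }
}
{code}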



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8421) when moving app, activeUsers is increased, even though app does not have outstanding request

2018-07-11 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539734#comment-16539734
 ] 

genericqa commented on YARN-8421:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 58s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8421 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12931121/YARN-8421.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0a5b5eed3cc4 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a47ec5d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21211/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21211/testReport/ |
| Max. process+thread count | 833 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.7.2.gpu-port-20180711.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important for deep learning jobs' 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs 
> {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.
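
A small, self-contained sketch of how such a bitmap could drive locality-aware picking (plain Java; the 4-GPUs-per-switch grouping and all names are assumptions, not the patch's actual code):

{code:java}
// Bit i set => GPU i is available. GPUs are assumed grouped by PCI-E
// switch in blocks of gpusPerSwitch consecutive indices.
public final class GpuBitmap {

  static boolean isAvailable(long bitmap, int gpu) {
    return (bitmap & (1L << gpu)) != 0;
  }

  // Prefer two free GPUs under the same switch, mirroring the {0,1} vs
  // {0,7} example in the description.
  static int[] pickColocatedPair(long bitmap, int gpusPerSwitch) {
    for (int base = 0; base < 64; base += gpusPerSwitch) {
      int first = -1;
      for (int g = base; g < base + gpusPerSwitch && g < 64; g++) {
        if (isAvailable(bitmap, g)) {
          if (first < 0) {
            first = g;
          } else {
            return new int[] {first, g};
          }
        }
      }
    }
    return null; // no co-located pair is free
  }

  public static void main(String[] args) {
    long bitmap = 0b10000011L; // GPUs 0, 1 and 7 are free
    int[] pair = pickColocatedPair(bitmap, 4);
    System.out.println(pair[0] + "," + pair[1]); // prints 0,1 rather than 0,7
  }
}
{code}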



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.7.2.gpu-port-20180710.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important for deep learning jobs' 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs 
> {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.7.2.gpu-port-20180711.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important for deep learning jobs' 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs 
> {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.7.2.gpu-port-20180710_old.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port.patch, hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important for deep learning jobs' 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs 
> {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8512) ATSv2 entities are not published to HBase from second attempt onwards

2018-07-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539649#comment-16539649
 ] 

Hudson commented on YARN-8512:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14555 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14555/])
YARN-8512. ATSv2 entities are not published to HBase from second attempt 
(sunilg: rev 7f1d3d0e9dbe328fae0d43421665e0b6907b33fe)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/BaseContainerManagerTest.java


> ATSv2 entities are not published to HBase from second attempt onwards
> -
>
> Key: YARN-8512
> URL: https://issues.apache.org/jira/browse/YARN-8512
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0, 2.10.0, 3.2.0, 3.0.3
>Reporter: Yesha Vora
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8512.01.patch, YARN-8512.02.patch, 
> YARN-8512.03.patch
>
>
> It is observed when the 1st attempt's master container has died and the 2nd 
> attempt's master container is launched on an NM where old containers of the 
> app are running, but not its master container. 
> ||Attempt||NM1||NM2||Action||
> |attempt-1|master container, i.e. container-1-1|container-1-2|master container 
> died|
> |attempt-2|NA|container-1-2 and master container container-2-1|NA|
> In the above scenario, the NM doesn't identify the flowContext and logs the 
> following:
> {noformat}
> 2018-07-10 00:44:38,285 WARN  storage.HBaseTimelineWriterImpl 
> (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: 
> flowName=null appId=application_1531175172425_0001 userId=hbase 
> clusterId=yarn-cluster . Not proceeding with writing to hbase
> 2018-07-10 00:44:38,560 WARN  storage.HBaseTimelineWriterImpl 
> (HBaseTimelineWriterImpl.java:write(170)) - Found null for one of: 
> flowName=null appId=application_1531175172425_0001 userId=hbase 
> clusterId=yarn-cluster . Not proceeding with writing to hbase
> {noformat}
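
A toy sketch of one way to avoid the null flow context seen in the log above (illustrative stand-in names only, not the actual patch; the real fix touches ContainerManagerImpl/ApplicationImpl):

{code:java}
// Toy stand-in for the NM-side flow context.
final class FlowCtx {
  final String flowName;
  FlowCtx(String flowName) { this.flowName = flowName; }
}

public final class FlowCtxFallback {
  // Prefer the context from the start request, else the recovered one
  // (assumption: restored from the NM state store).
  static FlowCtx resolve(FlowCtx fromRequest, FlowCtx recovered) {
    return fromRequest != null ? fromRequest : recovered;
  }

  public static void main(String[] args) {
    FlowCtx recovered = new FlowCtx("test_flow");
    // The 2nd attempt's AM lands on an NM that only knows the app via recovery:
    System.out.println(resolve(null, recovered).flowName); // test_flow, not null
  }
}
{code}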



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: hadoop-2.7.2.gpu-port-20180711.patch

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180710.patch, 
> hadoop-2.7.2.gpu-port-20180710_old.patch, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important for deep learning jobs' 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs 
> {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7481) Gpu locality support for Better AI scheduling

2018-07-11 Thread Chen Qingcha (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qingcha updated YARN-7481:
---
Attachment: (was: hadoop-2.7.2.gpu-port-20180711.patch)

> Gpu locality support for Better AI scheduling
> -
>
> Key: YARN-7481
> URL: https://issues.apache.org/jira/browse/YARN-7481
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, RM, yarn
>Affects Versions: 2.7.2
>Reporter: Chen Qingcha
>Priority: Major
> Fix For: 2.7.2
>
> Attachments: GPU locality support for Job scheduling.pdf, 
> hadoop-2.7.2.gpu-port-20180710.patch, 
> hadoop-2.7.2.gpu-port-20180710_old.patch, 
> hadoop-2.7.2.gpu-port-20180711.patch, hadoop-2.7.2.gpu-port.patch, 
> hadoop-2.9.0.gpu-port.patch, hadoop_2.9.0.patch
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> We enhance Hadoop with GPU support for better AI job scheduling. 
> Currently, YARN-3926 also supports GPU scheduling, which treats GPUs as a 
> countable resource. 
> However, GPU placement is also very important for deep learning jobs' 
> efficiency.
>  For example, a 2-GPU job running on GPUs {0,1} could be faster than on GPUs 
> {0,7}, if GPUs 0 and 1 are under the same PCI-E switch while 0 and 7 are not.
>  We add support to Hadoop 2.7.2 to enable GPU locality scheduling, which 
> supports fine-grained GPU placement. 
> A 64-bit bitmap is added to the YARN Resource, which indicates both GPU usage 
> and locality information in a node (up to 64 GPUs per node). '1' means 
> available and '0' otherwise in the corresponding bit position.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8421) when moving app, activeUsers is increased, even though app does not have outstanding request

2018-07-11 Thread kyungwan nam (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539625#comment-16539625
 ] 

kyungwan nam commented on YARN-8421:


[~eepayne], Thank you for your review.
I've attached a new patch according to your comment.

> when moving app, activeUsers is increased, even though app does not have 
> outstanding request 
> -
>
> Key: YARN-8421
> URL: https://issues.apache.org/jira/browse/YARN-8421
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.4
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-8421.001.patch, YARN-8421.002.patch, 
> YARN-8421.003.patch
>
>
> All containers for app1 have been allocated.
> Move app1 from the default queue to the test queue as follows.
> {code}
>   yarn application -movetoqueue app1 -queue test
> {code}
> _activeUsers_ of the test queue is increased even though app1 does not 
> have any outstanding request.
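
A self-contained sketch of the guard this report argues for (illustrative names only, not the actual scheduler API): on a move, the user should only be counted active in the target queue if the app still has outstanding requests.

{code:java}
// Toy model of the move path: activeUsers must not grow for an app whose
// containers are all already allocated.
public final class MoveGuard {
  static int activeUsers = 0;

  static void onMoveToQueue(boolean hasOutstandingRequests) {
    if (hasOutstandingRequests) {
      activeUsers++; // the user becomes active in the target queue
    }
  }

  public static void main(String[] args) {
    onMoveToQueue(false); // app1: everything already allocated
    System.out.println(activeUsers); // prints 0, not 1
  }
}
{code}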



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8421) when moving app, activeUsers is increased, even though app does not have outstanding request

2018-07-11 Thread kyungwan nam (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-8421:
---
Attachment: YARN-8421.003.patch

> when moving app, activeUsers is increased, even though app does not have 
> outstanding request 
> -
>
> Key: YARN-8421
> URL: https://issues.apache.org/jira/browse/YARN-8421
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.4
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-8421.001.patch, YARN-8421.002.patch, 
> YARN-8421.003.patch
>
>
> All containers for app1 have been allocated.
> Move app1 from the default queue to the test queue as follows.
> {code}
>   yarn application -movetoqueue app1 -queue test
> {code}
> _activeUsers_ of the test queue is increased even though app1 does not 
> have any outstanding request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org