[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-10 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8757:
-
Target Version/s: 3.2.0
Priority: Critical  (was: Major)

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8757.001.patch
>
>
> We need to have a Tensorboard component when --tensorboard is specified, and 
> we need to set quicklinks to let users view Tensorboard.






[jira] [Commented] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted

2018-09-10 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609997#comment-16609997
 ] 

Tao Yang commented on YARN-8709:


Thanks [~eepayne], [~cheersyang] and [~sunilg].

> CS preemption monitor always fails since one under-served queue was deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 2.9.2, 3.0.4, 3.1.2
>
> Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was always skipped because a YarnRuntimeException was raised on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues 
> can be added but never removed, short of rebuilding this policy. For example, 
> once under-served queue "a" is added into this structure, it will always be 
> there and never be removed. The intra-queue preemption checker will try to 
> fetch queue info for every entry of partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and will throw a 
> YarnRuntimeException if a queue is not found. As a result, after queue "a" is 
> deleted from the queue structure, the preemption checker will always fail.
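
A minimal sketch of one possible direction for the fix, assuming the field is a 
Map keyed by partition and that the policy can reach the scheduler; names here 
are illustrative, not the committed patch:
{code:java}
// Illustrative only: prune under-served queues that no longer exist in the
// scheduler before each policy run, so deleted queues cannot linger.
private void pruneDeletedUnderServedQueues() {
  for (Map.Entry<String, LinkedHashSet<String>> entry :
      partitionToUnderServedQueues.entrySet()) {
    // Drop queue names the scheduler no longer knows about.
    entry.getValue().removeIf(queueName -> scheduler.getQueue(queueName) == null);
  }
}
{code}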






[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609992#comment-16609992
 ] 

Hadoop QA commented on YARN-8757:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
28s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine 
in trunk has 4 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 10s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine: 
The patch generated 19 new + 48 unchanged - 2 fixed = 67 total (was 50) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
28s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8757 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939186/YARN-8757.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a947accecd4e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 987d819 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/21806/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html
 |
| checkstyle | 

[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609972#comment-16609972
 ] 

Hadoop QA commented on YARN-8763:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
42s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 22s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 6 new + 2 unchanged - 0 fixed = 8 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
21s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 12s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
28s{color} | {color:red} The patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 33s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.TestNodeManagerReboot |
|   | 
hadoop.yarn.server.nodemanager.containermanager.resourceplugin.TestResourcePluginManager
 |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerResync |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
|   | hadoop.yarn.server.nodemanager.webapp.TestNMWebServer |
|   | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
|   | hadoop.yarn.server.nodemanager.TestNodeManagerShutdown |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8763 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939182/YARN-8763-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  xml  findbugs  checkstyle  |
| uname | 

[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-10 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609943#comment-16609943
 ] 

Wangda Tan commented on YARN-8757:
--

Added ver.1 patch which spins up a Tensorboard container when --{{tensorboard}} 
is specified. Users can now launch a Tensorboard container pointing to a parent 
folder to list all jobs. Will update documentation in the next patch. Also 
improved unit tests a bit.
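
For illustration, an invocation would look roughly like this (flag names as 
documented for the Submarine CLI around this time; treat the exact flags as an 
assumption):
{noformat}
yarn jar hadoop-yarn-submarine-<version>.jar job run \
  --name tf-job-001 \
  --docker_image <your TF docker image> \
  --checkpoint_path hdfs:///user/alice/tf-job-001/checkpoints \
  --tensorboard
{noformat}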

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8757.001.patch
>
>
> We need to have a Tensorboard component when --tensorboard is specified, and 
> we need to set quicklinks to let users view Tensorboard.






[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-10 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8757:
-
Attachment: YARN-8757.001.patch

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-8757.001.patch
>
>
> We need to have a Tensorboard component when --tensorboard is specified, and 
> we need to set quicklinks to let users view Tensorboard.






[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet

2018-09-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609926#comment-16609926
 ] 

Eric Yang commented on YARN-8763:
-

[~Zian Chen] Thank you for the patch.  We usually specify dependent project 
versions in hadoop-project/pom.xml.  Please move the WebSocket jar 
dependencies there.

The WebSocket entry point needs to accept a container id as a parameter to 
guide the servlet to interface with the corresponding container.  
session.getUpgradeRequest().getRequestURI() provides the full request path; you 
can split it up and take whatever comes after container/... to get the 
container id.
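
A minimal sketch of that extraction under Jetty 9's WebSocket API (the class 
name is illustrative):
{code:java}
import org.eclipse.jetty.websocket.api.Session;
import org.eclipse.jetty.websocket.api.annotations.OnWebSocketConnect;
import org.eclipse.jetty.websocket.api.annotations.WebSocket;

@WebSocket
public class ContainerShellSocketSketch {
  private String containerId;

  @OnWebSocketConnect
  public void onConnect(Session session) {
    // Full path of the HTTP upgrade request, e.g. .../container/<container id>
    String path = session.getUpgradeRequest().getRequestURI().getPath();
    int idx = path.indexOf("container/");
    // Everything after "container/" is taken as the container id.
    containerId = idx >= 0 ? path.substring(idx + "container/".length()) : null;
  }
}
{code}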

We also need a test case that mocks ContainerShellWebSocket so its behavior is 
tested.

> Add WebSocket logic to the Node Manager web server to establish servlet
> ---
>
> Key: YARN-8763
> URL: https://issues.apache.org/jira/browse/YARN-8763
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8763-001.patch
>
>
> The reason we want to use a WebSocket servlet to serve the backend instead of 
> establishing the connection through HTTP is that WebSocket solves a few 
> issues with HTTP that matter for our scenario:
>  # In HTTP, the request is always initiated by the client and the response is 
> processed by the server — making HTTP a unidirectional protocol. WebSocket 
> provides a bi-directional protocol, meaning either the client or the server 
> can send a message to the other party.
>  # Full-duplex communication — client and server can talk to each other 
> independently at the same time.
>  # Single TCP connection — after the initial HTTP connection is upgraded, 
> client and server communicate over that same TCP connection 
> throughout the lifecycle of the WebSocket connection.






[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet

2018-09-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609919#comment-16609919
 ] 

Zian Chen commented on YARN-8763:
-

Hi [~eyang], could you help review the patch? Thanks!

> Add WebSocket logic to the Node Manager web server to establish servlet
> ---
>
> Key: YARN-8763
> URL: https://issues.apache.org/jira/browse/YARN-8763
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8763-001.patch
>
>
> The reason we want to use a WebSocket servlet to serve the backend instead of 
> establishing the connection through HTTP is that WebSocket solves a few 
> issues with HTTP that matter for our scenario:
>  # In HTTP, the request is always initiated by the client and the response is 
> processed by the server — making HTTP a unidirectional protocol. WebSocket 
> provides a bi-directional protocol, meaning either the client or the server 
> can send a message to the other party.
>  # Full-duplex communication — client and server can talk to each other 
> independently at the same time.
>  # Single TCP connection — after the initial HTTP connection is upgraded, 
> client and server communicate over that same TCP connection 
> throughout the lifecycle of the WebSocket connection.






[jira] [Updated] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet

2018-09-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8763:

Attachment: YARN-8763-001.patch

> Add WebSocket logic to the Node Manager web server to establish servlet
> ---
>
> Key: YARN-8763
> URL: https://issues.apache.org/jira/browse/YARN-8763
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8763-001.patch
>
>
> The reason we want to use a WebSocket servlet to serve the backend instead of 
> establishing the connection through HTTP is that WebSocket solves a few 
> issues with HTTP that matter for our scenario:
>  # In HTTP, the request is always initiated by the client and the response is 
> processed by the server — making HTTP a unidirectional protocol. WebSocket 
> provides a bi-directional protocol, meaning either the client or the server 
> can send a message to the other party.
>  # Full-duplex communication — client and server can talk to each other 
> independently at the same time.
>  # Single TCP connection — after the initial HTTP connection is upgraded, 
> client and server communicate over that same TCP connection 
> throughout the lifecycle of the WebSocket connection.






[jira] [Commented] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet

2018-09-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609917#comment-16609917
 ] 

Zian Chen commented on YARN-8763:
-

Provide initial patch for this.

> Add WebSocket logic to the Node Manager web server to establish servlet
> ---
>
> Key: YARN-8763
> URL: https://issues.apache.org/jira/browse/YARN-8763
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
>
> The reason we want to use a WebSocket servlet to serve the backend instead of 
> establishing the connection through HTTP is that WebSocket solves a few 
> issues with HTTP that matter for our scenario:
>  # In HTTP, the request is always initiated by the client and the response is 
> processed by the server — making HTTP a unidirectional protocol. WebSocket 
> provides a bi-directional protocol, meaning either the client or the server 
> can send a message to the other party.
>  # Full-duplex communication — client and server can talk to each other 
> independently at the same time.
>  # Single TCP connection — after the initial HTTP connection is upgraded, 
> client and server communicate over that same TCP connection 
> throughout the lifecycle of the WebSocket connection.






[jira] [Created] (YARN-8763) Add WebSocket logic to the Node Manager web server to establish servlet

2018-09-10 Thread Zian Chen (JIRA)
Zian Chen created YARN-8763:
---

 Summary: Add WebSocket logic to the Node Manager web server to 
establish servlet
 Key: YARN-8763
 URL: https://issues.apache.org/jira/browse/YARN-8763
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zian Chen
Assignee: Zian Chen


The reason we want to use a WebSocket servlet to serve the backend instead of 
establishing the connection through HTTP is that WebSocket solves a few issues 
with HTTP that matter for our scenario:
 # In HTTP, the request is always initiated by the client and the response is 
processed by the server — making HTTP a unidirectional protocol. WebSocket 
provides a bi-directional protocol, meaning either the client or the server 
can send a message to the other party.
 # Full-duplex communication — client and server can talk to each other 
independently at the same time.
 # Single TCP connection — after the initial HTTP connection is upgraded, 
client and server communicate over that same TCP connection throughout the 
lifecycle of the WebSocket connection.
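
For context, a minimal sketch of registering a WebSocket servlet with the 
Jetty 9 API (class names are illustrative, not the patch itself):
{code:java}
import org.eclipse.jetty.websocket.servlet.WebSocketServlet;
import org.eclipse.jetty.websocket.servlet.WebSocketServletFactory;

// Illustrative only: a servlet that upgrades HTTP requests to WebSocket and
// hands each connection to a handler socket class (hypothetical name).
public class ContainerShellWebSocketServletSketch extends WebSocketServlet {
  @Override
  public void configure(WebSocketServletFactory factory) {
    // Each upgraded connection gets an instance of the handler socket.
    factory.register(ContainerShellSocketSketch.class);
  }
}
{code}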






[jira] [Updated] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers

2018-09-10 Thread Zian Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8762:

Attachment: Interactive Docker Shell design doc.pdf

> [Umbrella] Support Interactive Docker Shell to running Containers
> -
>
> Key: YARN-8762
> URL: https://issues.apache.org/jira/browse/YARN-8762
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Priority: Major
>  Labels: Docker
> Attachments: Interactive Docker Shell design doc.pdf
>
>
> Debugging distributed applications on Hadoop can be challenging. Hadoop 
> provides limited debugging ability through application log files. One of the 
> most frequently requested features is an interactive shell to assist 
> real-time debugging. This feature is inspired by docker exec, which provides 
> the ability to run arbitrary commands in a Docker container.






[jira] [Commented] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers

2018-09-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609916#comment-16609916
 ] 

Zian Chen commented on YARN-8762:
-

Provide design doc for this. 

> [Umbrella] Support Interactive Docker Shell to running Containers
> -
>
> Key: YARN-8762
> URL: https://issues.apache.org/jira/browse/YARN-8762
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Zian Chen
>Priority: Major
>  Labels: Docker
>
> Debugging distributed applications on Hadoop can be challenging. Hadoop 
> provides limited debugging ability through application log files. One of the 
> most frequently requested features is an interactive shell to assist 
> real-time debugging. This feature is inspired by docker exec, which provides 
> the ability to run arbitrary commands in a Docker container.






[jira] [Commented] (YARN-8523) Interactive docker shell

2018-09-10 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609914#comment-16609914
 ] 

Zian Chen commented on YARN-8523:
-

Discussed offline with Eric and Wangda: this feature involves creating a 
pipeline among the NM, container-executor and docker exec, which requires a lot 
of changes to the container stack. Created umbrella JIRA YARN-8762 to track 
progress.

> Interactive docker shell
> 
>
> Key: YARN-8523
> URL: https://issues.apache.org/jira/browse/YARN-8523
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Zian Chen
>Priority: Major
>  Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API that invokes docker exec to perform unix commands and 
> reports the output back to the application master.  The application master 
> can distribute and aggregate execution of the commands and record the results 
> in its log file.
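
As a rough illustration of the underlying primitive (plain docker exec driven 
from Java; this is not the container-executor interface, and the class name is 
hypothetical):
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: run a command in a running container via docker exec
// and capture its combined output for later aggregation.
public class DockerExecSketch {
  public static String run(String container, String... command)
      throws IOException, InterruptedException {
    List<String> argv = new ArrayList<>(Arrays.asList("docker", "exec", container));
    argv.addAll(Arrays.asList(command));
    ProcessBuilder pb = new ProcessBuilder(argv);
    pb.redirectErrorStream(true); // merge stderr into stdout
    Process proc = pb.start();
    StringBuilder out = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(proc.getInputStream()))) {
      String line;
      while ((line = reader.readLine()) != null) {
        out.append(line).append('\n');
      }
    }
    proc.waitFor();
    return out.toString();
  }
}
{code}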






[jira] [Created] (YARN-8762) [Umbrella] Support Interactive Docker Shell to running Containers

2018-09-10 Thread Zian Chen (JIRA)
Zian Chen created YARN-8762:
---

 Summary: [Umbrella] Support Interactive Docker Shell to running 
Containers
 Key: YARN-8762
 URL: https://issues.apache.org/jira/browse/YARN-8762
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Zian Chen


Debugging distributed applications on Hadoop can be challenging. Hadoop 
provides limited debugging ability through application log files. One of the 
most frequently requested features is an interactive shell to assist real-time 
debugging. This feature is inspired by docker exec, which provides the ability 
to run arbitrary commands in a Docker container.






[jira] [Commented] (YARN-8754) [UI2] Improve terms on Component Instance page

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609905#comment-16609905
 ] 

Hadoop QA commented on YARN-8754:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8754 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8754 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21804/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] Improve terms on Component Instance page 
> ---
>
> Key: YARN-8754
> URL: https://issues.apache.org/jira/browse/YARN-8754
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-07 at 4.12.54 PM.png, Screen Shot 
> 2018-09-07 at 4.30.11 PM.png, YARN-8754.001.patch
>
>
> The Component Instance page has "node" and "host" fields, which represent 
> "bare_host" and "hostname" respectively. 
> On the UI2 page that is not clear, so the table label needs to be changed 
> from "node" to "bare host".
> This page also has a "Host URL" field that is hard-coded to N/A, so this 
> field is being removed from the table. 






[jira] [Commented] (YARN-8666) [UI2] Remove application tab from Yarn Queue Page

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609904#comment-16609904
 ] 

Hadoop QA commented on YARN-8666:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8666 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8666 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21802/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] Remove application tab from Yarn Queue Page
> -
>
> Key: YARN-8666
> URL: https://issues.apache.org/jira/browse/YARN-8666
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-08-14 at 3.43.18 PM.png, Screen Shot 
> 2018-09-06 at 12.50.14 PM.png, YARN-8666.001.patch
>
>
> The Yarn UI2 Queue page shows an Applications button. This button does not 
> redirect to any other page, and the running-applications table is already 
> available on the same page. 
> Thus, there is no need for an applications button on the Queue page. 






[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609906#comment-16609906
 ] 

Hadoop QA commented on YARN-8753:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8753 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8753 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21803/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, 
> YARN-8753.001.patch
>
>
> The Nodemanagers chart is present on the Cluster Overview and Nodes->Nodes 
> Status pages. 
> This chart does not show nodemanagers that are LOST. 
> Due to this issue, the Node Information page and the Node Status page show 
> different nodemanager counts. 






[jira] [Commented] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609862#comment-16609862
 ] 

Hadoop QA commented on YARN-8658:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
54s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
52s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 58s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 
0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 48s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
27s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
53s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 89m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939162/YARN-8658.08.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1d006f695516 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 987d819 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-7018) Interface for adding extra behavior to node heartbeats

2018-09-10 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609832#comment-16609832
 ] 

Jason Lowe commented on YARN-7018:
--

Originally I was thinking this could be outside of the scheduler, examining 
scheduler-agnostic state like SchedulerNode, etc.; then it could send 
NODE_RESOURCE_UPDATE to adjust node capabilities, which is also 
scheduler-agnostic.  However, it would be lower overhead to have the scheduler 
call the plugin directly, avoiding the messaging overhead, though it does 
increase coupling between the plugin and the scheduler a little.  I'm fine if 
we want to move the plugin interactions into each of the schedulers.

Back to the prototype patch, I assume NodeHeartBeatPluginImpl is just an 
example and would not be part of the final commit?

There needs to be some lifecycle support around the plugin, i.e. a way for the 
plugin to know it is being initialized, shut down, etc.  Having a callback when 
nodes are added and removed would also be helpful for some plugin 
implementations; otherwise the plugin will have to track nodes redundantly to 
know when it sees a new one, and use some other kind of hack, like timeouts, to 
know when one is no longer being tracked.

Similarly I think it would be nice to have explicit config refresh support in 
the plugin like there is for the schedulers.  One idea: if the plugin class we 
load after refreshing is the same as the old one, do _not_ replace the plugin 
object but rather invoke a refreshConfigs method or something similar that lets 
the existing plugin refresh rather than forcing a load-from-scratch approach on 
each refresh.
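
To make the lifecycle, node-tracking and refresh hooks concrete, one 
hypothetical shape for the plugin interface (method names are illustrative, not 
from the POC patch):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

// Hypothetical sketch covering the lifecycle, node-tracking and
// config-refresh hooks discussed above.
public interface NodeHeartbeatPlugin {
  void init(Configuration conf);      // called once when the plugin is loaded
  void shutdown();                    // called when the RM stops the plugin
  void nodeAdded(RMNode node);        // a node registered with the RM
  void nodeRemoved(RMNode node);      // a node was decommissioned/lost/removed
  void onHeartbeat(RMNode node);      // invoked for each node heartbeat
  void refreshConfigs(Configuration conf); // reuse the instance on refresh
}
{code}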

> Interface for adding extra behavior to node heartbeats
> --
>
> Key: YARN-7018
> URL: https://issues.apache.org/jira/browse/YARN-7018
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Major
> Attachments: YARN-7018.POC.001.patch, YARN-7018.POC.002.patch
>
>
> This JIRA tracks an interface for plugging in new behavior to node heartbeat 
> processing.  Adding a formal interface for additional node heartbeat 
> processing would allow admins to configure new functionality that is 
> scheduler-independent without needing to replace the entire scheduler.  For 
> example, both YARN-5202 and YARN-5215 had approaches where node heartbeat 
> processing was extended to implement new functionality that was essentially 
> scheduler-independent and could be implemented as a plugin with this 
> interface.






[jira] [Commented] (YARN-8680) YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate

2018-09-10 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609821#comment-16609821
 ] 

Jason Lowe commented on YARN-8680:
--

Thanks for updating the patch!

In loadUserLocalizedResources for this patch hunk:
{noformat}
-  if (!key.startsWith(keyPrefix)) {
+
+  if (!key.startsWith(LOCALIZATION_APPCACHE_SUFFIX,
+  keyPrefix.length())) {
 break;
   }
{noformat}
The old code would make sure the key matches the expected prefix, but the new 
code is making the dangerous assumption that the key found has the same base 
prefix that was used in the seek.  That is not necessarily the case.  If there 
are no appcache localization entries in the database then this will seek to the 
first key that occurs lexicographically _after_ the desired key.  That key may 
or may not be long enough to index keyPrefix.length() characters into it, and 
if it isn't then we explode with an index out of bounds exception.  This code 
needs to walk through a sub-block of keys by checking the full key prefix and 
break out of the loop when it doesn't match.  Just above the while loop the 
code computes the desired prefix, so it just needs to cache it in a local 
variable for later comparison in the while loop.

Same comment applies to the handling of the LOCALIZATION_FILECACHE_SUFFIX key 
after the while loop.
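
A minimal sketch of that pattern, assuming the iq80 leveldb iterator API the NM 
state store is built on; method and variable names are illustrative:
{code:java}
import static org.iq80.leveldb.impl.Iq80DBFactory.asString;
import static org.iq80.leveldb.impl.Iq80DBFactory.bytes;

import java.util.Map.Entry;
import org.iq80.leveldb.DBIterator;

public class PrefixScanSketch {
  // Illustrative only: walk the sub-block of keys sharing keyPrefix and stop
  // at the first key outside it, rather than indexing into an arbitrary key.
  static void scanPrefix(DBIterator iter, String keyPrefix) {
    iter.seek(bytes(keyPrefix));          // may land past the prefix entirely
    while (iter.hasNext()) {
      Entry<byte[], byte[]> entry = iter.peekNext();
      String key = asString(entry.getKey());
      if (!key.startsWith(keyPrefix)) {
        break;                            // left this prefix's sub-block
      }
      // ... load the resource stored under this key ...
      iter.next();
    }
  }
}
{code}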





> YARN NM: Implement Iterable Abstraction for LocalResourceTrackerstate
> -
>
> Key: YARN-8680
> URL: https://issues.apache.org/jira/browse/YARN-8680
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Pradeep Ambati
>Assignee: Pradeep Ambati
>Priority: Critical
> Attachments: YARN-8680.00.patch, YARN-8680.01.patch, 
> YARN-8680.02.patch, YARN-8680.03.patch
>
>
> Similar to YARN-8242, implement iterable abstraction for 
> LocalResourceTrackerState to load completed and in progress resources when 
> needed rather than loading them all at a time for a respective state.






[jira] [Commented] (YARN-8734) Readiness check for remote service

2018-09-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609797#comment-16609797
 ] 

Eric Yang commented on YARN-8734:
-

Design document is attached as "Dependency check vs.pdf".

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf
>
>
> When a service is deploying, there can be a remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependency and, once that service 
> has reached a stable state, then deploy HBase.
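
Hypothetically, such a remote dependency might be expressed in the service spec 
along these lines (the remote_dependencies field is invented here purely for 
illustration; the actual proposal is in the attached design doc):
{noformat}
{
  "name": "hbase",
  "remote_dependencies": [
    { "service": "zookeeper", "required_state": "STABLE" }
  ]
}
{noformat}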






[jira] [Assigned] (YARN-8734) Readiness check for remote service

2018-09-10 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-8734:
---

Assignee: Eric Yang

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf
>
>
> When a service is deploying, there can be a remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependency and, once that service 
> has reached a stable state, then deploy HBase.






[jira] [Updated] (YARN-8734) Readiness check for remote service

2018-09-10 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8734:

Attachment: Dependency check vs.pdf

> Readiness check for remote service
> --
>
> Key: YARN-8734
> URL: https://issues.apache.org/jira/browse/YARN-8734
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn-native-services
>Reporter: Eric Yang
>Priority: Major
> Attachments: Dependency check vs.pdf
>
>
> When a service is deploying, there can be a remote service dependency.  It 
> would be nice to describe ZooKeeper as a dependency and, once that service 
> has reached a stable state, then deploy HBase.






[jira] [Updated] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: YARN-8658.08.patch

> [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, 
> YARN-8658.06.patch, YARN-8658.07.patch, YARN-8658.08.patch
>
>
> AMRMClientRelayer (YARN-7900) was introduced for the stateful 
> FederationInterceptor (YARN-7899) to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 






[jira] [Commented] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Young Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609793#comment-16609793
 ] 

Young Chen commented on YARN-8658:
--

Fixed a bug where the UAM threw exceptions on skipping register, caused by some 
changes I left out while resolving conflicts.

> [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, 
> YARN-8658.06.patch, YARN-8658.07.patch, YARN-8658.08.patch
>
>
> AMRMClientRelayer (YARN-7900) was introduced for the stateful 
> FederationInterceptor (YARN-7899) to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 






[jira] [Commented] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted

2018-09-10 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609766#comment-16609766
 ] 

Hudson commented on YARN-8709:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14916 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14916/])
YARN-8709: CS preemption monitor always fails since one under-served (ericp: 
rev 987d8191ad409298570f7ef981e9bc8fb72ff16c)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicyMockFramework.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicyIntraQueue.java


> CS preemption monitor always fails since one under-served queue was deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was always skipped because a YarnRuntimeException was raised on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items of partitionToUnderServedQueues 
> can be added but never removed, short of rebuilding this policy. For example, 
> once under-served queue "a" is added into this structure, it will always be 
> there and never be removed. The intra-queue preemption checker will try to 
> fetch queue info for every entry of partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and will throw a 
> YarnRuntimeException if a queue is not found. As a result, after queue "a" is 
> deleted from the queue structure, the preemption checker will always fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-8761) Service AM support for decommissioning component instances

2018-09-10 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609763#comment-16609763
 ] 

Billie Rinaldi commented on YARN-8761:
--

I think to allow removing specific component instances, we will need to 
maintain a list of decommissioned instances in the Component spec for the 
service. This will prevent future AM attempts from assigning containers to the 
decommissioned instances. We should be able to support decommissioning by 
component instance name or by instance hostname 
(componentInstanceName.serviceName.user.domain).
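
As a rough sketch of that bookkeeping (class, field, and method names here 
are assumptions, not the eventual API):

{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only: the real Component spec field and lookups may be
// named differently. The idea is a persisted set of decommissioned instance
// names that survives AM restarts and blocks container (re)assignment.
public class DecommissionedInstances {
  private final Set<String> decommissioned = new HashSet<>();

  public void decommission(String instanceNameOrHostname) {
    decommissioned.add(instanceNameOrHostname);
  }

  /** True if this instance must not receive a container on any AM attempt. */
  public boolean isDecommissioned(String componentInstanceName) {
    return decommissioned.contains(componentInstanceName);
  }
}
{code}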

> Service AM support for decommissioning component instances
> --
>
> Key: YARN-8761
> URL: https://issues.apache.org/jira/browse/YARN-8761
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Major
>
> The idea behind this feature is to have a flex down where specific component 
> instances are removed. Currently on a flex down, the service AM chooses for 
> removal the component instances with the highest IDs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8761) Service AM support for decommissioning component instances

2018-09-10 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-8761:


 Summary: Service AM support for decommissioning component instances
 Key: YARN-8761
 URL: https://issues.apache.org/jira/browse/YARN-8761
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


The idea behind this feature is to have a flex down where specific component 
instances are removed. Currently on a flex down, the service AM chooses for 
removal the component instances with the highest IDs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609755#comment-16609755
 ] 

Hadoop QA commented on YARN-8569:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  8m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
52s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 17s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 147 unchanged - 1 fixed = 151 total (was 148) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 12m 
25s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
16s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} 

[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-10 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609735#comment-16609735
 ] 

Jason Lowe commented on YARN-8648:
--

Thanks for updating the patch!

Should DockerRmCommand take the cgroup hierarchy (or null) as a constructor 
argument?  It's a bit weird that it requires a container ID in the 
constructor but not the cgroup hierarchy, yet callers need to check whether 
they need to pass the hierarchy in order to use it properly.  Typically it's 
safer to put such things in the constructor so callers have to think about it.
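
Concretely, something like the following shape (a sketch only, assuming 
DockerRmCommand extends DockerCommand as on trunk; the "hierarchy" argument 
key is an assumption, not the patch's actual wiring):

{code}
// Illustrative sketch: requiring the cgroup hierarchy up front forces every
// caller to decide explicitly whether a hierarchy applies to this container.
public class DockerRmCommand extends DockerCommand {
  public DockerRmCommand(String containerName, String cgroupHierarchy) {
    super("rm");
    addCommandArguments("name", containerName);
    if (cgroupHierarchy != null && !cgroupHierarchy.isEmpty()) {
      // null/empty means no cgroup cleanup is needed for this container
      addCommandArguments("hierarchy", cgroupHierarchy);
    }
  }
}
{code}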

Do we really want to ignore EBUSY errors when trying to remove the cgroup 
entries?  I think this is here for the docker-in-docker use case, where 
containers share the same cgroup parent, but it also suppresses useful error 
messages when the code tries to remove an entry and fails to do so.  As 
written now, the patch will silently fail to remove cgroup entries that are 
still being used, in all cases, which seems less than ideal.

There is already a {{validate_container_id}} in string-utils.c that the code 
should leverage to check if the argument is a container ID.

The dir_exists check seems extraneous since the code already checks for ENOENT. 
 As you mentioned above, the entry could be deleted after the check anyway.  
The code makes a system call to avoid a system call, so it won't be much of an 
optimization in practice.

Now that optind is not being passed explicitly to exec_docker_command, do we 
really want exec_docker_command to examine/modify the global optind variable?  
What if someone wants to exec multiple docker commands consecutively with very 
different argc/argv values?  Currently the caller would have to be aware that 
exec_docker_command uses the global optind to calculate argument offsets in 
its loop (but not the global argc/argv values) and would have to manually fix 
up optind between invocations to get it to work properly.

Regarding the "Don't increment optind here" comment, it would be good to 
elaborate a bit more on why an increment would be bad here.  Otherwise the 
comment is simply parroting the code which isn't as helpful.  If we fixup the 
problems with exec_docker_command and optind in the previous issue then I'm 
hoping there wouldn't be a need for careful optind management and comments here.


> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8648.001.patch, YARN-8648.002.patch, 
> YARN-8648.003.patch, YARN-8648.004.patch
>
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.
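
To make the leak concrete, a standalone sketch of the cleanup that is 
currently missing (illustrative only, not the nodemanager's actual code path):

{code}
import java.io.File;

// Illustrative only: remove the leaked per-container directory under every
// cgroup controller, not just the cpu controller the nodemanager manages.
public class LeakedCgroupSweeper {
  private static final String[] CONTROLLERS = {
      "blkio", "cpuset", "devices", "freezer", "hugetlb", "memory",
      "net_cls", "net_prio", "perf_event", "systemd"};

  public static void sweep(String hierarchy, String containerId) {
    for (String controller : CONTROLLERS) {
      File dir = new File("/sys/fs/cgroup/" + controller + "/"
          + hierarchy + "/" + containerId);
      // Deleting a cgroup directory only succeeds once it is empty, i.e.
      // after docker has removed the leaf docker_container_id cgroup.
      if (dir.exists() && !dir.delete()) {
        System.err.println("could not remove " + dir);
      }
    }
  }
}
{code}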



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609719#comment-16609719
 ] 

Hadoop QA commented on YARN-8658:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 54s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 2 new + 
1 unchanged - 0 fixed = 3 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 14s{color} 
| {color:red} hadoop-yarn-server-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
23s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 34s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.uam.TestUnmanagedApplicationManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939134/YARN-8658.07.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f7e41cf7d420 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 
17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8fe4062 |
| maven | version: Apache Maven 

[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609691#comment-16609691
 ] 

Hadoop QA commented on YARN-5464:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
57s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
36s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 33s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 50 new + 702 unchanged - 6 fixed = 752 total (was 708) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m  
6s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 
36s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} 

[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-10 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8696:
---
Summary: [AMRMProxy] FederationInterceptor upgrade: home sub-cluster 
heartbeat async  (was: FederationInterceptor upgrade: home sub-cluster 
heartbeat async)

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, 
> YARN-8696.v3.patch, YARN-8696.v4.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, the 
> interceptor waits for the home response to come back before merging and 
> returning the (merged) heartbeat result back to the AM. If the home 
> sub-cluster is suffering from connection issues, or is down during a YarnRM 
> master-slave switch, all heartbeat threads in _FederationInterceptor_ will 
> be blocked waiting for the home response. As a result, the successful UAM 
> heartbeats from secondary sub-clusters will not be returned to the AM at 
> all. Additionally, because we kept the same heartbeat responseId between 
> the AM and the home RM, lots of tricky handling is needed for the 
> responseId resync when it comes to _FederationInterceptor_ (part of 
> AMRMProxy, NM) work-preserving restart (YARN-6127, YARN-1336), home RM 
> master-slave switches, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in secondaries, so that 
> any sub-cluster outage or connection issue won't impact the AM getting 
> responses from other sub-clusters. The responseId is also managed 
> separately for the home sub-cluster and the AM, and they increment 
> independently. The resync logic becomes much cleaner. 
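
A minimal sketch of the asynchronous pattern described above (class and 
method names are assumptions; the real FederationInterceptor plumbing is more 
involved):

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the home-RM heartbeat runs on its own thread and
// deposits responses in a queue, so an unresponsive home sub-cluster can no
// longer block the AM-facing allocate path from returning UAM results.
public class AsyncHomeHeartbeat<Req, Resp> {
  public interface Rpc<Q, S> { S call(Q request) throws Exception; }

  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final BlockingQueue<Resp> responses = new LinkedBlockingQueue<>();

  public void send(Req request, Rpc<Req, Resp> homeRm) {
    executor.submit(() -> {
      try {
        // The home responseId would be managed on this thread, decoupled
        // from the responseId the AM sees.
        responses.add(homeRm.call(request));
      } catch (Exception e) {
        // connection issues / failover: retry or resync, never block the AM
      }
    });
  }

  /** Non-blocking drain of whatever home responses have arrived so far. */
  public Resp poll() {
    return responses.poll();
  }
}
{code}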



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8658) [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8658:
---
Summary: [AMRMProxy] Metrics for AMRMClientRelayer inside 
FederationInterceptor  (was: Metrics for AMRMClientRelayer inside 
FederationInterceptor)

> [AMRMProxy] Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, 
> YARN-8658.06.patch, YARN-8658.07.patch
>
>
> AMRMClientRelayer (YARN-7900) was introduced for the stateful 
> FederationInterceptor (YARN-7899) to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 
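
For a rough idea of the shape such metrics could take with Hadoop's metrics2 
library (the metric names and per-sub-cluster registration are assumptions, 
not the patch's):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Hypothetical sketch: gauges tracking what one relayer currently has
// pending toward a single sub-cluster YarnRM.
@Metrics(about = "AMRMClientRelayer metrics", context = "yarn")
public class AMRMClientRelayerMetricsSketch {
  @Metric("Pending container asks toward this sub-cluster YarnRM")
  MutableGaugeLong pendingAsks;

  @Metric("Pending container releases toward this sub-cluster YarnRM")
  MutableGaugeLong pendingReleases;

  public static AMRMClientRelayerMetricsSketch forSubCluster(String id) {
    // register() instantiates the annotated gauges via the metrics system
    return DefaultMetricsSystem.instance().register(
        "AMRMClientRelayer-" + id, null, new AMRMClientRelayerMetricsSketch());
  }

  public void setPendingAsks(long n) { pendingAsks.set(n); }
  public void setPendingReleases(long n) { pendingReleases.set(n); }
}
{code}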



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer

2018-09-10 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8760:
---
Issue Type: Sub-task  (was: Task)
Parent: YARN-5597

> [AMRMProxy] Fix concurrent re-register due to YarnRM failover in 
> AMRMClientRelayer
> --
>
> Key: YARN-8760
> URL: https://issues.apache.org/jira/browse/YARN-8760
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>
> When the home YarnRM is failing over, the FinishApplicationMaster call from 
> the AM can have multiple retry threads outstanding in FederationInterceptor. 
> When the new YarnRM comes back up, all retry threads will re-register with 
> the YarnRM. The first one will succeed, but the rest will get an 
> "Application Master is already registered" exception. We should catch and 
> swallow this exception and move on. 
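
A sketch of the suggested handling (illustrative only; the committed patch 
may key off a concrete exception type rather than the message text):

{code}
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ReRegisterOnFailover {
  interface Registrar { void register() throws YarnException; }

  static void reRegister(Registrar registrar) throws YarnException {
    try {
      registrar.register();
    } catch (YarnException e) {
      String msg = e.getMessage();
      if (msg != null
          && msg.contains("Application Master is already registered")) {
        // Another retry thread won the race once the new RM came up; the
        // registration we need is already in place, so swallow and move on.
        return;
      }
      throw e;
    }
  }
}
{code}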



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8760) [AMRMProxy] Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer

2018-09-10 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8760:
---
Summary: [AMRMProxy] Fix concurrent re-register due to YarnRM failover in 
AMRMClientRelayer  (was: Fix concurrent re-register due to YarnRM failover in 
AMRMClientRelayer)

> [AMRMProxy] Fix concurrent re-register due to YarnRM failover in 
> AMRMClientRelayer
> --
>
> Key: YARN-8760
> URL: https://issues.apache.org/jira/browse/YARN-8760
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
>
> When the home YarnRM is failing over, the FinishApplicationMaster call from 
> the AM can have multiple retry threads outstanding in FederationInterceptor. 
> When the new YarnRM comes back up, all retry threads will re-register with 
> the YarnRM. The first one will succeed, but the rest will get an 
> "Application Master is already registered" exception. We should catch and 
> swallow this exception and move on. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8760) Fix concurrent re-register due to YarnRM failover in AMRMClientRelayer

2018-09-10 Thread Botong Huang (JIRA)
Botong Huang created YARN-8760:
--

 Summary: Fix concurrent re-register due to YarnRM failover in 
AMRMClientRelayer
 Key: YARN-8760
 URL: https://issues.apache.org/jira/browse/YARN-8760
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang


When the home YarnRM is failing over, the FinishApplicationMaster call from the 
AM can have multiple retry threads outstanding in FederationInterceptor. When 
the new YarnRM comes back up, all retry threads will re-register with the 
YarnRM. The first one will succeed, but the rest will get an "Application 
Master is already registered" exception. We should catch and swallow this 
exception and move on. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8569) Create an interface to provide cluster information to application

2018-09-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575445#comment-16575445
 ] 

Eric Yang edited comment on YARN-8569 at 9/10/18 6:37 PM:
--

Sysfs is a pseudo file system provided by the Linux kernel to expose 
system-related information to user space.  YARN can mimic the same idea to 
export cluster information to containers.  The proposal is to expose cluster 
information at:

{code}
/hadoop/yarn/sysfs/service.json
{code}

This basically holds the runtime information about the deployed application 
and gets updated when state changes happen.  The file is replicated from the 
YARN service AM to the host system in the application's appcache.
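
From inside a container, consuming this would then be plain file I/O, along 
these lines (path per the proposal above; the JSON layout is whatever the AM 
publishes):

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative consumer only: re-read the mounted file whenever fresh
// cluster information is needed; the AM rewrites it as state changes happen.
public class SysfsReader {
  public static String readServiceSpec() throws IOException {
    return new String(
        Files.readAllBytes(Paths.get("/hadoop/yarn/sysfs/service.json")),
        StandardCharsets.UTF_8);
  }
}
{code}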


was (Author: eyang):
Sysfs is a pseudo file system provided by Linux Kernel to expose system related 
information to user space.  YARN can mimic the same ideology to export cluster 
information to container.  The proposal is to expose cluster information to:

{code}
/hadoop/yarn/fs/cluster.json
{code}

This basically have the runtime information about the deployed application, and 
getting updated when state changes happen.  The file is replicated from YARN 
service AM to host system in appcache for the application.

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch
>
>
> Some programs require container hostnames to be known for the application 
> to run.  For example, distributed tensorflow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the yarn service via a mounted file.  The 
> file content gets updated when a flex command is performed.  This allows 
> application developers to consume system environment settings via a 
> standard interface.  It is like /proc/devices for Linux, but for Hadoop.  
> This may involve updating a file in the distributed cache and allowing the 
> file to be mounted via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Young Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Young Chen updated YARN-8658:
-
Attachment: YARN-8658.07.patch

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, 
> YARN-8658.06.patch, YARN-8658.07.patch
>
>
> AMRMClientRelayer (YARN-7900) was introduced for the stateful 
> FederationInterceptor (YARN-7899) to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-09-10 Thread Young Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609658#comment-16609658
 ] 

Young Chen commented on YARN-8658:
--

Thanks for the feedback [~botong]! Addressed the issues and uploaded a new 
patch.

> Metrics for AMRMClientRelayer inside FederationInterceptor
> --
>
> Key: YARN-8658
> URL: https://issues.apache.org/jira/browse/YARN-8658
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-8658.01.patch, YARN-8658.02.patch, 
> YARN-8658.03.patch, YARN-8658.04.patch, YARN-8658.05.patch, 
> YARN-8658.06.patch, YARN-8658.07.patch
>
>
> AMRMClientRelayer (YARN-7900) was introduced for the stateful 
> FederationInterceptor (YARN-7899) to keep track of all pending requests sent 
> to every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to 
> show the state of things in FederationInterceptor. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8569) Create an interface to provide cluster information to application

2018-09-10 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609645#comment-16609645
 ] 

Eric Yang commented on YARN-8569:
-

Patch 008 fixed more checkstyle issues and changed the sync sysfs API to be 
based on the application ID instead of the combination of application ID and 
container ID.  This reduces the number of network requests and the repetitive 
syncing of the same cluster spec information.

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch
>
>
> Some programs require container hostnames to be known for the application 
> to run.  For example, distributed tensorflow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the yarn service via a mounted file.  The 
> file content gets updated when a flex command is performed.  This allows 
> application developers to consume system environment settings via a 
> standard interface.  It is like /proc/devices for Linux, but for Hadoop.  
> This may involve updating a file in the distributed cache and allowing the 
> file to be mounted via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8569) Create an interface to provide cluster information to application

2018-09-10 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-8569:

Attachment: YARN-8569.008.patch

> Create an interface to provide cluster information to application
> -
>
> Key: YARN-8569
> URL: https://issues.apache.org/jira/browse/YARN-8569
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8569 YARN sysfs interface to provide cluster 
> information to application.pdf, YARN-8569.001.patch, YARN-8569.002.patch, 
> YARN-8569.003.patch, YARN-8569.004.patch, YARN-8569.005.patch, 
> YARN-8569.006.patch, YARN-8569.007.patch, YARN-8569.008.patch
>
>
> Some programs require container hostnames to be known for the application 
> to run.  For example, distributed tensorflow requires a launch_command that 
> looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>  --ps_hosts=ps0.example.com:,ps1.example.com: \
>  --worker_hosts=worker0.example.com:,worker1.example.com: \
>  --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN 
> services launch_command.  In addition, the dynamic parameters do not work 
> with the YARN flex command.  This is the classic pain point for application 
> developers attempting to automate system environment settings as parameters 
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple 
> option to expose the hostnames of the yarn service via a mounted file.  The 
> file content gets updated when a flex command is performed.  This allows 
> application developers to consume system environment settings via a 
> standard interface.  It is like /proc/devices for Linux, but for Hadoop.  
> This may involve updating a file in the distributed cache and allowing the 
> file to be mounted via container-executor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609616#comment-16609616
 ] 

Hadoop QA commented on YARN-8759:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 70m 
51s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}126m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8759 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939121/YARN-8759.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7f901d92e591 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8fe4062 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21797/testReport/ |
| Max. process+thread count | 855 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21797/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Copy of "resource-types.xml" is not deleted if 

[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609507#comment-16609507
 ] 

Hadoop QA commented on YARN-5464:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
40s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  3m 
56s{color} | {color:red} hadoop-yarn in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  4m 
28s{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  4m 28s{color} | 
{color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  4m 28s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 50 new + 702 unchanged - 6 fixed = 752 total (was 708) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  3s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 70m 23s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 24m 
50s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}173m  0s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-5464 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939096/YARN-5464.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  

[jira] [Commented] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted

2018-09-10 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609499#comment-16609499
 ] 

Eric Payne commented on YARN-8709:
--

Thanks [~Tao Yang].
+1. Will commit shortly.

> CS preemption monitor always fails since one under-served queue was deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was always skipped because of a YarnRuntimeException on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items can be added to 
> partitionToUnderServedQueues but are never removed, short of rebuilding the 
> policy. For example, once under-served queue "a" is added to this structure, 
> it stays there forever. The intra-queue preemption checker looks up queue 
> info for every entry of partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and throws a 
> YarnRuntimeException if one is not found. So after queue "a" is deleted from 
> the queue structure, the preemption checker will always fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8709) CS preemption monitor always fails since one under-served queue was deleted

2018-09-10 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-8709:
-
Summary: CS preemption monitor always fails since one under-served queue 
was deleted  (was: intra-queue preemption checker always fail since one 
under-served queue was deleted)

> CS preemption monitor always fails since one under-served queue was deleted
> ---
>
> Key: YARN-8709
> URL: https://issues.apache.org/jira/browse/YARN-8709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, scheduler preemption
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-8709.001.patch, YARN-8709.002.patch
>
>
> After some queues were deleted, the preemption checker in SchedulingMonitor 
> was always skipped because of a YarnRuntimeException on every run.
> Error logs:
> {noformat}
> ERROR [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor: 
> Exception raised while executing preemption checker, skip this run..., 
> exception=
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: This shouldn't 
> happen, cannot find TempQueuePerPartition for queueName=1535075839208
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.getQueueByPartition(ProportionalCapacityPreemptionPolicy.java:701)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.computeIntraQueuePreemptionDemand(IntraQueueCandidatesSelector.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.IntraQueueCandidatesSelector.selectCandidates(IntraQueueCandidatesSelector.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:514)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:348)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:99)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PolicyInvoker.run(SchedulingMonitor.java:111)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:186)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> I think there is something wrong with the partitionToUnderServedQueues field 
> in ProportionalCapacityPreemptionPolicy. Items in partitionToUnderServedQueues 
> can be added but never removed, short of rebuilding the policy. For example, 
> once under-served queue "a" is added to this structure, it stays there 
> forever. The intra-queue preemption checker looks up every queue recorded in 
> partitionToUnderServedQueues in 
> IntraQueueCandidatesSelector#selectCandidates and throws a 
> YarnRuntimeException if one is not found. As a result, after queue "a" is 
> deleted from the queue structure, the preemption checker will always fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures

2018-09-10 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609463#comment-16609463
 ] 

Manikandan R commented on YARN-8759:


[~bsteinbach] Thanks for raising this.

Please refer to 
https://issues.apache.org/jira/browse/YARN-7159?focusedCommentId=16235697=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16235697
 as well. It describes the problem we faced initially and the reason for 
having the resource-types file in each sub-component (for example, YARN NM). It 
would be nice if we could find a generic solution. Also, I think there are 
a few more places like the one you encountered; please check.

cc [~sunilg]
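
One generic direction (a minimal sketch, assuming JUnit 4; ResourceTypesFileRule 
is a hypothetical name, not the attached patch) is to route the copy through a 
rule whose cleanup runs even when the test body fails:

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

import org.junit.rules.ExternalResource;

// Hypothetical sketch, not the attached YARN-8759 patch: copy
// resource-types.xml into place before each test and always remove it
// afterwards, so a failing test cannot leave a stale file behind for the
// next test to trip over.
public class ResourceTypesFileRule extends ExternalResource {
  private final File source;
  private final File target;

  public ResourceTypesFileRule(File source, File target) {
    this.source = source;
    this.target = target;
  }

  @Override
  protected void before() throws IOException {
    Files.copy(source.toPath(), target.toPath(),
        StandardCopyOption.REPLACE_EXISTING);
  }

  @Override
  protected void after() {
    // after() runs whether the test passed or failed.
    target.delete();
  }
}
{code}

A test would then declare {{@Rule public ResourceTypesFileRule rule = ...}} and 
would not need its own delete logic.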

> Copy of "resource-types.xml" is not deleted if test fails, causes other test 
> failures
> -
>
> Key: YARN-8759
> URL: https://issues.apache.org/jira/browse/YARN-8759
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-8759.001.patch
>
>
> resource-types.xml is copied to the test machine in several tests, but it is 
> deleted only at the end of the test. If a test fails, the file is not 
> deleted and other tests fail because of the wrong configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609454#comment-16609454
 ] 

Antal Bálint Steinbach commented on YARN-5464:
--

Hi [~djp],

I uploaded a patch based on the patch from [~rkanter]. If you have some time, 
please review it; if you don't, it would be great if you could suggest 
somebody who is familiar with the issue.

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-5464:
-
Attachment: YARN-5464.004.patch

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures

2018-09-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8759:
-
Issue Type: Bug  (was: Improvement)

> Copy of "resource-types.xml" is not deleted if test fails, causes other test 
> failures
> -
>
> Key: YARN-8759
> URL: https://issues.apache.org/jira/browse/YARN-8759
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
>
> resource-types.xml is copied to the test machine in several tests, but it is 
> deleted only at the end of the test. If a test fails, the file is not 
> deleted and other tests fail because of the wrong configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8759) Copy of "resource-types.xml" is not deleted if test fails, causes other test failures

2018-09-10 Thread JIRA
Antal Bálint Steinbach created YARN-8759:


 Summary: Copy of "resource-types.xml" is not deleted if test 
fails, causes other test failures
 Key: YARN-8759
 URL: https://issues.apache.org/jira/browse/YARN-8759
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Antal Bálint Steinbach
Assignee: Antal Bálint Steinbach


resource-types.xml is copied to the test machine in several tests, but it is 
deleted only at the end of the test. If a test fails, the file is not 
deleted and other tests fail because of the wrong configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-5464:
-
Attachment: YARN-5464.003.patch

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.003.patch, YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-10 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609206#comment-16609206
 ] 

Jim Brennan commented on YARN-8648:
---

This is ready for review.

 

> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8648.001.patch, YARN-8648.002.patch, 
> YARN-8648.003.patch, YARN-8648.004.patch
>
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if the 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.   Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup. All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd. So for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> repro with current hadoop.
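
For scale checks, a small diagnostic along these lines (a hypothetical sketch, 
not part of the patch; it assumes the cgroup v1 layout described above, i.e. 
/sys/fs/cgroup/<controller>/hadoop-yarn/container_*) can count the 
per-container cgroup directories still present under each controller:

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic sketch: count container_* cgroup directories left
// under every controller. For exited containers these are exactly the
// leaked container_id cgroups described above; directories belonging to
// still-running containers are counted too, so read the numbers with that
// in mind.
public class LeakedCgroupScan {
  public static void main(String[] args) throws IOException {
    Path cgroupRoot = Paths.get("/sys/fs/cgroup");
    try (DirectoryStream<Path> controllers =
             Files.newDirectoryStream(cgroupRoot)) {
      for (Path controller : controllers) {
        Path hadoopYarn = controller.resolve("hadoop-yarn");
        if (!Files.isDirectory(hadoopYarn)) {
          continue;
        }
        long count = 0;
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(hadoopYarn, "container_*")) {
          for (Path ignored : entries) {
            count++;
          }
        }
        System.out.println(controller.getFileName() + ": " + count
            + " container cgroup dirs present");
      }
    }
  }
}
{code}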



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609152#comment-16609152
 ] 

Hadoop QA commented on YARN-5464:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 31s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 50 new + 702 unchanged - 6 fixed = 752 total (was 708) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
48s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m  
8s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 25m 
13s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
30s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-5464 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12939052/YARN-5464.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux d802166afd3e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | 

[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2018-09-10 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-5464:
-
Attachment: YARN-5464.002.patch

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7505) RM REST endpoints generate malformed JSON

2018-09-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608844#comment-16608844
 ] 

Hadoop QA commented on YARN-7505:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
5s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 12s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 32 unchanged - 2 fixed = 33 total (was 34) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
18s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m  0s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
59s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 48s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMHA |
|   | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7505 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12897909/YARN-7505.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  

[jira] [Commented] (YARN-8747) [UI2] YARN UI2 page loading failed due to js error under some time zone configuration

2018-09-10 Thread collinma (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608829#comment-16608829
 ] 

collinma commented on YARN-8747:


Hi [~sunilg], could we merge the PR into trunk? Just let me know if you have 
any concerns.

> [UI2] YARN UI2 page loading failed due to js error under some time zone 
> configuration
> -
>
> Key: YARN-8747
> URL: https://issues.apache.org/jira/browse/YARN-8747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 3.1.1
>Reporter: collinma
>Assignee: collinma
>Priority: Blocker
> Attachments: YARN-8747.001.patch, image-2018-09-05-18-54-03-991.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> We deployed hadoop 3.1.1 on centos 7.2 servers whose time zone is configured 
> as GMT+8, and the web browser time zone is GMT+8 too. The YARN UI page failed 
> to load due to a js error:
>  
> !image-2018-09-05-18-54-03-991.png!
> The moment-timezone js component raised that error. This has been fixed in 
> moment-timezone v0.5.1 
> ([see|https://github.com/moment/moment-timezone/issues/294]). We need 
> to update the moment-timezone version accordingly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8758) PreemptionMessage when using AMRMClientAsync

2018-09-10 Thread Krishna Kishore (JIRA)
Krishna Kishore created YARN-8758:
-

 Summary: PreemptionMessage when using AMRMClientAsync
 Key: YARN-8758
 URL: https://issues.apache.org/jira/browse/YARN-8758
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.1.1
Reporter: Krishna Kishore
 Fix For: 2.7.6


Hi,

   The preemption notification messages sent during the time period defined by 
the following parameter currently work only with AMRMClient, not with 
AMRMClientAsync.

*yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill*

We want this to work with AMRMClientAsync as well, because our implementations 
are based on it.
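
For reference, with the blocking client the notification is already reachable: 
AllocateResponse exposes getPreemptionMessage(), so an AM can poll it from 
allocate(). A minimal sketch follows (host/port/tracking-URL values are 
placeholders; AMRMClientAsync's callback handler currently has no equivalent 
hook, which is exactly what is being requested here):

{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: polling the preemption message with the blocking AMRMClient.
public class PreemptionAwareAm {
  public static void main(String[] args) throws Exception {
    AMRMClient<AMRMClient.ContainerRequest> amrm =
        AMRMClient.createAMRMClient();
    amrm.init(new YarnConfiguration());
    amrm.start();
    // Placeholder registration values.
    amrm.registerApplicationMaster("localhost", 0, "");
    while (true) {
      AllocateResponse response = amrm.allocate(0.0f);
      PreemptionMessage preemption = response.getPreemptionMessage();
      if (preemption != null) {
        // React within max_wait_before_kill, e.g. checkpoint work and
        // release the containers the RM asked back.
        System.out.println("Preemption requested: " + preemption);
      }
      Thread.sleep(1000);
    }
  }
}
{code}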

 

Thanks,

Kishore



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org