[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-04-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255175#comment-15255175
 ] 

Hudson commented on YARN-4335:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9660 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9660/])
YARN-4335. Allow ResourceRequests to specify ExecutionType of a request (arun 
suresh: rev b2a654c5ee6524f81c971ea0b70e58ea0a455f1d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ResourceRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java


> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
>
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, 
> YARN-4335.003.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.
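A minimal AM-side sketch of how the new user-facing field could be used (the setter name below is an assumption about the API added by this patch; the committed method signatures may differ, and ExecutionType values come from the YARN-2882 work):
{code}
// Hypothetical usage from an ApplicationMaster: mark an ask as OPPORTUNISTIC.
ResourceRequest req = ResourceRequest.newInstance(
    Priority.newInstance(1), ResourceRequest.ANY,
    Resource.newInstance(1024, 1), 1);
req.setExecutionType(ExecutionType.OPPORTUNISTIC);  // assumed setter; default stays GUARANTEED
{code}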



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-04-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255173#comment-15255173
 ] 

Arun Suresh commented on YARN-4335:
---

Committed this to trunk..

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
>
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, 
> YARN-4335.003.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-04-22 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4335:
--
Fix Version/s: 3.0.0

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Fix For: 3.0.0
>
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, 
> YARN-4335.003.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255115#comment-15255115
 ] 

Hadoop QA commented on YARN-3150:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 18s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
49s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
45s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
2s {color} | {color:green} YARN-2928 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped branch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
15s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 7s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {col

[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255109#comment-15255109
 ] 

Hadoop QA commented on YARN-4390:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 30 new + 505 unchanged - 15 fixed = 535 total (was 520) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 17s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 37s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 12s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 170m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Should 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator$TQComparator
 be a _static_ inner class?  At PreemptableResourceCalculator.java:inner class? 
 At PreemptableResourceCalculator.java:[lines

[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255050#comment-15255050
 ] 

Jian He commented on YARN-4390:
---

Some comments after scanning the patch:
- It looks like the approach is to loop over almost all containers in the cluster 
several times for every preemption cycle (3 secs by default), to see whether some 
containers can be preempted to make room for the reserved container on the same node. 
Will this cause too much overhead in a large cluster with a large number of containers? 
A unit test could measure the time cost of this mega loop (see the sketch after these 
comments).

- Unnecessary line breaks are added in multiple places; could you clean those up, 
especially in the PreemptableResourceCalculator class?
- Is this equal to node.getUnallocatedResource?
{code}
for (RMContainer c : sortedRunningContainers) {
  Resources.subtractFrom(available, c.getAllocatedResource());
}
{code}

- Isn't FifoCandidatesSelector the first selector, and therefore selectedCandidates is 
empty?
{code}
// Previous selectors (with higher priority) could have already
// selected containers. We need to deduct preemptable resources
// based on already selected candidates.
CapacitySchedulerPreemptionUtils
    .deductPreemptableResourcesBasedSelectedCandidates(preemptionContext,
        selectedCandidates);
{code}
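Regarding the timing concern in the first comment above, a sketch of the kind of micro-benchmark test that could be added (illustrative only; buildPolicyWithMockCluster is a hypothetical helper that would wire a ProportionalCapacityPreemptionPolicy against a mocked scheduler snapshot, e.g. 5k nodes x 20 running containers):
{code}
@Test(timeout = 60000)
public void testSelectionPassTimeWithManyContainers() {
  ProportionalCapacityPreemptionPolicy policy = buildPolicyWithMockCluster(5000, 20);
  long start = System.nanoTime();
  policy.editSchedule();   // one preemption cycle over the mocked snapshot
  long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  Assert.assertTrue("selection pass took " + elapsedMs + " ms",
      elapsedMs < 3000);   // should fit comfortably inside the 3s monitoring interval
}
{code}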

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255000#comment-15255000
 ] 

Hadoop QA commented on YARN-4807:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 22 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 40s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 43s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 21s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 138m 58s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.serv

[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254970#comment-15254970
 ] 

Hitesh Shah commented on YARN-4844:
---

Additionally, we are not talking about use in production, but rather about making 
upstream apps change as needed to work with 3.x and, over time, stabilizing 3.x. 
Making an API change earlier rather than later is actually better, as the API changes 
in this case have no bearing on production stability.

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> --
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes and each node has 210G of 
> memory, we will get a negative total cluster memory.
> Another case that overflows int32 more easily: we add all pending resources of 
> running apps to the cluster's total pending resources. If a problematic app 
> requires too many resources (say 1M+ containers, each of them asking for 3G of 
> memory), int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where there 
> are many running apps, each of them with capped but still significant pending 
> resources.
> So we may need to upgrade the int32 memory field (possibly including v-cores as 
> well) to int64 to avoid integer overflow.
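To make the overflow in the description concrete: 10,000 nodes x 210 GB per node is 2,150,400,000 MB, which exceeds Integer.MAX_VALUE (2,147,483,647). A plain-Java illustration:
{code}
int nodes = 10_000;
int memoryPerNodeMb = 210 * 1024;                     // 215,040 MB per node
int totalMb = nodes * memoryPerNodeMb;                // overflows int32: 2,150,400,000 > 2,147,483,647
long totalMbAsLong = (long) nodes * memoryPerNodeMb;  // 2,150,400,000 fits easily in int64
{code}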



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254967#comment-15254967
 ] 

Hitesh Shah commented on YARN-4844:
---

bq. considering there are hundreds of blockers and criticals for the 3.0.0 release, 
nobody will actually use the new release in production even if 3.0-alpha can be 
released. We can mark the Resource API of trunk as unstable and update it in future 
3.x releases.

So the plan is to force users to change their usage of these APIs in some version of 
3.x, but not in 3.0.0?

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> --
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes and each node has 210G of 
> memory, we will get a negative total cluster memory.
> Another case that overflows int32 more easily: we add all pending resources of 
> running apps to the cluster's total pending resources. If a problematic app 
> requires too many resources (say 1M+ containers, each of them asking for 3G of 
> memory), int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where there 
> are many running apps, each of them with capped but still significant pending 
> resources.
> So we may need to upgrade the int32 memory field (possibly including v-cores as 
> well) to int64 to avoid integer overflow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4920) ATS/NM should support a link to download/get the logs in text format

2016-04-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-4920:
---

Assignee: Xuan Gong

> ATS/NM should support a link to download/get the logs in text format
> ---
>
> Key: YARN-4920
> URL: https://issues.apache.org/jira/browse/YARN-4920
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4983:

Attachment: YARN-4983-trunk.002.patch

Added a unit test for the standby metrics. The UT failures in the previous run appear 
to be independent of the changes in this patch. Trying them one more time.

> JVM and UGI metrics disappear after RM is once transitioned to standby mode
> ---
>
> Key: YARN-4983
> URL: https://issues.apache.org/jira/browse/YARN-4983
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch, 
> YARN-4983-trunk.002.patch
>
>
> When transitioned to standby, the RM shuts down the existing metrics system and 
> relaunches a new one. This causes the JVM metrics and UGI metrics to go missing in 
> the new metrics system.
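For context, a minimal sketch of the failure mode being tested (DefaultMetricsSystem and JvmMetrics are the standard org.apache.hadoop.metrics2 helpers; how the actual patch and test wire them together is an assumption):
{code}
// On transitionToStandby the RM tears down the metrics system; unless sources such as
// the JVM (and UGI) metrics are re-registered when a new MetricsSystem is initialized,
// they vanish from the relaunched system.
DefaultMetricsSystem.shutdown();                                        // standby transition
MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager");  // re-activation
JvmMetrics.initSingleton("ResourceManager", null);                      // re-register the JVM source
{code}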



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254944#comment-15254944
 ] 

Hadoop QA commented on YARN-4844:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 53 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 50s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 61 new + 
1404 unchanged - 47 fixed = 1465 total (was 1451) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 47s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 55s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 19s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} |

[jira] [Commented] (YARN-3150) [Documentation] Documenting the timeline service v2

2016-04-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254914#comment-15254914
 ] 

Sangjin Lee commented on YARN-3150:
---

[~gtCarrera9]:

bq. For the list of timeline v2 configs, maybe we'd like to distinguish the 
configs that we adopt from existing ATS v1.x configs and the newly introduced 
configs? We may want to stress the overridden configs.
I introduced a column that marks whether the config is new in v.2 as opposed to 
existing. See if that format works.

bq. Maybe we'd like to have a few more sentences about the timeline schema 
creator? There are some "hidden" functions that might be interesting.
I did add a sentence about skipping existing tables. I didn't document the rest 
of the options as I thought those options are mostly geared towards us (TS v.2 
developers) rather than general developers/users. Let me know your thoughts.

bq. We may want to clarify the meaning of "system metrics" and "container 
metrics" in the document. When readers have some v1 background, it may be 
helpful to distinguish a few wordings in the document: "system metrics" vs. 
"application history data" in AHS, "container metrics" vs. the old "public 
container metrics" option in v1.
I tried to clean up the terminology. I am mostly using "system metrics" to 
refer to YARN-generated metrics. "Container metrics" are not entirely accurate 
as we are aggregating them to be at the app level, flow level, etc.

While we're at it, I did notice one of the config properties was not described 
correctly. {{yarn.rm.system-metrics-publisher.emit-container-events}} is about 
RM publisher emitting container *events*, not *metrics*. I corrected the 
description and related variable/method names. cc [~Naganarasimha]

bq. We may want to explicitly mention in the "Publishing application specific 
data" section that this section is mainly for YARN application programmers, but 
not for cluster operators.
Added a sentence.

bq. Note the programmers that the return value of v2 APIs are changed to void?
Good point. Added a couple of sentences.

bq. Maybe we can be more precise about the "reasonable defaults" for flow 
contexts?
Done.

bq. We need separate docs for the REST APIs in the future. Right now the REST 
API doc is just a simple reference.
I changed a word there to say "informal". Yes, this is not a complete REST API 
description. I'm not quite sure if we're at a point where we can generate a 
complete reference for that yet. So that will have to wait a little...

I also added some more about the high level architecture and a diagram.

> [Documentation] Documenting the timeline service v2
> ---
>
> Key: YARN-3150
> URL: https://issues.apache.org/jira/browse/YARN-3150
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch, 
> YARN-3150-YARN-2928.02.patch
>
>
> Let's make sure we will have a document to describe what's new in TS v2, the 
> APIs, the client libs and so on. We should do better around documentation in 
> v2 than v1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3150) [Documentation] Documenting the timeline service v2

2016-04-22 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3150:
--
Attachment: YARN-3150-YARN-2928.02.patch

Posted patch v.2.

Addressed most of Li's comments. We still need to add some more regarding 
setting up HBase. More to come.

To generate the html, go to 
{{hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site}}, and do {{mvn site}}.

> [Documentation] Documenting the timeline service v2
> ---
>
> Key: YARN-3150
> URL: https://issues.apache.org/jira/browse/YARN-3150
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Sangjin Lee
>  Labels: yarn-2928-1st-milestone
> Attachments: TimelineServiceV2.html, YARN-3150-YARN-2928.01.patch, 
> YARN-3150-YARN-2928.02.patch
>
>
> Let's make sure we will have a document to describe what's new in TS v2, the 
> APIs, the client libs and so on. We should do better around documentation in 
> v2 than v1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254897#comment-15254897
 ] 

Wangda Tan commented on YARN-4844:
--

[~hitesh], considering there are hundreds of blockers and criticals for the 3.0.0 
release, nobody will actually use the new release in production even if 3.0-alpha can 
be released. We can mark the Resource API of trunk as unstable and update it in future 
3.x releases.

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> --
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch
>
>
> We use int32 for memory now; if a cluster has 10k nodes and each node has 210G of 
> memory, we will get a negative total cluster memory.
> Another case that overflows int32 more easily: we add all pending resources of 
> running apps to the cluster's total pending resources. If a problematic app 
> requires too many resources (say 1M+ containers, each of them asking for 3G of 
> memory), int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where there 
> are many running apps, each of them with capped but still significant pending 
> resources.
> So we may need to upgrade the int32 memory field (possibly including v-cores as 
> well) to int64 to avoid integer overflow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4335) Allow ResourceRequests to specify ExecutionType of a request ask

2016-04-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254877#comment-15254877
 ] 

Arun Suresh commented on YARN-4335:
---

ping [~leftnoteasy], [~kasha]..
Planning on pushing this to trunk if you guys have no reservations..

> Allow ResourceRequests to specify ExecutionType of a request ask
> 
>
> Key: YARN-4335
> URL: https://issues.apache.org/jira/browse/YARN-4335
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-4335-yarn-2877.001.patch, YARN-4335.002.patch, 
> YARN-4335.003.patch
>
>
> YARN-2882 introduced container types that are internal (not user-facing) and 
> are used by the ContainerManager during execution at the NM.
> With this JIRA we are introducing (user-facing) resource request types that 
> are used by the AM to specify the type of the ResourceRequest.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254872#comment-15254872
 ] 

Hadoop QA commented on YARN-4556:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 8m 59s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
15s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 11s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in branch-2.7 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1890 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 49s 
{color} | {color:red} The patch has 256 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 32s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 18s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 2m 16s 
{color} | {color:red} Patch generated 61 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 129m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resource

[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254865#comment-15254865
 ] 

Hadoop QA commented on YARN-4983:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 6s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s 
{color} | {color:red} root: patch generated 1 new + 173 unchanged - 1 fixed = 
174 total (was 174) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
26s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 50s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 55s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 46s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
36s {col

[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-04-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254861#comment-15254861
 ] 

Wangda Tan commented on YARN-3215:
--

Thanks [~Naganarasimha], committing the patch now.

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.branch-2.8.v2.002.patch, 
> YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, YARN-3215.v2.002.patch, 
> YARN-3215.v2.003.patch, YARN-3215.v2.branch-2.8.patch
>
>
> In the existing CapacityScheduler, when computing the headroom of an application, 
> only the "non-labeled" nodes of this application are considered.
> But it is possible that the application is asking for labeled resources, so 
> headroom-by-label (like 5G of resource available under node-label=red) is required 
> to get better resource allocation and avoid deadlocks such as MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and internal changes in 
> CapacityScheduler.
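An illustration of the headroom-by-label idea from the description (the map-in-AllocateResponse shape is the proposal being discussed, not necessarily the committed API):
{code}
// Hypothetical per-label headroom view an AM could consume.
Map<String, Resource> headroomByLabel = new HashMap<>();
headroomByLabel.put("red", Resource.newInstance(5 * 1024, 4));  // 5 GB / 4 vcores free under label "red"
headroomByLabel.put("", Resource.newInstance(20 * 1024, 16));   // default (no-label) partition
{code}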



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254859#comment-15254859
 ] 

Wangda Tan commented on YARN-4390:
--

Sure, please do, [~eepayne]. I would really appreciate it if you could provide some 
feedback early next week. I hope this can get in soon :).

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Do surgical preemption based on reserved container in CapacityScheduler

2016-04-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Summary: Do surgical preemption based on reserved container in 
CapacityScheduler  (was: Consider container request size during CS preemption)

> Do surgical preemption based on reserved container in CapacityScheduler
> ---
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.
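A conceptual sketch of the node-local ("surgical") victim selection the new summary refers to, i.e. preempting only on the node holding the reservation and only as much as the reserved container needs (illustrative only; the container ordering and surrounding plumbing are assumptions, not the patch's actual code):
{code}
Resource needed = reservedContainer.getReservedResource();
Resource freeable = Resources.clone(node.getUnallocatedResource());
List<RMContainer> victims = new ArrayList<>();
for (RMContainer c : node.getCopiedListOfRunningContainers()) {
  if (Resources.fitsIn(needed, freeable)) {
    break;                                          // enough room for the reserved container
  }
  victims.add(c);                                   // select victims only on this node
  Resources.addTo(freeable, c.getAllocatedResource());
}
{code}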



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4390) Consider container request size during CS preemption

2016-04-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4390:
-
Attachment: YARN-4390.5.patch

Rebased to the latest trunk, added a couple of tests, and simplified the calculator 
code a little, as suggested offline by [~jianhe]. (ver.5)

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch, YARN-4390.5.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2016-04-22 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2885:
--
Attachment: YARN-2885.010.patch

This is a combo patch of YARN-2885 and YARN-4335 rebased against trunk to see 
if Jenkins is fine..

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885-yarn-2877.v5.patch, 
> YARN-2885-yarn-2877.v6.patch, YARN-2885-yarn-2877.v7.patch, 
> YARN-2885-yarn-2877.v8.patch, YARN-2885-yarn-2877.v9.patch, 
> YARN-2885.010.patch, YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queuable containers requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4851) Metric improvements for ATS v1.5 storage components

2016-04-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254750#comment-15254750
 ] 

Hitesh Shah commented on YARN-4851:
---

Some general comments on usability (I have not reviewed the patch in detail):
   - The names need a bit of work, e.g. SummaryDataReadTimeNumOps and 
SummaryDataReadTimeAvgTime - it is not obvious why NumOps is tied to ReadTime, 
and the "Time" in ReadTimeAvgTime seems redundant (see the naming sketch below).
   - It would be good to have the scale in there, i.e. is the time in millis or 
seconds?
   - Updates to the timeline server docs for these metrics seem to be missing.
   - What is the difference between CacheRefreshTimeNumOps and CacheRefreshOps?
   - Likewise for LogCleanTimeNumOps vs LogsDirsCleaned, or PutDomainTimeNumOps 
vs PutDomainOps.
   - Are cache eviction rates needed?
   - How do we get a count of how many cache refreshes were due to stale data 
vs never cached/evicted earlier? Do we need this?
   - Should there be 2 levels of metrics - one group enabled by default and a 
second group for more detailed monitoring, to reduce load on the metrics system?
   - It would be good to understand the request count at the ATS v1.5 level 
itself, to see which calls end up going to summary vs cache vs fs-based lookups 
(i.e. across all gets).
   - At the overall ATS level, an overall average latency across all requests 
might be useful for a general health check.
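
For what it's worth on the naming question: the paired names come from Hadoop's 
metrics2 MutableRate type, which exposes a metric registered as 
"SummaryDataReadTime" as SummaryDataReadTimeNumOps (number of samples) and 
SummaryDataReadTimeAvgTime (average of the recorded values, typically millis). 
A minimal sketch with an invented class name, not the actual patch code:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Sketch only: the MutableRate named from the field below surfaces as
// SummaryDataReadTimeNumOps and SummaryDataReadTimeAvgTime.
@Metrics(about = "ATS v1.5 storage metrics (sketch)", context = "yarn")
public class TimelineStoreMetricsSketch {
  @Metric("Time to read summary data") MutableRate summaryDataReadTime;

  TimelineStoreMetricsSketch() {
    DefaultMetricsSystem.instance().register("TimelineStoreMetricsSketch",
        "ATS v1.5 storage metrics (sketch)", this);
  }

  void addSummaryDataReadTime(long millis) {
    summaryDataReadTime.add(millis);  // updates both NumOps and AvgTime
  }
}
{code}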
 



> Metric improvements for ATS v1.5 storage components
> ---
>
> Key: YARN-4851
> URL: https://issues.apache.org/jira/browse/YARN-4851
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch
>
>
> We can add more metrics to the ATS v1.5 storage systems, including purging, 
> cache hit/misses, read latency, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4807) MockAM#waitForState sleep duration is too long

2016-04-22 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4807:
---
Attachment: YARN-4807.015.patch

> MockAM#waitForState sleep duration is too long
> --
>
> Key: YARN-4807
> URL: https://issues.apache.org/jira/browse/YARN-4807
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Yufei Gu
>  Labels: newbie
> Attachments: YARN-4807.001.patch, YARN-4807.002.patch, 
> YARN-4807.003.patch, YARN-4807.004.patch, YARN-4807.005.patch, 
> YARN-4807.006.patch, YARN-4807.007.patch, YARN-4807.008.patch, 
> YARN-4807.009.patch, YARN-4807.010.patch, YARN-4807.011.patch, 
> YARN-4807.012.patch, YARN-4807.013.patch, YARN-4807.014.patch, 
> YARN-4807.015.patch
>
>
> MockAM#waitForState sleep duration (500 ms) is too long. Also, there is 
> significant duplication with MockRM#waitForState.
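
For context, the change being iterated on here follows roughly this pattern (a 
sketch only, not the actual MockAM/MockRM code): poll with a short interval and 
an explicit deadline instead of a fixed 500 ms sleep per check.

{code}
// Sketch of the polling pattern: short poll interval plus a hard timeout.
public final class WaitForStateSketch {
  public interface Check {
    boolean done() throws Exception;
  }

  public static void waitFor(Check check, long pollMs, long timeoutMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.done()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Timed out waiting for state");
      }
      Thread.sleep(pollMs);  // e.g. 50 ms rather than 500 ms
    }
  }
}
{code}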



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4990) Re-direction of a particular log file within in a container in NM UI does not redirect properly to Log Server ( history ) on container completion

2016-04-22 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-4990:
-

 Summary: Re-direction of a particular log file within in a 
container in NM UI does not redirect properly to Log Server ( history ) on 
container completion
 Key: YARN-4990
 URL: https://issues.apache.org/jira/browse/YARN-4990
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Hitesh Shah


The NM does the redirection to the history server correctly. However, if the 
user is viewing or has a link to a specific file, the redirect ends up going to 
the top-level page for the container instead of to that file. Additionally, the 
start param that shows logs from offset 0 also goes missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-04-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.004.patch

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is kept 
> in the recovery DB. Log aggregation can fail for multiple reasons, which are 
> often related to HDFS space or permissions.
> On restart, the recovery DB is read and, if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit, in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.
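
A minimal sketch of the check the description calls for, assuming file age is 
taken from lastModified and the retention window is given in milliseconds (this 
is not the attached patch):

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Partition recovered log files: aggregate those still inside the retention
// window, and collect the rest for local deletion without aggregation.
public final class RetentionCheckSketch {
  public static List<File> filterAggregatable(List<File> recovered,
      long retentionMs, List<File> toDelete) {
    List<File> toAggregate = new ArrayList<File>();
    long cutoff = System.currentTimeMillis() - retentionMs;
    for (File f : recovered) {
      if (f.lastModified() < cutoff) {
        toDelete.add(f);      // older than retention: mark for deletion
      } else {
        toAggregate.add(f);   // still within retention: aggregate
      }
    }
    return toAggregate;
  }
}
{code}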



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254626#comment-15254626
 ] 

Hadoop QA commented on YARN-4390:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 28 new + 506 unchanged - 15 fixed = 534 total (was 521) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 10s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 179m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.8.0_77 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Timed out junit tests | 
org

[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254625#comment-15254625
 ] 

Hadoop QA commented on YARN-4984:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 51s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 22s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 37m 29s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12800285/YARN-4984-v2.patch |
| JIRA Issue | YARN-4984 |
| Optional Tests |  asflicense 

[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-04-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254624#comment-15254624
 ] 

Haibo Chen commented on YARN-4766:
--

@Robert Kanter, thanks very much for your review. I have addressed all the 
issues in the latest patch. For #6, I didn't follow your comments exactly; 
instead, I added a new method that takes the configs and the expected files. 
testAggregatorWithRetentionPolicyDisabled_shouldUploadAllFiles and 
testAggregatorWhenNoFileOlderThanRetentionPolicy_ShouldUploadAll are still very 
much alike, but most of the code duplication is removed.
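
A hypothetical shape of that shared helper (all names here are invented for 
illustration and are not taken from yarn4766.004.patch):

{code}
import static org.junit.Assert.assertEquals;

import java.util.Set;
import org.apache.hadoop.conf.Configuration;

// Both retention tests delegate to one method parameterized by the
// configuration and the set of files expected to be uploaded.
abstract class AggregationRetentionTestBaseSketch {
  void verifyAggregatedFiles(Configuration conf, Set<String> expectedFiles)
      throws Exception {
    Set<String> uploaded = runAggregation(conf);  // test-specific plumbing
    assertEquals(expectedFiles, uploaded);
  }

  /** Each test supplies the actual aggregation run. */
  abstract Set<String> runAggregation(Configuration conf) throws Exception;
}
{code}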

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is kept 
> in the recovery DB. Log aggregation can fail for multiple reasons, which are 
> often related to HDFS space or permissions.
> On restart, the recovery DB is read and, if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit, in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails

2016-04-22 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254525#comment-15254525
 ] 

Eric Badger commented on YARN-4556:
---

[~eepayne], please review this patch and commit to branch-2.7 if you think it 
looks good. 

>  TestFifoScheduler.testResourceOverCommit fails
> ---
>
> Key: YARN-4556
> URL: https://issues.apache.org/jira/browse/YARN-4556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Akihiro Suda
>Assignee: Akihiro Suda
> Fix For: 2.8.0
>
> Attachments: YARN-4556-1.patch, YARN-4556-branch-2.7.001.patch
>
>
> From YARN-4548 Jenkins log: 
> https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
> {code}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler)
>   Time elapsed: 4.746 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<-2048> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142)
> {code}
> https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142
> It seems that Jenkins has been hitting this intermittently since April 2015
> https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails

2016-04-22 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reopened YARN-4556:
---

Adding a 2.7 patch. 

>  TestFifoScheduler.testResourceOverCommit fails
> ---
>
> Key: YARN-4556
> URL: https://issues.apache.org/jira/browse/YARN-4556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Akihiro Suda
>Assignee: Akihiro Suda
> Fix For: 2.8.0
>
> Attachments: YARN-4556-1.patch, YARN-4556-branch-2.7.001.patch
>
>
> From YARN-4548 Jenkins log: 
> https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
> {code}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler)
>   Time elapsed: 4.746 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<-2048> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142)
> {code}
> https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142
> It seems that Jenkins has been hitting this intermittently since April 2015
> https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails

2016-04-22 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4556:
--
Attachment: YARN-4556-branch-2.7.001.patch

>  TestFifoScheduler.testResourceOverCommit fails
> ---
>
> Key: YARN-4556
> URL: https://issues.apache.org/jira/browse/YARN-4556
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Akihiro Suda
>Assignee: Akihiro Suda
> Fix For: 2.8.0
>
> Attachments: YARN-4556-1.patch, YARN-4556-branch-2.7.001.patch
>
>
> From YARN-4548 Jenkins log: 
> https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt
> {code}
> Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
> testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler)
>   Time elapsed: 4.746 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<-2048> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142)
> {code}
> https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142
> It seems that Jenkins has been hitting this intermittently since April 2015
> https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2016-04-22 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254507#comment-15254507
 ] 

Ray Chiang commented on YARN-4971:
--

+1 (nonbinding).  The only new test I can think of would be to verify that the 
member variable address stays at 0.0.0.0 if it's initially 0.0.0.0--mainly 
useful as a "spec" for the class behavior.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-4971.1.patch
>
>
> If the RM has {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, then the first 
> time the service becomes active, binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers, the service only binds to one of the IP addresses.
> There is a difference between the services inside the RM: it only seems to 
> happen for the services listening on ports 8030 and 8032.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4984:
-
Attachment: YARN-4984-v2.patch

Thanks [~leftnoteasy] for the review and comments!
bq.  We may need to remove the following statement as well.
Nice catch. Removed this unnecessary code in the v2 patch.

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.2
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4984-v2.patch, YARN-4984.patch
>
>
> Due to YARN-4325, many stale applications still exist in the NM state store and 
> get recovered after NM restart. The app initiation fails due to an invalid 
> token, but the exception is swallowed and an aggregator thread is still 
> created for the invalid app.
> Exception is:
> {noformat}
> 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService 
> (LogAggregationService.java:run(300)) - Failed to setup application log 
> directory for application_1448060878692_11842
> 159 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo
> und in cache
> 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 162 at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
> 164 at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown 
> Source)
> 166 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 167 at java.lang.reflect.Method.invoke(Method.java:606)
> 168 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> 169 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
> 171 at 
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 172 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
> 173 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
> 174 at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 175 at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
> 176 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
> 177 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
> 178 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
> 179 at java.security.AccessController.doPrivileged(Native Method)
> 180 at javax.security.auth.Subject.doAs(Subject.java:415)
> 181 at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 182 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
> 183 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
> 184 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
> 185 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
> 186 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers

2016-04-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254480#comment-15254480
 ] 

Hudson commented on YARN-4846:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9656 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9656/])
YARN-4846. Fix random failures for (wangda: rev 
7cb3a3da96e59fc9b6528644dae5fb0ac1e44eac)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerPreemption.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java


> Random failures for 
> TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
> 
>
> Key: YARN-4846
> URL: https://issues.apache.org/jira/browse/YARN-4846
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.9.0
>
> Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, 
> 0003-YARN-4846.patch, 0004-YARN-4846.patch, YARN-4846-update-PCPP.patch
>
>
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473)
> {noformat}
> https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4851) Metric improvements for ATS v1.5 storage components

2016-04-22 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4851:

Target Version/s: 2.8.0

> Metric improvements for ATS v1.5 storage components
> ---
>
> Key: YARN-4851
> URL: https://issues.apache.org/jira/browse/YARN-4851
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch
>
>
> We can add more metrics to the ATS v1.5 storage systems, including purging, 
> cache hit/misses, read latency, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-04-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254448#comment-15254448
 ] 

Vinod Kumar Vavilapalli commented on YARN-4717:
---

[~templedf] / [~rkanter], does this exist on previous branches too? If so, can 
this be backported to 2.8.0 / 2.7.x etc?

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-4717
> URL: https://issues.apache.org/jira/browse/YARN-4717
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4717.001.patch
>
>
> The same issue that was resolved by [~zxu] in YARN-3602 is back.  Looks like 
> the commons-io package throws an IAE instead of an IOE now if the directory 
> doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4717) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IllegalArgumentException from cleanup

2016-04-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4717:
--
Issue Type: Test  (was: Bug)

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IllegalArgumentException from cleanup
> ---
>
> Key: YARN-4717
> URL: https://issues.apache.org/jira/browse/YARN-4717
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4717.001.patch
>
>
> The same issue that was resolved by [~zxu] in YARN-3602 is back.  Looks like 
> the commons-io package throws an IAE instead of an IOE now if the directory 
> doesn't exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2016-04-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254441#comment-15254441
 ] 

Vinod Kumar Vavilapalli commented on YARN-4599:
---

bq. We are likely better off setting a hard limit for all YARN containers so 
they don't interfere with anything else on the machine. We could disable the OOM 
killer on the cgroup corresponding to all YARN containers (not including the NM), 
and if all containers are paused, the NM can decide which tasks to kill. This is 
particularly useful if we are oversubscribing the node. 
This seems like our only choice, given that none of the options to recover 
(when the per-container-limit is hit and when OOM-killer is disabled) are 
usable in practice for YARN containers.

/cc [~sidharta-s], [~shanekumpf]

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3602) TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails Intermittently due to IOException from cleanup

2016-04-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3602:
--
Issue Type: Test  (was: Bug)

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IOException from cleanup
> --
>
> Key: YARN-3602
> URL: https://issues.apache.org/jira/browse/YARN-3602
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: BB2015-05-RFC
> Fix For: 2.8.0, 2.7.3
>
> Attachments: YARN-3602.000.patch
>
>
> ResourceLocalizationService.testPublicResourceInitializesLocalDir fails 
> Intermittently due to IOException from cleanup. The stack trace is the 
> following from test report at
> https://builds.apache.org/job/PreCommit-YARN-Build/7729/testReport/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer/TestResourceLocalizationService/testPublicResourceInitializesLocalDir/
> {code}
> Error Message
> Unable to delete directory 
> target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/2/filecache.
> Stacktrace
> java.io.IOException: Unable to delete directory 
> target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/2/filecache.
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1541)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>   at org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2270)
>   at org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1653)
>   at org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1535)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.cleanup(TestResourceLocalizationService.java:187)
> {code}
> It looks like we can safely ignore the IOException in cleanup, which is called 
> after the test.
> The IOException may be due to the test machine environment because 
> TestResourceLocalizationService/2/filecache is created by 
> ResourceLocalizationService#initializeLocalDir.
> testPublicResourceInitializesLocalDir created 0/filecache, 1/filecache, 
> 2/filecache and 3/filecache
> {code}
> for (int i = 0; i < 4; ++i) {
>   localDirs.add(lfs.makeQualified(new Path(basedir, i + "")));
>   sDirs[i] = localDirs.get(i).toString();
> }
> {code}
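
For reference, the cleanup pattern described above looks roughly like this (a 
sketch, not the YARN-3602 patch itself); catching IllegalArgumentException as 
well would also cover the newer commons-io behavior reported in YARN-4717:

{code}
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

final class TestCleanupSketch {
  /** Best-effort delete of the test base directory; failures are ignored. */
  static void cleanupQuietly(File basedir) {
    try {
      FileUtils.deleteDirectory(basedir);
    } catch (IOException | IllegalArgumentException e) {
      // A missing or half-deleted directory should not fail the test run.
    }
  }
}
{code}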



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254433#comment-15254433
 ] 

Li Lu commented on YARN-4983:
-

Similar UT failures happened in HADOOP-12563. That patch has now been reverted. 
I'm launching another Jenkins run for this JIRA.

> JVM and UGI metrics disappear after RM is once transitioned to standby mode
> ---
>
> Key: YARN-4983
> URL: https://issues.apache.org/jira/browse/YARN-4983
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch
>
>
> When transitioned to standby, the RM shuts down the existing metrics 
> system and relaunches a new one. This causes the JVM metrics and UGI 
> metrics to go missing in the new metrics system.
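
A minimal sketch of the general idea only (not the attached patches): after the 
metrics system is re-initialized on a transition, process-level sources such as 
the JVM source have to be registered again or they stay missing; UGI metrics 
would need the same treatment.

{code}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

final class RmMetricsReinitSketch {
  static void reinitialize() {
    DefaultMetricsSystem.shutdown();
    MetricsSystem ms = DefaultMetricsSystem.initialize("ResourceManager");
    JvmMetrics.create("ResourceManager", null, ms);  // re-register JVM source
  }
}
{code}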



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2016-04-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254414#comment-15254414
 ] 

Naganarasimha G R commented on YARN-3215:
-

Hi [~wangda], I have corrected the issue in 
{{YARN-3215.branch-2.8.v2.002.patch}}. *TestRMWebServicesNodes* is already 
tracked under YARN-4947. Can you please review?

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-3215.branch-2.8.v2.002.patch, 
> YARN-3215.v1.001.patch, YARN-3215.v2.001.patch, YARN-3215.v2.002.patch, 
> YARN-3215.v2.003.patch, YARN-3215.v2.branch-2.8.patch
>
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.
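
As a rough illustration of the per-label headroom the description asks for (the 
actual YARN-3215 API may differ; the names here are invented):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;

// e.g. "" -> <8 GB, 8 vcores>, "red" -> <5 GB, 4 vcores>
final class LabelHeadroomSketch {
  private final Map<String, Resource> headroomByLabel =
      new HashMap<String, Resource>();

  void setHeadroom(String nodeLabel, Resource available) {
    headroomByLabel.put(nodeLabel, available);
  }

  Resource getHeadroom(String nodeLabel) {
    return headroomByLabel.get(nodeLabel);
  }
}
{code}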



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4599) Set OOM control for memory cgroups

2016-04-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4599:
---
Attachment: yarn-4599-not-so-useful.patch

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2016-04-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254348#comment-15254348
 ] 

Karthik Kambatla commented on YARN-4599:


FWIW, I just posted the not-so-useful version of the patch. 

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2016-04-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254335#comment-15254335
 ] 

Karthik Kambatla commented on YARN-4599:


Looked more into this, specifically on how to resume paused tasks. Per [this 
article|https://lwn.net/Articles/529927]
{noformat}
This operation is only allowed to the top cgroup of a sub-hierarchy.
If OOM-killer is disabled, tasks under cgroup will hang/sleep
in memory cgroup's OOM-waitqueue when they request accountable memory.

For running them, you have to relax the memory cgroup's OOM status by
* enlarge limit or reduce usage.
To reduce usage,
* kill some tasks.
* move some tasks to other group with account migration.
* remove some files (on tmpfs?)

Then, stopped tasks will work again.

At reading, current status of OOM is shown.
oom_kill_disable 0 or 1 (if 1, oom-killer is disabled)
under_oom0 or 1 (if 1, the memory cgroup is under OOM, tasks may
 be stopped.)
{noformat}

Looks like setting OOM control for each task is not particularly useful. We are 
likely better off setting a hard limit for all YARN containers so they don't 
interfere with anything else on the machine. We could disable the OOM killer on 
the cgroup corresponding to all YARN containers (not including the NM), and if 
all containers are paused, the NM can decide which tasks to kill. This is 
particularly useful if we are oversubscribing the node. 

[~aw], [~vvasudev], [~vinodkv] - what do you think? 
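
For concreteness, the kernel knob involved looks like the following (a minimal 
sketch, not NM code; the cgroup path for the YARN container hierarchy is an 
assumption):

{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class CgroupOomControlSketch {
  // Hypothetical location of the parent cgroup for all YARN containers.
  private static final Path OOM_CONTROL =
      Paths.get("/sys/fs/cgroup/memory/hadoop-yarn/memory.oom_control");

  static void disableOomKiller() throws IOException {
    // Writing "1" disables the kernel OOM killer; tasks pause in the
    // OOM wait queue instead of being killed.
    Files.write(OOM_CONTROL, "1".getBytes(StandardCharsets.UTF_8));
  }

  static boolean isUnderOom() throws IOException {
    // The file reads back as: "oom_kill_disable 1\nunder_oom 0\n"
    for (String line : Files.readAllLines(OOM_CONTROL, StandardCharsets.UTF_8)) {
      if (line.startsWith("under_oom")) {
        return line.trim().endsWith("1");
      }
    }
    return false;
  }
}
{code}
With the killer disabled on the parent cgroup, a memory squeeze shows up as 
under_oom flipping to 1, and the NM gets to choose the victim.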

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>
> YARN-1856 adds memory cgroups enforcing support. We should also explicitly 
> set OOM control so that containers are not killed as soon as they go over 
> their usage. Today, one could set the swappiness to control this, but 
> clusters with swap turned off exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-04-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254330#comment-15254330
 ] 

Allen Wittenauer commented on YARN-4478:


Just an FYI, but there are two key problems that folks should be aware of:

#1:

None of the current Hadoop Jenkins "build project" jobs actually report all of 
the unit test failures in a consistent manner, due to how maven works. If a 
module's unit tests fail, the modules that depend on it are never run. In other 
words:

Given two modules A, B.  Relationship is such that B requires A.  

If A's unit tests succeed, then B's unit tests are executed.

If A's unit tests fail, then B's unit tests are *never run*.

#2

I've already determined that a large chunk of the YARN unit tests CANNOT be run 
simultaneously due to TCP port usage.  (See YARN-4950).  This means that if two 
YARN nightlies are running on the same box at the same time, it's pretty much a 
100% certainty that there will be spurious failures. (Yetus guarantees 
some\-\-but not total\-\-isolation via docker, so precommit should be immune to 
this particular problem.)

That said, I've been working on a Yetus-based replacement for full compiles 
(YETUS-156).  This would at least solve major parts of both these issues. I've 
been running it in test for Hadoop for a while now:  
(https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-test ).  I've 
had unit tests turned off, but I just kicked off another run with the unit 
tests turned on so that you can see what happens.  

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being broken 
> by new bug fixes. Many test failure JIRAs have been raised and are in progress.
> This umbrella is to track all of those test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4478) [Umbrella] : Track all the Test failures in YARN

2016-04-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254297#comment-15254297
 ] 

Vinod Kumar Vavilapalli commented on YARN-4478:
---

bq. The point of concern is that when QA reports test failures, 
contributors/committers have to search for the test failure JIRA IDs and comment 
on their respective JIRAs with something like "test failures are unrelated to 
this patch; the test failure is tracked by YARN-". This is a very painful task 
when there are test failures across multiple modules. Instead of remembering all 
the test failure JIRAs, an umbrella JIRA would help to find them easily.
Actually, I've tried adding more and more to this umbrella, but it is getting out 
of hand.

I kind of agree with [~kasha]: we will always have failing unit tests that need 
fixing. Let's just use the bug type and component from now on - those are easy to 
search for. I'm going to use the ticket type from now on; I'd appreciate others 
doing the same.

[~rohithsharma], let's use this umbrella for your current initiative and then 
close it down.

> [Umbrella] : Track all the Test failures in YARN
> 
>
> Key: YARN-4478
> URL: https://issues.apache.org/jira/browse/YARN-4478
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either timing out or being broken 
> by new bug fixes. Many test failure JIRAs have been raised and are in progress.
> This umbrella is to track all of those test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254291#comment-15254291
 ] 

Hadoop QA commented on YARN-4984:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 16s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 50s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/ha

[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254277#comment-15254277
 ] 

Wangda Tan commented on YARN-4984:
--

Thanks [~djp] for working on this,

We may need to remove the following statement as well:
{code}
if (appDirException != null) {
  throw appDirException;
}
{code}

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.2
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4984.patch
>
>
> Due to YARN-4325, many stale applications still exists in NM state store and 
> get recovered after NM restart. The app initiation will get failed due to 
> token invalid, but exception is swallowed and aggregator thread is still 
> created for invalid app.
> Exception is:
> {noformat}
> 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService 
> (LogAggregationService.java:run(300)) - Failed to setup application log 
> directory for application_1448060878692_11842
> 159 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo
> und in cache
> 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 162 at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
> 164 at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown 
> Source)
> 166 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 167 at java.lang.reflect.Method.invoke(Method.java:606)
> 168 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> 169 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
> 171 at 
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 172 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
> 173 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
> 174 at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 175 at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
> 176 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
> 177 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
> 178 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
> 179 at java.security.AccessController.doPrivileged(Native Method)
> 180 at javax.security.auth.Subject.doAs(Subject.java:415)
> 181 at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 182 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
> 183 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
> 184 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
> 185 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
> 186 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254238#comment-15254238
 ] 

Daniel Templeton commented on YARN-4983:


I meant there are unit tests that are tripping on the problems you're trying to 
fix. :)

> JVM and UGI metrics disappear after RM is once transitioned to standby mode
> ---
>
> Key: YARN-4983
> URL: https://issues.apache.org/jira/browse/YARN-4983
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch
>
>
> When get transitioned to standby, the RM will shutdown the existing metric 
> system and relaunch a new one. This will cause the jvm metrics and ugi 
> metrics to miss in the new metric system. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254222#comment-15254222
 ] 

Li Lu commented on YARN-4983:
-

Thanks [~templedf]!
bq. Some of the unit tests that trip on it are only creating disembodied schedulers.
Did you mean the failures from the first patch, or from the second one? I'm trying
to understand the second group of failures, where something is complaining at the
protobuf level...

> JVM and UGI metrics disappear after RM is once transitioned to standby mode
> ---
>
> Key: YARN-4983
> URL: https://issues.apache.org/jira/browse/YARN-4983
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch
>
>
> When get transitioned to standby, the RM will shutdown the existing metric 
> system and relaunch a new one. This will cause the jvm metrics and ugi 
> metrics to miss in the new metric system. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254209#comment-15254209
 ] 

Junping Du commented on YARN-4984:
--

Attached a patch to fix this issue. The fix is very straightforward, so no UT is
needed.
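
For readers following along, a minimal self-contained sketch of the bug pattern this
JIRA targets (hypothetical names, not the attached patch): when the createAppDir()
step fails, swallowing the exception lets the per-app aggregator thread start anyway,
while propagating the failure keeps the thread from being created at all.
{code}
import java.io.IOException;

// Hypothetical, self-contained illustration of the bug pattern; not the attached patch.
public class AppLogSetupSketch {

  static void createAppDir(String appId) throws IOException {
    // Stands in for the log-directory setup that fails for a stale, recovered app.
    throw new IOException("token can't be found in cache");
  }

  // Anti-pattern: the failure is swallowed, so an aggregator thread is still
  // created for an app that can never aggregate logs -> thread leak.
  static void initAppSwallowing(String appId) {
    try {
      createAppDir(appId);
    } catch (IOException e) {
      System.err.println("Failed to setup log dir for " + appId + ": " + e);
      // falls through anyway
    }
    new Thread(() -> { /* per-app log aggregation loop */ },
        "AppLogAggregator-" + appId).start();
  }

  // Fix direction: let the failure abort initialization so no thread is started.
  static void initAppFailingFast(String appId) throws IOException {
    createAppDir(appId); // propagate instead of swallow
    new Thread(() -> { /* per-app log aggregation loop */ },
        "AppLogAggregator-" + appId).start();
  }
}
{code}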

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.2
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4984.patch
>
>
> Due to YARN-4325, many stale applications still exists in NM state store and 
> get recovered after NM restart. The app initiation will get failed due to 
> token invalid, but exception is swallowed and aggregator thread is still 
> created for invalid app.
> Exception is:
> {noformat}
> 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService 
> (LogAggregationService.java:run(300)) - Failed to setup application log 
> directory for application_1448060878692_11842
> 159 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo
> und in cache
> 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 162 at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
> 164 at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown 
> Source)
> 166 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 167 at java.lang.reflect.Method.invoke(Method.java:606)
> 168 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> 169 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
> 171 at 
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 172 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
> 173 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
> 174 at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 175 at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
> 176 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
> 177 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
> 178 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
> 179 at java.security.AccessController.doPrivileged(Native Method)
> 180 at javax.security.auth.Subject.doAs(Subject.java:415)
> 181 at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 182 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
> 183 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
> 184 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
> 185 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
> 186 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4984:
-
Attachment: YARN-4984.patch

> LogAggregationService shouldn't swallow exception in handling createAppDir() 
> which cause thread leak.
> -
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.2
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-4984.patch
>
>
> Due to YARN-4325, many stale applications still exists in NM state store and 
> get recovered after NM restart. The app initiation will get failed due to 
> token invalid, but exception is swallowed and aggregator thread is still 
> created for invalid app.
> Exception is:
> {noformat}
> 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService 
> (LogAggregationService.java:run(300)) - Failed to setup application log 
> directory for application_1448060878692_11842
> 159 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo
> und in cache
> 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 162 at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
> 164 at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown 
> Source)
> 166 at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 167 at java.lang.reflect.Method.invoke(Method.java:606)
> 168 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
> 169 at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
> 171 at 
> org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
> 172 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
> 173 at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
> 174 at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 175 at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
> 176 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
> 177 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
> 178 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
> 179 at java.security.AccessController.doPrivileged(Native Method)
> 180 at javax.security.auth.Subject.doAs(Subject.java:415)
> 181 at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 182 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
> 183 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
> 184 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
> 185 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
> 186 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254206#comment-15254206
 ] 

Daniel Templeton commented on YARN-4983:


This also comes up as an issue with some unit tests.  I agree that the root
issue is the erroneous assumption that the RM and scheduler won't be started a
second time from within the same VM.  Ideally we'd fix it below the level of
the RM.  Some of the unit tests that trip on it are only creating disembodied
schedulers.
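
To make the failure mode concrete, a rough sketch (assuming the metrics2 calls the RM
uses, as I understand them; this is not the attached patch): shutting down and
re-initializing DefaultMetricsSystem drops the sources that were registered against
the old instance, such as JvmMetrics, and something has to register them again.
{code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

// Rough illustration of the restart sequence; not the attached patch.
public class MetricsRestartSketch {
  public static void main(String[] args) {
    DefaultMetricsSystem.initialize("ResourceManager");
    JvmMetrics.initSingleton("ResourceManager", null);   // JVM source registered

    // Transition to standby and back:
    DefaultMetricsSystem.shutdown();                      // previously registered sources are gone
    DefaultMetricsSystem.initialize("ResourceManager");   // fresh metrics system

    // Unless the JVM (and UGI) sources are registered against the new instance here,
    // they stay missing; whether the singleton helpers do that after a shutdown is
    // exactly the kind of detail the patch has to handle.
  }
}
{code}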

> JVM and UGI metrics disappear after RM is once transitioned to standby mode
> ---
>
> Key: YARN-4983
> URL: https://issues.apache.org/jira/browse/YARN-4983
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch
>
>
> When get transitioned to standby, the RM will shutdown the existing metric 
> system and relaunch a new one. This will cause the jvm metrics and ugi 
> metrics to miss in the new metric system. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: YARN-4325.patch

Putting up a demo patch first; a complete patch with tests will come later.

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG, YARN-4325.patch
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. 
> After investigating, there are three issues cause app state leak in NM 
> state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
> NMStateStore.
> 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
> aggregator's doAppLogAggregation() exception case.
> 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
> transition to remove app in NM state store. Application in other status - 
> like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to 
> remove this app from NM state store even after app get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254095#comment-15254095
 ] 

Junping Du edited comment on YARN-4325 at 4/22/16 3:43 PM:
---

We hit the same issue in a cluster again recently. After checking the log, the
related code, and the state machine graph for ApplicationImpl (attached), there are
three issues that cause app state to leak in the NM state-store:
1. APPLICATION_LOG_HANDLING_FAILED is not handled by removing the app from the
NMStateStore.
2. The APPLICATION_LOG_HANDLING_FAILED event is not sent when the aggregator's
doAppLogAggregation() hits its exception case.
3. Only an Application in the *FINISHED* state that receives APPLICATION_LOG_FINISHED
has a transition that removes the app from the NM state store. An Application in
another state - like APPLICATION_RESOURCES_CLEANUP - will ignore the event and later
forget to remove the app from the NM state store even after the app has finished.
Will put up a patch soon to fix this issue.


was (Author: djp):
We hit the same issue in a cluster recently again. After checking log, related 
code and state machine graph for ApplicationImpl (attached). There are three 
issues cause app state leak in NM state-store
1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
NMStateStore.
2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
aggregator's doAppLogAggregation() exception case.
2. Only Application in *FINISHED*  status receiving APPLICATION_LOG_FINISHED 
has transition to remove app in NM state store. Application in other status - 
like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to 
remove this app from NM state store even after app get finished.
Will put up a patch soon to fix this issue.

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. 
> After investigating, there are three issues cause app state leak in NM 
> state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
> NMStateStore.
> 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
> aggregator's doAppLogAggregation() exception case.
> 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
> transition to remove app in NM state store. Application in other status - 
> like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to 
> remove this app from NM state store even after app get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Description: 
>From a long running cluster, we found tens of thousands of stale apps still be 
>recovered in NM restart recovery. 
After investigating, there are three issues cause app state leak in NM 
state-store:
1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
NMStateStore.
2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
aggregator's doAppLogAggregation() exception case.
2. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
transition to remove app in NM state store. Application in other status - like 
APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove 
this app from NM state store even after app get finished.

  was:From a long running cluster, we found tens of thousands of stale apps 
still be recovered in NM restart recovery. The reason is some wrong 
configuration setting to log aggregation so the end of log aggregation events 
are not received so stale apps are not purged properly. We should make sure the 
removal of app state to be independent of log aggregation life cycle. 


> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. 
> After investigating, there are three issues cause app state leak in NM 
> state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
> NMStateStore.
> 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
> aggregator's doAppLogAggregation() exception case.
> 2. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
> transition to remove app in NM state store. Application in other status - 
> like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to 
> remove this app from NM state store even after app get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Description: 
>From a long running cluster, we found tens of thousands of stale apps still be 
>recovered in NM restart recovery. 
After investigating, there are three issues cause app state leak in NM 
state-store:
1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
NMStateStore.
2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
aggregator's doAppLogAggregation() exception case.
3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
transition to remove app in NM state store. Application in other status - like 
APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove 
this app from NM state store even after app get finished.

  was:
>From a long running cluster, we found tens of thousands of stale apps still be 
>recovered in NM restart recovery. 
After investigating, there are three issues cause app state leak in NM 
state-store:
1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
NMStateStore.
2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
aggregator's doAppLogAggregation() exception case.
2. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
transition to remove app in NM state store. Application in other status - like 
APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to remove 
this app from NM state store even after app get finished.


> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. 
> After investigating, there are three issues cause app state leak in NM 
> state-store:
> 1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
> NMStateStore.
> 2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
> aggregator's doAppLogAggregation() exception case.
> 3. Only Application in FINISHED status receiving APPLICATION_LOG_FINISHED has 
> transition to remove app in NM state store. Application in other status - 
> like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to 
> remove this app from NM state store even after app get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: ApplicationImpl.PNG

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: (was: ApplicationImpl)

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl.PNG
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: ApplicationImpl

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: (was: ApplicationImpl.gv)

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Attachment: ApplicationImpl.gv

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: ApplicationImpl
>
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4325) Purge app state from NM state-store should cover more LOG_HANDLING cases

2016-04-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4325:
-
Summary: Purge app state from NM state-store should cover more LOG_HANDLING 
cases  (was: purge app state from NM state-store should be independent of log 
aggregation)

> Purge app state from NM state-store should cover more LOG_HANDLING cases
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4325) purge app state from NM state-store should be independent of log aggregation

2016-04-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254095#comment-15254095
 ] 

Junping Du commented on YARN-4325:
--

We hit the same issue in a cluster recently again. After checking log, related 
code and state machine graph for ApplicationImpl (attached). There are three 
issues cause app state leak in NM state-store
1. APPLICATION_LOG_HANDLING_FAILED is not handled with remove App in 
NMStateStore.
2. APPLICATION_LOG_HANDLING_FAILED event is missing in sent when hit 
aggregator's doAppLogAggregation() exception case.
2. Only Application in *FINISHED*  status receiving APPLICATION_LOG_FINISHED 
has transition to remove app in NM state store. Application in other status - 
like APPLICATION_RESOURCES_CLEANUP will ignore the event and later forget to 
remove this app from NM state store even after app get finished.
Will put up a patch soon to fix this issue.
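
As a rough sketch of that direction (illustrative names only, not the actual
ApplicationImpl transitions): treat both terminal log-handling events, including the
FAILED one, as a signal to purge the app from the NM state store regardless of which
state the application is currently in.
{code}
// Illustrative only; the enum values and the transition wiring are simplified stand-ins.
enum AppEvent { LOG_HANDLING_FINISHED, LOG_HANDLING_FAILED }

class AppStateStoreCleanupSketch {
  interface StateStore { void removeApplication(String appId); }

  // The leak: only FINISHED + LOG_HANDLING_FINISHED removed the app; a FAILED event,
  // or a FINISHED event arriving while resources are still being cleaned up, was ignored.
  static void onLogHandlingEvent(AppEvent event, String appId, StateStore store) {
    if (event == AppEvent.LOG_HANDLING_FINISHED || event == AppEvent.LOG_HANDLING_FAILED) {
      store.removeApplication(appId);   // purge regardless of the current application state
    }
  }
}
{code}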

> purge app state from NM state-store should be independent of log aggregation
> 
>
> Key: YARN-4325
> URL: https://issues.apache.org/jira/browse/YARN-4325
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>
> From a long running cluster, we found tens of thousands of stale apps still 
> be recovered in NM restart recovery. The reason is some wrong configuration 
> setting to log aggregation so the end of log aggregation events are not 
> received so stale apps are not purged properly. We should make sure the 
> removal of app state to be independent of log aggregation life cycle. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4962) support filling up containers on node one by one

2016-04-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253975#comment-15253975
 ] 

Daniel Templeton commented on YARN-4962:


Got it.  I misunderstood your initial problem statement.  You want jobs to be 
scheduled to fill a node completely before scheduling to the next node to avoid 
having the workload spread like peanut butter all over the cluster, making it 
hard to schedule a job that needs a full node.

In Grid Engine, the scheduling formula is configurable.  The scheduler will 
look for the node for which the scheduling formula has the highest value.  The 
default scheduling formula is essentially "amount of free space."  To get the 
fill-up behavior you want, you'd set the scheduling formula to "number of 
containers."

Which is essentially your original suggestion.
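
To make that concrete, a minimal self-contained sketch of such a fill-up policy
(hypothetical types; this is neither YARN's nor Grid Engine's actual API): rank the
candidate nodes by how many containers they already run, highest first, and only fall
back to emptier nodes when the packed ones cannot fit the request.
{code}
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a "fill up nodes one by one" placement policy.
class CandidateNode {
  final String host;
  final int runningContainers;
  final int freeMemoryMb;

  CandidateNode(String host, int runningContainers, int freeMemoryMb) {
    this.host = host;
    this.runningContainers = runningContainers;
    this.freeMemoryMb = freeMemoryMb;
  }
}

class FillUpPolicy {
  // Prefer the node that is already the most packed but still fits the request;
  // this is the opposite of the "most free space" (load-spreading) default.
  static CandidateNode pick(List<CandidateNode> nodes, int requestMb) {
    return nodes.stream()
        .filter(n -> n.freeMemoryMb >= requestMb)
        .max(Comparator.comparingInt(n -> n.runningContainers))
        .orElse(null);
  }
}
{code}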

>  support filling up containers on node one by one
> -
>
> Key: YARN-4962
> URL: https://issues.apache.org/jira/browse/YARN-4962
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: sandflee
>
> we had a gpu cluster, jobs with bigger resource request couldn't be satisfied 
> for node is running the jobs with smaller resource request.  we didn't open 
> reserve system because gpu jobs may run days or weeks. we expect scheduler 
> allocate containers to fill the node , then there will be resource to run   
> jobs with big resource request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4963) capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat configurable

2016-04-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253902#comment-15253902
 ] 

Naganarasimha G R commented on YARN-4963:
-

Thanks for the clarification [~wangda] & [~nroberts]. Yes, point 2 addresses the same
issue; my mistake, I missed reading it. I also agree that the focus of this JIRA
should be specific to the system-level OFF_SWITCH configuration.

bq. so I think when we do the application-level support the default would need 
to be either unlimited or some high value, otherwise we force all applications 
to set this limit to something other than 1 to get decent OFF_SWITCH scheduling 
behavior.
Once we have a system-level OFF_SWITCH configuration, do we require an app-level
default as well? IIUC, by default we would use the system-level OFF_SWITCH
configuration unless it is explicitly overridden by the app (the implementation can
be discussed further in that JIRA).
bq. Sure, my application scheduled very quickly but my locality was terrible so 
I caused a lot of unnecessary cross-switch traffic. So I think we'll need some 
system-minimums that will prevent this type of abuse.
This point is debatable. Even though I agree with your point about controlling
cross-switch traffic, the app is still performing within its capacity limits, so
would it be good to restrict it?
bq. If application A meets its OFF-SWITCH-per-node limit, do we offer the node 
to other applications in the same queue?
Are there any limitations if we offer the node to other applications in the same
queue? It should be fine, right?
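
Stepping back to the proposal itself, a minimal sketch (hypothetical names, not the
candidate patch) of the kind of per-heartbeat cap being discussed: count the
OFF_SWITCH assignments made during one node heartbeat and stop handing out more once
a configured limit is reached.
{code}
// Hypothetical sketch of a per-heartbeat OFF_SWITCH assignment cap.
class OffSwitchPerHeartbeatLimiter {
  // Today the scheduler effectively allows 1; the proposal makes this configurable.
  private final int maxOffSwitchPerHeartbeat;
  private int assignedThisHeartbeat;

  OffSwitchPerHeartbeatLimiter(int maxOffSwitchPerHeartbeat) {
    this.maxOffSwitchPerHeartbeat = maxOffSwitchPerHeartbeat;
  }

  void onHeartbeatStart() {
    assignedThisHeartbeat = 0;
  }

  boolean mayAssignOffSwitch() {
    return assignedThisHeartbeat < maxOffSwitchPerHeartbeat;
  }

  void recordOffSwitchAssignment() {
    assignedThisHeartbeat++;
  }
}
{code}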





> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> 
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 2.7.2
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2016-04-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253884#comment-15253884
 ] 

Eric Payne commented on YARN-4390:
--

{quote}
And since it uses R/W lock, write lock will be acquired only if node add / move 
or node resource update. So in most cases, nobody acquires write lock. I agree 
to cache node list inside PCPP if we do see performance issues.
{quote}
[~leftnoteasy], yes, that is a very good point. I was not thinking about 
{{ClusterNodeTracker#getNodes}} using the read lock, which, of course, can have 
multiple readers at any time. After thinking more about it, I don't think this 
will cause much of a strain on the RM.

I still want to experiment with the patch a little more.
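
As a generic Java illustration of the locking point (not ClusterNodeTracker's actual
code): with a ReentrantReadWriteLock, any number of getNodes()-style readers can hold
the read lock at the same time, and only structural changes such as node add/remove
or a resource update need the exclusive write lock.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic illustration of the R/W-lock point; not ClusterNodeTracker itself.
class NodeListSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> nodes = new ArrayList<>();

  // Many callers (e.g. a preemption monitor) can read concurrently.
  List<String> getNodes() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(nodes);   // copy out under the shared read lock
    } finally {
      lock.readLock().unlock();
    }
  }

  // Only node add/remove/resource updates take the exclusive write lock.
  void addNode(String node) {
    lock.writeLock().lock();
    try {
      nodes.add(node);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}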

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>Reporter: Eric Payne
>Assignee: Wangda Tan
> Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, 
> YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, 
> YARN-4390.3.patch, YARN-4390.4.patch
>
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8-GB), and the preemption monitor could conceivably preempt multiple 
> containers (say 8, 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits

2016-04-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253789#comment-15253789
 ] 

Naganarasimha G R commented on YARN-4048:
-

Hi [~scootli]

It was a private code modification based on 2.7.0 and is not available outside. 
Hence no documentation of it either.

> Linux kernel panic under strict CPU limits
> --
>
> Key: YARN-4048
> URL: https://issues.apache.org/jira/browse/YARN-4048
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chengbing Liu
>Priority: Critical
> Attachments: panic.png
>
>
> With YARN-2440 and YARN-2531, we have seen some kernel panics happening under 
> heavy pressure. Even with YARN-2809, it still panics.
> We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I 
> guess the latest version also has the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently

2016-04-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253681#comment-15253681
 ] 

Rohith Sharma K S commented on YARN-4989:
-

Oops, the comment came through twice. There was an issue connecting to JIRA, so I
thought the earlier comment would not show up!

> TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> intermittently 
> ---
>
> Key: YARN-4989
> URL: https://issues.apache.org/jira/browse/YARN-4989
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Ajith S
>
> Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> randomly.
> {noformat}
> java.lang.AssertionError: expected:<> but 
> was:<>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently

2016-04-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-4989:
---

Assignee: Ajith S

Assigning to Ajith since he asked me offline.

> TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> intermittently 
> ---
>
> Key: YARN-4989
> URL: https://issues.apache.org/jira/browse/YARN-4989
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Ajith S
>
> Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> randomly.
> {noformat}
> java.lang.AssertionError: expected:<> but 
> was:<>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently

2016-04-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253675#comment-15253675
 ] 

Rohith Sharma K S commented on YARN-4989:
-

In the test {{TestWorkPreservingRMRestart#testCapacitySchedulerRecovery}}, after the
RM is recovered and the NMs are registered, the test waits for the apps to recover
their containers. In the test code, there are 3 apps running before the RM restart,
but after the restart the {{waitForNumContainersToRecover}} method is called for only
2 of them.
{code}
   // Wait for RM to settle down on recovering containers;
waitForNumContainersToRecover(2, rm2, am1_1.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId());
{code}

In the above code, the third {{waitForNumContainersToRecover}} call should be for the
third app instead of repeating the second app.

> TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> intermittently 
> ---
>
> Key: YARN-4989
> URL: https://issues.apache.org/jira/browse/YARN-4989
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>
> Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> randomly.
> {noformat}
> java.lang.AssertionError: expected:<> but 
> was:<>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently

2016-04-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253667#comment-15253667
 ] 

Rohith Sharma K S commented on YARN-4989:
-

In the test {{TestWorkPreservingRMRestart#testCapacitySchedulerRecovery}}, after the
RM is restarted, {{waitForNumContainersToRecover}} is called for the submitted apps.
However, the waiting covers only 2 of them, i.e. am1_1 and am1_2. There is another
AM, *am2*, which also needs to wait for container recovery; the call is there, but it
waits for am1_2 again instead.
{code}
// Wait for RM to settle down on recovering containers;
waitForNumContainersToRecover(2, rm2, am1_1.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId());
{code}

In the third waitForNumContainersToRecover call, using the variable am2 instead of
am1_2 should resolve this randomness.
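
In other words, the snippet above would become (assuming the test's existing am2
variable exposes its attempt the same way):
{code}
// Wait for RM to settle down on recovering containers;
waitForNumContainersToRecover(2, rm2, am1_1.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am1_2.getApplicationAttemptId());
waitForNumContainersToRecover(2, rm2, am2.getApplicationAttemptId());   // was am1_2 again
{code}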

> TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> intermittently 
> ---
>
> Key: YARN-4989
> URL: https://issues.apache.org/jira/browse/YARN-4989
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>
> Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
> randomly.
> {noformat}
> java.lang.AssertionError: expected:<> but 
> was:<>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4989) TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails intermittently

2016-04-22 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4989:
---

 Summary: TestWorkPreservingRMRestart#testCapacitySchedulerRecovery 
fails intermittently 
 Key: YARN-4989
 URL: https://issues.apache.org/jira/browse/YARN-4989
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Rohith Sharma K S


Sometimes TestWorkPreservingRMRestart#testCapacitySchedulerRecovery fails 
randomly.
{noformat}
java.lang.AssertionError: expected:<> but 
was:<>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.checkCSLeafQueue(TestWorkPreservingRMRestart.java:289)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart.testCapacitySchedulerRecovery(TestWorkPreservingRMRestart.java:501)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4048) Linux kernel panic under strict CPU limits

2016-04-22 Thread lihuaqing (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253619#comment-15253619
 ] 

lihuaqing commented on YARN-4048:
-

Hi Naganarasimha G R:
   I want to know how to configure the cgroup cpuset with Hadoop 2.7.1. I can't find
it in the Hadoop documentation. Could you please show me?

> Linux kernel panic under strict CPU limits
> --
>
> Key: YARN-4048
> URL: https://issues.apache.org/jira/browse/YARN-4048
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chengbing Liu
>Priority: Critical
> Attachments: panic.png
>
>
> With YARN-2440 and YARN-2531, we have seen some kernel panics happening under 
> heavy pressure. Even with YARN-2809, it still panics.
> We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I 
> guess the latest version also has the same issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4957) Add getNewReservation in ApplicationClientProtocol

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253576#comment-15253576
 ] 

Hadoop QA commented on YARN-4957:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 57s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 22s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
33s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped branch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 57s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
17s {color} | {color:green} root: patch generated 0 new + 271 unchanged - 3 
fixed = 271 total (was 274) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s 
{color} | {color:blue} Skipped patch modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 
51s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 38s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 42s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 44s 
{color} | {color:gr

[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253535#comment-15253535
 ] 

Hadoop QA commented on YARN-4983:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 38s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 33s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 37s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 7s 
{color} | {color:red} root: patch generated 1 new + 173 unchanged - 1 fixed = 
174 total (was 174) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 8s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_77. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 45s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 3s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 13s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 37s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
33s {color} | {colo

[jira] [Commented] (YARN-4390) Consider container request size during CS preemption

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253522#comment-15253522
 ] 

Hadoop QA commented on YARN-4390:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 12 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 29 new + 506 unchanged - 16 fixed = 535 total (was 522) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 14s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_77. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 50s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
15s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 165m 26s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_77 Failed junit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
|   | hadoop.yarn.server.resourcemanager.TestAppManager |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication |
|   | hadoop.yarn.ser

[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-04-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253472#comment-15253472
 ] 

Hadoop QA commented on YARN-4844:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 53 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 25s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 1s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 4m 25s {color} | 
{color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_77 with JDK v1.8.0_77 
generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 1s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 48s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 61 new + 
1408 unchanged - 47 fixed = 1469 total (was 1455) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_77 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 54s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK 

[jira] [Created] (YARN-4988) Limit filter in ApplicationBaseProtocol#getApplications should return latest applications

2016-04-22 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4988:
---

 Summary: Limit filter in ApplicationBaseProtocol#getApplications 
should return latest applications
 Key: YARN-4988
 URL: https://issues.apache.org/jira/browse/YARN-4988
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


Whenever the limit filter is used to fetch application reports via 
ApplicationBaseProtocol#getApplications, the applications returned are not the 
latest ones. Instead, the returned set is effectively arbitrary, determined by 
hash order.

The reason is that the RM keeps its applications in a map whose iteration order 
depends on the application id's hashcode. For example, with 10 applications 
app-1 through app-10 and a limit of 5, one would expect app-6 through app-10 to 
be returned; instead, the first 5 applications encountered in the map's 
iteration order are returned, i.e. an arbitrary 5.

I think the limit filter should return the latest applications only.
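
A minimal, self-contained sketch of the difference (hypothetical App class and 
field names used only for illustration; the real RM stores RMApp objects keyed 
by ApplicationId): the first method stops after "limit" entries of the map's 
iteration order, which is what produces the arbitrary result; the second sorts 
by submission time before limiting, which is the behaviour proposed here.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LimitFilterSketch {

  // Stand-in for the RM's application record (hypothetical).
  static final class App {
    final String id;
    final long submitTime;
    App(String id, long submitTime) { this.id = id; this.submitTime = submitTime; }
    @Override public String toString() { return id; }
  }

  // Simplified version of the current behaviour: iterate the map and stop
  // after 'limit' entries. The result depends on hash iteration order.
  static List<App> limitByIterationOrder(Map<String, App> apps, int limit) {
    List<App> result = new ArrayList<>();
    for (App app : apps.values()) {
      if (result.size() >= limit) {
        break;
      }
      result.add(app);
    }
    return result;
  }

  // Proposed behaviour: sort by submission time (newest first) before
  // applying the limit, so the latest applications are returned.
  static List<App> limitByLatest(Map<String, App> apps, int limit) {
    List<App> sorted = new ArrayList<>(apps.values());
    sorted.sort(Comparator.comparingLong((App a) -> a.submitTime).reversed());
    return sorted.subList(0, Math.min(limit, sorted.size()));
  }

  public static void main(String[] args) {
    Map<String, App> apps = new HashMap<>();
    for (int i = 1; i <= 10; i++) {
      apps.put("app-" + i, new App("app-" + i, i)); // app-10 is the newest
    }
    // Arbitrary 5 apps, driven by hash order:
    System.out.println("hash order: " + limitByIterationOrder(apps, 5));
    // app-10 .. app-6, i.e. the latest 5:
    System.out.println("latest    : " + limitByLatest(apps, 5));
  }
}
{code}

The sketch only illustrates the ordering problem; whatever comparator is 
chosen in the actual fix (submission time, start time, or application id) 
would need to be applied consistently wherever the limit is enforced.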



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)