[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg

2016-02-24 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166867#comment-15166867
 ] 

Tsuyoshi Ozawa commented on YARN-4673:
--

Hi [~sandflee], thank you for the contribution. Could you explain the cause of 
the deadlock? It would help us review your patch faster and more accurately.

> race condition in ResourceTrackerService#nodeHeartBeat while processing 
> deduplicated msg
> 
>
> Key: YARN-4673
> URL: https://issues.apache.org/jira/browse/YARN-4673
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4673.01.patch
>
>
> we could add a lock like ApplicationMasterService#allocate
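
For illustration only, here is a rough sketch of the kind of per-node serialization the description alludes to, mirroring the per-attempt lock pattern used in ApplicationMasterService#allocate. NodeHeartbeatLock, heartbeatLocks and buildHeartbeatResponse are hypothetical names, not the actual patch:

{noformat}
// Sketch only: serialize heartbeat handling per node and answer a
// deduplicated heartbeat with the cached response instead of letting two
// threads process the same node concurrently.
class NodeHeartbeatLock {
  private NodeHeartbeatResponse lastResponse;
  synchronized NodeHeartbeatResponse getLastResponse() { return lastResponse; }
  synchronized void setLastResponse(NodeHeartbeatResponse r) { lastResponse = r; }
}

// inside ResourceTrackerService#nodeHeartbeat(request):
NodeHeartbeatLock lock = heartbeatLocks.get(nodeId);   // one lock per node
synchronized (lock) {
  NodeHeartbeatResponse last = lock.getLastResponse();
  if (last != null
      && remoteNodeStatus.getResponseId() + 1 == last.getResponseId()) {
    return last;        // deduplicated msg: resend the previous response
  }
  NodeHeartbeatResponse response = buildHeartbeatResponse(request);
  lock.setLastResponse(response);
  return response;
}
{noformat}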





[jira] [Updated] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4731:
---
Description: 
Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
 20 (Not a directory)
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
dsperf, dsperf, 3, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
2016-02-24 18:56:46,894 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=255:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
at 
org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 10 more

{noformat}

As a result, NodeManager local directories are not getting deleted for each 
application.

{noformat}
total 36
drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
-rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/
lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.xml -> 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml*
drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/
-rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh*
drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 tmp/
{noformat}

  was:
Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf

[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166842#comment-15166842
 ] 

Hadoop QA commented on YARN-4673:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
10s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 16 unchanged - 3 fixed = 17 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 25s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 34s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 36s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Unread field:ResourceTrackerService.java:[line 623] |
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourc

[jira] [Resolved] (YARN-4730) YARN preemption based on instantaneous fair share

2016-02-24 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph resolved YARN-4730.
-
Resolution: Duplicate

YARN-2026

> YARN preemption based on instantaneous fair share
> -
>
> Key: YARN-4730
> URL: https://issues.apache.org/jira/browse/YARN-4730
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Prabhu Joseph
>
> On a big cluster with a total cluster resource of 10 TB and 3000 cores, and a 
> Fair Scheduler with 230 queues and a total of 6 jobs run a day [all 230 queues 
> are very critical, so minResource is the same for all of them]: in this case, 
> when a Spark job runs on queue A, occupies the entire cluster resource, and 
> does not release any of it, another job submitted to queue B gets back through 
> preemption only the fair share, which is <10TB, 3000> / 230 = <45 GB, 13 
> cores>, far too small a share for a queue shared by many applications.
> Preemption should instead reclaim the instantaneous fair share, that is 
> <10TB, 3000> / 2 (active queues) = 5 TB and 1500 cores, so that the first job 
> cannot hog the entire cluster resource and subsequent jobs still run fine.
> This issue only appears when the number of queues is very high. With fewer 
> queues, preempting the fair share would suffice, as the fair share itself is 
> high. But with too many queues, preemption should try to reclaim the 
> instantaneous fair share.
> Note: configuring optimal maxResources for 230 queues is difficult, and 
> constraining the queues with maxResource would leave cluster resources idle 
> most of the time.
> There are 1000s of Spark jobs, so asking each user to restrict the number of 
> executors is also difficult.
> Preempting the instantaneous fair share would overcome the above issues.
>   
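
For illustration, the arithmetic behind the two preemption targets described above (numbers taken from the description; sketch only):

{noformat}
// Illustrative arithmetic only, using the figures from the description.
long clusterMemGB     = 10 * 1024;   // ~10 TB of memory
int  clusterCores     = 3000;
int  configuredQueues = 230;
int  activeQueues     = 2;           // queue A (hogging) and queue B (new job)

// Steady fair share: what preemption reclaims for queue B today.
long steadyMemGB = clusterMemGB / configuredQueues;   // ~44 GB (~45 GB above)
int  steadyCores = clusterCores / configuredQueues;   // ~13 cores

// Instantaneous fair share: what the description proposes preemption should target.
long instMemGB   = clusterMemGB / activeQueues;       // ~5 TB
int  instCores   = clusterCores / activeQueues;       // 1500 cores
{noformat}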





[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166819#comment-15166819
 ] 

Hadoop QA commented on YARN-4720:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 18 unchanged - 1 fixed = 19 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
58s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 47s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 19s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 32m 46s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789865/YARN-4720.04.patch |
| JIRA Issue | YARN-4720 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux c3b3cba6bf60 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patch

[jira] [Commented] (YARN-4735) Remove stale LogAggregationReport from NM's context

2016-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166778#comment-15166778
 ] 

Karthik Kambatla commented on YARN-4735:


We have seen this issue as well. 

> Remove stale LogAggregationReport from NM's context
> ---
>
> Key: YARN-4735
> URL: https://issues.apache.org/jira/browse/YARN-4735
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> {quote}
> All LogAggregationReport(current and previous) are only added to 
> *context.getLogAggregationStatusForApps*, and never removed.
> So for long running service, the LogAggregationReport list NM sends to RM 
> will grow over time.
> {quote}
> Per discussion in YARN-4720, we need to remove stale LogAggregationReports 
> from the NM's context.





[jira] [Updated] (YARN-4731) Linux container executor fails on DeleteAsUser

2016-02-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4731:
---
Priority: Critical  (was: Major)

> Linux container executor fails on DeleteAsUser
> --
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result, NodeManager local directories are not getting deleted for each 
> application.





[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166741#comment-15166741
 ] 

Jun Gong commented on YARN-4720:


Thanks for the suggestion. I attached a new patch to address it.

{quote}
ah, that is a good point. So for long running service, the LogAggregationReport 
list NM sends to RM will grow over time. Sounds like a bug; but not something 
related to this jira. Jun Gong, you want to open a separate jira for that?
{quote}
Thanks for the confirmation. I just created YARN-4735 to address it.

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch, YARN-4720.04.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}





[jira] [Created] (YARN-4735) Remove stale LogAggregationReport from NM's context

2016-02-24 Thread Jun Gong (JIRA)
Jun Gong created YARN-4735:
--

 Summary: Remove stale LogAggregationReport from NM's context
 Key: YARN-4735
 URL: https://issues.apache.org/jira/browse/YARN-4735
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong


{quote}
All LogAggregationReport(current and previous) are only added to 
*context.getLogAggregationStatusForApps*, and never removed.

So for long running service, the LogAggregationReport list NM sends to RM will 
grow over time.
{quote}
Per discussion in YARN-4720, we need to remove stale LogAggregationReports 
from the NM's context.
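
A minimal, hypothetical sketch of the cleanup this implies, assuming for illustration that the reports are kept in a map keyed by ApplicationId (the actual collection type, the pruning point and the helpers appsAckedByRM/isFinal are assumptions, not the eventual patch):

{noformat}
// Sketch only: once a report for a finished application has been handed to
// the RM, drop it from the NM context so the list sent in each node
// heartbeat stops growing without bound.
Map<ApplicationId, LogAggregationReport> reports =
    context.getLogAggregationStatusForApps();
for (ApplicationId appId : appsAckedByRM) {            // hypothetical
  LogAggregationReport report = reports.get(appId);
  if (report != null && isFinal(report.getLogAggregationStatus())) {
    reports.remove(appId);
  }
}
{noformat}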





[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-02-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166740#comment-15166740
 ] 

Naganarasimha G R commented on YARN-4712:
-

One way to avoid this (in particular for CPU usage) is to multiply by 100, 
floor it, and then cast it to an int.
But we need to think further about whether YARN-4053's approach, which doesn't 
support decimals, is a limitation for others loading the metrics.
cc [~sjlee0] & [~varun_saxena]
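
A small sketch of both points, i.e. guarding against UNAVAILABLE before any further math and converting the fractional percentage into a long-friendly value (the surrounding method and the final publishing step are omitted; casting to long rather than int here is only an assumption to match LongConverter):

{noformat}
// Illustrative only. Skip the calculation entirely when usage is
// unavailable, then floor(percentage * 100) so the value survives a
// Long-only metric converter (e.g. 12.34% becomes 1234).
float cpuUsagePercentPerCore = pTree.getCpuUsagePercent();
if (cpuUsagePercentPerCore != ResourceCalculatorProcessTree.UNAVAILABLE) {
  float cpuUsageTotalCoresPercentage =
      cpuUsagePercentPerCore / resourceCalculatorPlugin.getNumProcessors();
  long cpuMetricValue =
      (long) Math.floor(cpuUsageTotalCoresPercentage * 100);
  // publish cpuMetricValue via NMTimelinePublisher (details omitted)
}
{noformat}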

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that, many times, the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is ResourceCalculatorProcessTree.UNAVAILABLE 
> (i.e. -1), but ContainersMonitor still does the calculation 
> {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore / 
> resourceCalculatorPlugin.getNumProcessors()}}, because of which the 
> UNAVAILABLE check in {{NMTimelinePublisher.reportContainerResourceUsage}} is 
> never triggered. Proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainersMonitor publishes decimal values for CPU usage.





[jira] [Updated] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4720:
---
Attachment: YARN-4720.04.patch

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch, YARN-4720.04.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}





[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166710#comment-15166710
 ] 

Hadoop QA commented on YARN-4720:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 17 unchanged - 1 fixed = 18 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 8s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 31s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 34m 51s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789856/YARN-4720.03.patch |
| JIRA Issue | YARN-4720 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 1e6841db56b4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchpr

[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166691#comment-15166691
 ] 

Ming Ma commented on YARN-4720:
---

Ah, that is a good point. So for long running services, the 
{{LogAggregationReport}} list the NM sends to the RM will grow over time. 
Sounds like a bug, but not something related to this jira. [~hex108], do you 
want to open a separate jira for that?

To have it send a RUNNING report in all scenarios, how about moving the 
following block into a finally clause?

{noformat}
  LogAggregationStatus logAggregationStatus =
  logAggregationSucceedInThisCycle
  ? LogAggregationStatus.RUNNING
  : LogAggregationStatus.RUNNING_WITH_FAILURE;
  sendLogAggregationReport(logAggregationStatus, diagnosticMessage);
{noformat}

Instead of creating a new {{operateWriterFailed}}, maybe it can reuse 
{{logAggregationSucceedInThisCycle}}.
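
For illustration, a simplified shape of the upload cycle with that block moved into finally (names taken from the snippet above; not the actual patch):

{noformat}
// Sketch only: a RUNNING / RUNNING_WITH_FAILURE report is sent even when
// the cycle skips creating the writer because nothing needs uploading.
boolean logAggregationSucceedInThisCycle = true;
String diagnosticMessage = "";
try {
  // ... create the writer only if pendingContainerInThisCycle is non-empty,
  // upload logs, rename or delete the tmp file; on failure set
  // logAggregationSucceedInThisCycle = false and fill diagnosticMessage ...
} finally {
  LogAggregationStatus logAggregationStatus =
      logAggregationSucceedInThisCycle
          ? LogAggregationStatus.RUNNING
          : LogAggregationStatus.RUNNING_WITH_FAILURE;
  sendLogAggregationReport(logAggregationStatus, diagnosticMessage);
}
{noformat}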

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}





[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-02-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166682#comment-15166682
 ] 

Naganarasimha G R commented on YARN-4712:
-

Thanks [~sunilg].
Yes, the first scenario is the same as that jira: we should not proceed with 
the calculation (dividing by the number of processors) if the usage is -1. I 
hope to see that jira committed.
For the second point, we need to discuss whether long is sufficient or whether 
we need to support double.

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> There are 2 issues with CPU usage collection:
> * I was able to observe that, many times, the CPU usage obtained from 
> {{pTree.getCpuUsagePercent()}} is ResourceCalculatorProcessTree.UNAVAILABLE 
> (i.e. -1), but ContainersMonitor still does the calculation 
> {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore / 
> resourceCalculatorPlugin.getNumProcessors()}}, because of which the 
> UNAVAILABLE check in {{NMTimelinePublisher.reportContainerResourceUsage}} is 
> never triggered. Proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but 
> ContainersMonitor publishes decimal values for CPU usage.





[jira] [Updated] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg

2016-02-24 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4673:
---
Attachment: YARN-4673.01.patch

> race condition in ResourceTrackerService#nodeHeartBeat while processing 
> deduplicated msg
> 
>
> Key: YARN-4673
> URL: https://issues.apache.org/jira/browse/YARN-4673
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4673.01.patch
>
>
> we could add a lock like ApplicationMasterService#allocate





[jira] [Commented] (YARN-4731) Linux container executor fails on DeleteAsUser

2016-02-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1516#comment-1516
 ] 

Bibin A Chundatt commented on YARN-4731:


*Command array logs*
/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor 
dsperf dsperf 3 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0002/container_e02_1456319010019_0002_01_01
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0002/container_e02_1456319010019_0002_01_01:
 20 (Not a directory)


> Linux container executor fails on DeleteAsUser
> --
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result, NodeManager local directories are not getting deleted for each 
> application.





[jira] [Commented] (YARN-4729) SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE

2016-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166648#comment-15166648
 ] 

Hudson commented on YARN-4729:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9366 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9366/])
YARN-4729. SchedulerApplicationAttempt#getTotalRequiredResources can (kasha: 
rev c684f2b007a4808dafbe1c1d3ce01758e281d329)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* hadoop-yarn-project/CHANGES.txt


> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE
> --
>
> Key: YARN-4729
> URL: https://issues.apache.org/jira/browse/YARN-4729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.2
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Fix For: 2.9.0
>
> Attachments: yarn-4729.patch
>
>
> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE. We 
> saw this in a unit test failure. 





[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166644#comment-15166644
 ] 

Jun Gong commented on YARN-4720:


Thanks for explaining. I attached a new patch to fix it.

{quote}
Yes, NM can send several {{LogAggregationReport}}s in the list which is 
ordered; that is the API between NM and RM. Then on RM side, it will retrieve 
all elements from the list.
{quote} 
IIUC, all LogAggregationReports (current and previous) are only added to 
'context.getLogAggregationStatusForApps' and never removed.

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}





[jira] [Updated] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4720:
---
Attachment: YARN-4720.03.patch

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}





[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166633#comment-15166633
 ] 

Bikas Saha commented on YARN-1040:
--

I am sorry if I caused a digression by mentioning Slider etc.

I am not sure the upgrade scenario is the only one for this jira, since the 
jira covers a broader set. Even without upgrades, apps can change the processes 
they run in a container without losing the container allocation. Identical 
calls to the primitives could be used without any notion of upgrade, e.g. start 
a Java process first for a Java task, then launch a Python process for a Python 
task. To the NM this is identical to starting v1 and then starting v2. So while 
it makes sense for the second case to use an API called upgrade, it may not for 
the first.

(Unrelated to this jira, IMO, YARN should allow upgrade of app code without 
losing containers but not necessarily understand it deeply. E.g. YARN need not 
assume that upgrade will need additional resource or try to acquire them 
transparently for the application.)

For the purposes of this jira, here are the thoughts I had when I opened 
YARN-1292 to delink the process lifecycle from the container:
1) new API - acquireContainer - asks for the allocated resource. The API has a 
flag to specify whether process exit implies releaseContainer; it defaults to 
true for backwards compatibility. Apps that want to keep that behavior can 
explicitly pass true when using the new API, mainly to reduce the number of 
RPCs for apps like MR/Tez etc.
2) new API - startProcess - starts the remote process
3) new API - stopProcess - stops the remote process
4) new API - releaseContainer - releases the allocated resource
5) Potentially a new API for localization, though in theory this could be 
separate.

Since this fine-grained control makes the protocol chatty, we can reduce the 
RPC traffic by having a new NM RPC, say NMCommand, that takes a sequence of API 
primitives sent in one RPC.
So the current startContainer API effectively becomes NMCommand(1, 2) and 
stopContainer becomes NMCommand(3, 4). This can be leveraged for backwards 
compatibility and rolling upgrades.
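
Purely as an illustration of the primitives and the batching RPC described above (hypothetical interface; none of these types exist in YARN today):

{noformat}
// Hypothetical sketch of the proposed NM-side primitives.
interface ContainerPrimitives {
  void acquireContainer(ContainerId id, boolean releaseOnProcessExit); // (1)
  void startProcess(ContainerId id, ProcessSpec spec);                 // (2)
  void stopProcess(ContainerId id, ProcessId pid);                     // (3)
  void releaseContainer(ContainerId id);                               // (4)
  void localize(ContainerId id, List<LocalResource> resources);        // (5)

  // Batch several primitives into one RPC to keep the protocol from
  // becoming chatty: startContainer ~ NMCommand(1, 2),
  // stopContainer ~ NMCommand(3, 4).
  void nmCommand(ContainerId id, List<PrimitiveOp> ops);
}
{noformat}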

The above items would effectively delink the process and container lifecycles 
and close out this jira.

This provides the fine-grained control in core YARN that can be used for 
various scenarios, e.g. upgrades, without YARN having to understand those 
scenarios. If we need to add higher-level notions for upgrades etc., those 
could be done as separate items.

I hope that helps make my thoughts concrete within the scope of this jira.


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.





[jira] [Commented] (YARN-556) [Umbrella] RM Restart phase 2 - Work preserving restart

2016-02-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166612#comment-15166612
 ] 

Rohith Sharma K S commented on YARN-556:


You are probably hitting one of the following issues: YARN-2340, YARN-2308, or 
YARN-4000. Was any queue configuration changed after the restart?


> [Umbrella] RM Restart phase 2 - Work preserving restart
> ---
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: graceful, resourcemanager, rolling upgrade
>Reporter: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.





[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166593#comment-15166593
 ] 

Rohith Sharma K S commented on YARN-4624:
-

Thanks [~brahmareddy] for providing the patch.
One nit: why are we using the wrapper Float instead of the primitive float?

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure node labels and add them to the cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}





[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166575#comment-15166575
 ] 

Ming Ma commented on YARN-4720:
---

It seems that {{LogAggregationStatus.RUNNING}} implies the log aggregation 
service is running; it doesn't necessarily mean the NM has actually aggregated any 
logs. So if the long-running service is running and hasn't generated any logs since 
it started, it is better to return {{LogAggregationStatus.RUNNING}}.

Yes, the NM can send several {{LogAggregationReport}}s in the list, which is 
ordered; that is the API between the NM and the RM. On the RM side, it will then 
retrieve all elements from the list.
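
The fix under discussion amounts to guarding the upload cycle so the writer is never created when there is nothing to upload. Below is a small, self-contained sketch of that guard; the method and variable names are hypothetical stand-ins for the AppLogAggregatorImpl snippet quoted in the issue description below, not the actual patch.

{noformat}
import java.util.Collections;
import java.util.Set;

public class SkipEmptyCycleDemo {

  // Hypothetical stand-in for AppLogAggregatorImpl's per-cycle upload.
  static void uploadLogsForCycle(Set<String> pendingContainersInThisCycle) {
    if (pendingContainersInThisCycle.isEmpty()) {
      // Nothing to aggregate: skip writer creation entirely, so no
      // create/getfileinfo/delete calls ever reach the NameNode.
      return;
    }
    System.out.println("create tmp log file on the remote FS (NN create call)");
    for (String container : pendingContainersInThisCycle) {
      System.out.println("append logs for " + container);
    }
    System.out.println("rename tmp file to its final name (NN rename call)");
  }

  public static void main(String[] args) {
    uploadLogsForCycle(Collections.<String>emptySet());              // no NN traffic at all
    uploadLogsForCycle(Collections.singleton("container_demo_0001")); // normal upload path
  }
}
{noformat}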

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

2016-02-24 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166567#comment-15166567
 ] 

Jun Gong commented on YARN-3998:


Thanks [~vinodkv] for explaining it.

{quote}
My point was mainly about creating and reusing a common policy-framework even 
if the actual policies may not be entirely reused. We should seriously consider 
this instead of creating adhoc APIs for custom hard-coded policies.
{quote}
Yes, it would be better if we could reuse a common policy framework; we might 
need to discuss it more.

{quote}
I'm okay creating separate JIRAs under YARN-3998 if you both think of doing so, 
but treat (some of the above) as blockers for releasing this feature. Given 
that, does it make sense to work on this in a branch?
{quote}
I could address these blocker problems in this issue if needed. [~vvasudev], 
could you share your thoughts please? Thanks.
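
To make the retry-times idea from the issue description concrete, here is a tiny, self-contained sketch of the relaunch loop it proposes. The constant and helper below are hypothetical illustrations, not the actual ContainerLaunchContext field or NM code.

{noformat}
public class RetryLaunchDemo {

  // Hypothetical stand-in for the proposed retry-times field in ContainerLaunchContext.
  static final int RETRY_TIMES = 3;

  // Simulated container run: returns the process exit code (fails twice, then succeeds).
  static int launchContainerOnce(int attempt) {
    return attempt < 2 ? 1 : 0;
  }

  public static void main(String[] args) {
    int exitCode = -1;
    // Relaunch in place: localization happens once and local files are reused,
    // and the RM never has to re-schedule the container.
    for (int attempt = 0; attempt <= RETRY_TIMES; attempt++) {
      exitCode = launchContainerOnce(attempt);
      if (exitCode == 0) {
        break; // container ran successfully
      }
    }
    System.out.println("final exit code: " + exitCode);
  }
}
{noformat}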

> Add retry-times to let NM re-launch container when it fails to run
> --
>
> Key: YARN-3998
> URL: https://issues.apache.org/jira/browse/YARN-3998
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3998.01.patch, YARN-3998.02.patch, 
> YARN-3998.03.patch, YARN-3998.04.patch, YARN-3998.05.patch, YARN-3998.06.patch
>
>
> I'd like to add a field (retry-times) in ContainerLaunchContext. When the AM 
> launches containers, it could specify the value. The NM will then re-launch the 
> container up to 'retry-times' times when it fails to run (e.g. the exit code is 
> not 0). It will save a lot of time: it avoids container localization, the RM 
> does not need to re-schedule the container, and local files in the container's 
> working directory will be left for re-use. (If the container has downloaded 
> some big files, it does not need to re-download them when running again.) 
> We find this useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4731) Linux container executor fails on DeleteAsUser

2016-02-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4731:
---
Description: 
Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
 20 (Not a directory)
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
dsperf, dsperf, 3, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
2016-02-24 18:56:46,894 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=255:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
at 
org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 10 more

{noformat}

As a result, NodeManager local directories are not getting deleted for each 
application.

  was:
Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
 20 (Not a directory)
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
dsperf, dsperf, 3, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
2016-02-24 18:56:46,894 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
 returned with exit code: 255
org.apache.hadoop.yarn.

[jira] [Updated] (YARN-4731) Linux container executor fails on DeleteAsUser

2016-02-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4731:
---
Description: 
Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
 20 (Not a directory)
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
dsperf, dsperf, 3, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
2016-02-24 18:56:46,894 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=255:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
at 
org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 10 more

{noformat}

As a result, NodeManager local directories are not getting deleted for each 
application.

  was:
Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
 20 (Not a directory)
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
dsperf, dsperf, 3, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
2016-02-24 18:56:46,894 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
 returned with exit code: 255
org.apache.hadoop.yarn

[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166538#comment-15166538
 ] 

Jun Gong commented on YARN-4720:


Thanks [~mingma] for review and comments.

{quote}
When pendingContainerInThisCycle is empty, NM will skip sending the 
LogAggregationReport with LogAggregationStatus.RUNNING. It means for a long 
running service, it is possible for a yarn client to get 
LogAggregationStatus.NOT_START when it calls 
ApplicationClientProtocol#getApplicationReport if the long running service 
doesn't generate any log. Without the patch, NM will send 
LogAggregationStatus.RUNNING regardless. So it might be better to still send 
LogAggregationStatus.RUNNING regardless.
{quote}
Yes, it is actually a different behavior. LogAggregationReport is a report of the 
current status; is it necessary to send a report if the NM has not actually done 
any log aggregation?

BTW: I noticed that there is no cleanup for previous LogAggregationReports; there 
is only 'this.context.getLogAggregationStatusForApps().add()' and no 'remove'. Is 
that deliberate?

{quote}
When LogWriter creation throws exception and appFinished is true, NM will send 
a LogAggregationReport with LogAggregationStatus.SUCCEEDED. Without the patch, 
NM won't send any final LogAggregationReport. Maybe it is better to update the 
patch to send LogAggregationStatus.FAILED for such scenario.
{quote}
I will update the patch to address it.
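
As a rough illustration of the final-status point above, here is a self-contained sketch of the decision the NM has to make per cycle. The local enum stands in for YARN's LogAggregationStatus; none of this is the actual patch.

{noformat}
public class FinalStatusDemo {

  // Local stand-in for the LogAggregationStatus values discussed above.
  enum Status { RUNNING, SUCCEEDED, FAILED }

  // Status the NM should report for a cycle, given whether the LogWriter
  // constructor threw and whether this is the final (post-app-finish) cycle.
  static Status statusForCycle(boolean writerCreationFailed, boolean appFinished) {
    if (writerCreationFailed) {
      // A final cycle with a failed writer must not be reported as SUCCEEDED.
      return appFinished ? Status.FAILED : Status.RUNNING;
    }
    return appFinished ? Status.SUCCEEDED : Status.RUNNING;
  }

  public static void main(String[] args) {
    System.out.println(statusForCycle(true, true));   // FAILED
    System.out.println(statusForCycle(false, true));  // SUCCEEDED
    System.out.println(statusForCycle(false, false)); // RUNNING
  }
}
{noformat}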

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4731) Linux container executor fails on DeleteAsUser

2016-02-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4731:
---
Summary: Linux container executor fails on DeleteAsUser  (was: Linux 
container executor exception on DeleteAsUser)

> Linux container executor fails on DeleteAsUser
> --
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4729) SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE

2016-02-24 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166526#comment-15166526
 ] 

Brahma Reddy Battula commented on YARN-4729:


[~kasha] thanks for reporting and working on this. +1, LGTM (non-binding).

> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE
> --
>
> Key: YARN-4729
> URL: https://issues.apache.org/jira/browse/YARN-4729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.2
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4729.patch
>
>
> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE. We 
> saw this in a unit test failure. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166498#comment-15166498
 ] 

Hadoop QA commented on YARN-4359:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 83 new + 25 unchanged - 1 fixed = 108 total (was 26) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 40 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 68 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 50s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_72
 with JDK v1.8.0_72 generated 3 new + 100 unchanged - 0 fixed = 103 total (was 
100) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 29s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 4 new + 2 unchanged - 0 fixed = 6 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 14s {color} 
| {color:red} hadoop-yarn-server-resou

[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166487#comment-15166487
 ] 

Sangjin Lee commented on YARN-3863:
---

I completed one full pass over the patch (it's large!), but I wouldn't call my 
review complete yet. I may follow up with more comments as I delve more into it. 
I'd welcome others' reviews too!

Here are the comments from this review.

(TimelineEntityFilters.java)
- l.48: typo: "a entity type" -> "the entity type"
- There are multiple places where a space is missing before an opening 
parenthesis ("("). I also saw it in other files too. You want to have a space 
before the opening parenthesis.
- l.51: make it a link
- l.59: typo: "a entity type" -> "the entity type"
- l.69: typo: "a info key" -> "the info key"
- l.81: make it a link
- l.91: make it a link
- l.99: make it a link

(TimelineReaderWebServicesUtils.java)
- l.94: I'm not really sure what this change is intended to do. The goal is to 
do an equality filter against multiple values, right? Why do we need a separate 
{{parseMetricsFilters()}} method for this? What's changed?
- l.257: Why is it GREATER_OR_EQUAL instead of EQUAL?
- This is more of a question. Is a list of multiple equality filters the same 
as the multi-val equality filter? If not, how are they different?

(TimelineCompareFilter.java)
- nit: let's make the member variables final

(TimelineFilter.java)
- l.52: the name "MULTIVAL_EQUALITY" is a bit confusing; it took me a little 
while to see that it means equality with an element in the set (I thought it was 
multiple key-value equality). Is this essentially an "in the set" comparison? I 
wonder if there could be a better name? The same goes for 
{{TimelineMultiValueEqualityFilter}}.

(TimelineFilterUtils.java)
- l.104: can {{createSingleColValueFiltersByRange()}} be refactored to call 
{{createHBaseSingleColValueFilter()}}?
- l.107: dead code?

(HBaseTimelineWriterImpl.java)
- Is this basically improving the code by using the strongly typed methods for 
bytes? As mentioned in a previous comment, these changes (this and 
{{\*Column\*}} changes) seem orthogonal. Would it be possible to isolate these 
changes from the main changes?
- l.448: it should simply be a {{else if}}

(TimelineStorageUtils.java)
- There are many places here and elsewhere where {{equals()}} is used to compare 
enums. All the enum comparisons should simply use "==" (see the sketch after this 
list).
- see my previous comment about refactoring to make these methods simpler and 
easier to read

(GenericEntityReader.java)
- l.260: I know this is happening deep inside the method, but it seems like a 
bit of an anti-pattern that we have to reference whether something is an 
application v. entity. There are multiple places in {{GenericEntityReader}} for 
this (basically each place where {{ApplicationColumn\*}} is used). I know there 
is already a precedent (I introduced it :(), but now it's gone full bloom. This 
makes the line between {{GenericEntityReader}} and {{ApplicationEntityReader}} 
quite blurry. Would it be possible to refactor these so that application 
behavior goes into {{ApplicationEntityReader}}? I haven't thought through what 
kind of refactoring would make that separation possible, but it would be great 
if you could come up with ideas to retain separation between 
{{GenericEntityReader}} and {{ApplicationEntityReader}}.
- l.532: This is an interesting point. Should we categorically disallow any 
multi-entity reads without a filter? Is it an obvious requirement? I understand 
we already set some default values (e.g. created time, etc.) so this might be a 
moot point, but do we need to check for it when some defaults are set anyway?

(TestHBaseTimelineStorage.java)
- I think we went back and forth on this, but this test is getting really long 
now. Should we consider breaking it up in some fashion? I think we originally 
broke it up as a reader test and a writer test, and then combined them into one 
again. Would there be some value in separating them (with possibly a common 
base class)? Or we could break it down along different types of entities? I'm 
open to ideas.

(TimelineExistsFilter.java)
- l.32-33: nit: make them final

(TimelineMultiValueEqualityFilter.java)
- The name is a bit confusing (see above)
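
On the enum-comparison nit, a minimal, hypothetical Java example of why "==" is preferred over {{equals()}} for enums (null-safe, and always correct since enum constants are singletons):

{noformat}
public class EnumCompareDemo {

  enum Op { EQUAL, NOT_EQUAL }

  public static void main(String[] args) {
    Op op = null; // e.g. an optional filter operator that was never set

    // op.equals(Op.EQUAL) would throw a NullPointerException here.

    // '==' is null-safe and reads clearly; enum constants are singletons,
    // so identity comparison is always correct for enums.
    System.out.println(op == Op.EQUAL);       // false, no NPE
    System.out.println(Op.EQUAL == Op.EQUAL); // true
  }
}
{noformat}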

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.

[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166482#comment-15166482
 ] 

Hadoop QA commented on YARN-4734:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} YARN-4734 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789821/YARN-4734.1.patch |
| JIRA Issue | YARN-4734 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10629/console |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Summary: Merge branch:YARN-3368 to trunk  (was: Merge YARN-3368 commit to 
trunk)

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166469#comment-15166469
 ] 

Hadoop QA commented on YARN-4723:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 3 new + 66 unchanged - 0 fixed = 69 total (was 66) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 42s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 24s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789649/YARN-4723.001.

[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166462#comment-15166462
 ] 

Arun Suresh commented on YARN-1040:
---

So that we are on the same page, if we were to separate what needs to be in 
YARN vs. what Slider etc. should handle, I'd say:

*YARN*
* Container Upgrade primitive:
** provide AM with APIs (via NMClient) to upgrade the Container.
** the API takes 1) a new {{ContainerLaunchContext}} and 2) a policy, viz. *In-place* 
(localize v2 in parallel, start v2, stop v1) or *New+rollback* (stop v1, 
localize v2, start v2, and start v1 if starting v2 fails), *or* a list of primitive 
composable commands if the above policies don't cover the use case.
** should negotiate a Resource increase for in-place upgrade with the RM prior to 
the upgrade via YARN-1197 (or perhaps use OPPORTUNISTIC containers, negotiated 
locally at the NM, for the resource spike needed for the upgrade, once YARN-2877 
is ready)

*Slider / or something similar*
* Application upgrade primitive
** Upgrade Orchestration Policy: allow applications deployed via Slider to 
specify the order in which tasks/roles are upgraded (or started) 
** Allow applications to specify how containers of each role are upgraded
** Actually call the YARN container upgrade APIs (described above) to perform the 
upgrade of each container in the user-specified order/policy

Makes sense ?
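
Purely as an illustration of the API shape proposed above, here is a hypothetical sketch; none of these types, enum values, or method signatures exist in YARN today, and the real ContainerLaunchContext is replaced by a placeholder so the snippet stands alone.

{noformat}
// Hypothetical sketch only: nothing here is an actual NMClient API.
public interface ContainerUpgradeClient {

  /** How the NM should sequence the upgrade of a single container. */
  enum UpgradePolicy {
    IN_PLACE,          // localize v2 in parallel, start v2, then stop v1
    NEW_PLUS_ROLLBACK  // stop v1, localize v2, start v2; restart v1 if v2 fails
  }

  /** Placeholder for org.apache.hadoop.yarn.api.records.ContainerLaunchContext. */
  interface LaunchContext { }

  /**
   * Ask the NM to upgrade a running container to a new launch context,
   * following one of the predefined policies.
   */
  void upgradeContainer(String containerId, LaunchContext newContext, UpgradePolicy policy);

  /**
   * Escape hatch: compose the primitive commands directly when neither
   * predefined policy fits (stop, localize, start, rollback, ...).
   */
  void upgradeContainer(String containerId, LaunchContext newContext, String... primitiveCommands);
}
{noformat}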


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166456#comment-15166456
 ] 

Sangjin Lee commented on YARN-3863:
---

{quote}
Yes, the code is similar. We are looping over a filter list and then checking the 
operator while doing the processing for an individual filter.
I thought about it, but the issue in moving it into a common area is that the 
data structures which hold events, configs, metrics, etc. are not the same.
We can however do one thing: pass the TimelineEntity object itself into a common 
function (for all filters) and also pass something, say an enum, indicating what 
kind of filter we intend to match (name it something like 
TimelineEntityFiltersType). Then, based on this enum value, get the appropriate 
item (configs, metrics, etc.) from the passed entity. This way we can move the 
common logic into a specific method which can in turn call the appropriate method 
to process based on the filter type (say equality filter, multi-value equality 
filter, etc.). Does this sound fine?
{quote}

I think so. I'll need to see the changes in code to get a better sense, though.
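
A rough, self-contained sketch of the dispatch idea quoted above: the enum name TimelineEntityFiltersType is taken from the comment, while everything else (the entity stand-in, the matching logic) is made up for illustration and is not the actual YARN-3863 patch.

{noformat}
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public class FilterDispatchDemo {

  enum TimelineEntityFiltersType { CONFIG, METRIC, INFO, EVENT }

  // Minimal stand-in for a TimelineEntity, with just the pieces this sketch needs.
  static class Entity {
    Map<String, String> configs;
    Map<String, Number> metrics;
    Map<String, Object> info;
    Set<String> events;
  }

  // Common entry point: pick the right data structure off the entity based on the
  // filter type, then run the shared matching logic (here: a simple key check).
  static boolean matches(Entity entity, TimelineEntityFiltersType type, String key) {
    switch (type) {
      case CONFIG: return entity.configs != null && entity.configs.containsKey(key);
      case METRIC: return entity.metrics != null && entity.metrics.containsKey(key);
      case INFO:   return entity.info != null && entity.info.containsKey(key);
      case EVENT:  return entity.events != null && entity.events.contains(key);
      default:     return false;
    }
  }

  public static void main(String[] args) {
    Entity e = new Entity();
    e.configs = Collections.singletonMap("yarn.app.priority", "HIGH");
    System.out.println(matches(e, TimelineEntityFiltersType.CONFIG, "yarn.app.priority")); // true
    System.out.println(matches(e, TimelineEntityFiltersType.METRIC, "cpu"));               // false
  }
}
{noformat}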

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge YARN-3368 commit to trunk

2016-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Attachment: YARN-4734.1.patch

Attached patch for merge.

> Merge YARN-3368 commit to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4734) Merge YARN-3368 commit to trunk

2016-02-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4734:


 Summary: Merge YARN-3368 commit to trunk
 Key: YARN-4734
 URL: https://issues.apache.org/jira/browse/YARN-4734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan
Assignee: Wangda Tan


YARN-2928 branch is planned to merge back to trunk soon, it depends on changes 
of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge YARN-3368 commit to trunk

2016-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Description: YARN-2928 branch is planned to merge back to trunk shortly, it 
depends on changes of YARN-3368. This JIRA is to track the merging task.  (was: 
YARN-2928 branch is planned to merge back to trunk soon, it depends on changes 
of YARN-3368. This JIRA is to track the merging task.)

> Merge YARN-3368 commit to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4733) [YARN-3368] Commit initial web UI patch to branch: YARN-3368

2016-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-4733.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: YARN-3368

This JIRA was created to track the commit of the initial web UI patch to branch 
YARN-3368. See 
[commit|https://github.com/apache/hadoop/commit/8ef2e8f1218a7be112ababccfde112c16ba48aa5].

> [YARN-3368] Commit initial web UI patch to branch: YARN-3368
> 
>
> Key: YARN-4733
> URL: https://issues.apache.org/jira/browse/YARN-4733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: YARN-3368
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-24 Thread Ishai Menache (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166439#comment-15166439
 ] 

Ishai Menache commented on YARN-4359:
-

Will add additional tests soon.

> Update LowCost agents logic to take advantage of YARN-4358
> --
>
> Key: YARN-4359
> URL: https://issues.apache.org/jira/browse/YARN-4359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Ishai Menache
> Attachments: YARN-4359.0.patch
>
>
> Given the improvements of YARN-4358, the LowCost agent should be improved to 
> leverage this, and operate on RLESparseResourceAllocation (ideally leveraging 
> the improvements of YARN-3454 to compute available resources)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-24 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-4359:

Attachment: YARN-4359.0.patch

first version of the patch

> Update LowCost agents logic to take advantage of YARN-4358
> --
>
> Key: YARN-4359
> URL: https://issues.apache.org/jira/browse/YARN-4359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Ishai Menache
> Attachments: YARN-4359.0.patch
>
>
> Given the improvements of YARN-4358, the LowCost agent should be improved to 
> leverage this, and operate on RLESparseResourceAllocation (ideally leveraging 
> the improvements of YARN-3454 to compute available resources)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4733) [YARN-3368] Commit initial web UI patch to branch: YARN-3368

2016-02-24 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4733:


 Summary: [YARN-3368] Commit initial web UI patch to branch: 
YARN-3368
 Key: YARN-4733
 URL: https://issues.apache.org/jira/browse/YARN-4733
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Wangda Tan
Assignee: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4733) [YARN-3368] Commit initial web UI patch to branch: YARN-3368

2016-02-24 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4733:
-
Issue Type: Sub-task  (was: Task)
Parent: YARN-3368

> [YARN-3368] Commit initial web UI patch to branch: YARN-3368
> 
>
> Key: YARN-4733
> URL: https://issues.apache.org/jira/browse/YARN-4733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4097) Create POC timeline web UI with new YARN web UI framework

2016-02-24 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166386#comment-15166386
 ] 

Li Lu commented on YARN-4097:
-

BTW, I haven't fine-tuned the styles of our current pages yet. More decoration 
would be very helpful. 

> Create POC timeline web UI with new YARN web UI framework
> -
>
> Key: YARN-4097
> URL: https://issues.apache.org/jira/browse/YARN-4097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Screen Shot 2016-02-24 at 15.57.38.png, Screen Shot 
> 2016-02-24 at 15.57.53.png, Screen Shot 2016-02-24 at 15.58.08.png, Screen 
> Shot 2016-02-24 at 15.58.26.png
>
>
> As planned, we need to try out the new YARN web UI framework and implement 
> timeline v2 web UI on top of it. This JIRA proposes to build the basic active 
> flow and application lists of the timeline data. We can add more content 
> after we get used to this framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4097) Create POC timeline web UI with new YARN web UI framework

2016-02-24 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4097:

Attachment: Screen Shot 2016-02-24 at 15.58.26.png
Screen Shot 2016-02-24 at 15.58.08.png
Screen Shot 2016-02-24 at 15.57.53.png
Screen Shot 2016-02-24 at 15.57.38.png

OK, here are some screenshots of the current POC we have for the ATS-related 
web UI pages. Note that we're affected by YARN-4700 in the flow activity list 
(there are multiple items for the same flow due to different cluster ids). The 
flow and flowrun pages are still very immature in this version, but I left some 
room to further integrate more data from aggregations. The application page in 
the new YARN UI currently only reads data from the RM; right now I simply link 
to that page for future integration. 

For the near-term next steps, we may want to:
# Provide a "dashboard" in the flow activity page, summarizing flow activities. 
# Integrate flowrun/flow-level aggregation data into the flow and flowrun pages. 
# Integrate ATS v2 web services into the application page, so that once an 
application has finished, data can be read from the timeline server rather than 
the RM. We may want to extend this practice to the attempt and container pages. 


> Create POC timeline web UI with new YARN web UI framework
> -
>
> Key: YARN-4097
> URL: https://issues.apache.org/jira/browse/YARN-4097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: yarn-2928-1st-milestone
> Attachments: Screen Shot 2016-02-24 at 15.57.38.png, Screen Shot 
> 2016-02-24 at 15.57.53.png, Screen Shot 2016-02-24 at 15.58.08.png, Screen 
> Shot 2016-02-24 at 15.58.26.png
>
>
> As planned, we need to try out the new YARN web UI framework and implement 
> timeline v2 web UI on top of it. This JIRA proposes to build the basic active 
> flow and application lists of the timeline data. We can add more content 
> after we get used to this framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166325#comment-15166325
 ] 

Hudson commented on YARN-4697:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9364 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9364/])
YARN-4697. NM aggregation thread pool is not bound by limits (haibochen 
(rkanter: rev 954dd57043d2de4f962876c1b89753bfc7e4ce55)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java


> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.9.0
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch, 
> yarn4697.003.patch, yarn4697.004.patch
>
>
> In LogAggregationService.java we create a thread pool to upload logs from 
> the NodeManager to HDFS if log aggregation is turned on. This is a cached 
> thread pool, which based on the javadoc is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation, this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and would put a large load on the NameNode; in the worst case it could 
> even bring it down due to file descriptor issues.
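
A minimal sketch of the contrast described above, using only standard JDK executors: a cached pool grows one thread per queued task, while a bounded pool caps concurrency. The limit of 100 below is just an example value, not YARN's default or configuration key.

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedPoolDemo {
  public static void main(String[] args) throws InterruptedException {
    // What the issue describes: a cached pool grows without bound, roughly one
    // thread per pending upload, which can overwhelm the NameNode after a restart.
    ExecutorService unbounded = Executors.newCachedThreadPool();

    // The fix: cap the number of concurrent aggregation uploads.
    ExecutorService bounded = Executors.newFixedThreadPool(100);

    for (int i = 0; i < 1000; i++) {
      final int app = i;
      // With the bounded pool, at most 100 uploads run at once; the rest queue up.
      bounded.submit(() -> System.out.println("aggregate logs for app " + app));
    }

    unbounded.shutdown();
    bounded.shutdown();
    bounded.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{noformat}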



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165668#comment-15165668
 ] 

Bikas Saha commented on YARN-1040:
--

Agree with your scenarios. 

I am trying to figure out a way by which this does not become a YARN problem (both 
initial work and ongoing maintenance). E.g. we don't know for sure whether the 
resource needs to be x, 2x or 3x. This is an allocation decision and cannot be 
made without the RM's blessing. And increasing container resources is already 
work in progress and may become another NM primitive. Next, what is the 
ordering for the tasks during an upgrade? We could implement one of many 
possibilities but then be stuck with bug-fixing or improving it. Potentially 
use that as a precedent to implement yet another upgrade policy. 

Hence, my suggestion of creating composable primitives that can be used to 
easily implement these flows. And leave it to the apps to determine the exact 
upgrades paths. Perhaps Slider is a better place which could wrap different 
upgrade possibilities using the composable primitives. E.g. 
SliderStopAllUpgradePolicy or SliderConcurrentUpgradePolicy. Or they could be 
provided as helper libs in YARN/NMClient so apps dont have to compose the 
primitives from scratch. The main aim is to continue to make core YARN/NM 
simple by creating primitives and layering complexity on top. This approach may 
be simpler and incremental to develop, test and deploy. Of course, these are my 
personal design views :)

Thoughts?


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4680) TimerTasks leak in ATS V1.5 Writer

2016-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165655#comment-15165655
 ] 

Hudson commented on YARN-4680:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9363 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9363/])
YARN-4680. TimerTasks leak in ATS V1.5 Writer. (Xuan Gong via (gtcarrera9: rev 
9e0f7b8b69ead629f999aa86c8fb7eb581e175d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/FileSystemTimelineWriter.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt


> TimerTasks leak in ATS V1.5 Writer
> --
>
> Key: YARN-4680
> URL: https://issues.apache.org/jira/browse/YARN-4680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4680.1.patch, YARN-4680.20160108.patch, 
> YARN-4680.20160109.patch, YARN-4680.20160222.patch
>
>
> We have seen TimerTasks leak, which can bring an application server down 
> (for example, an Oozie server dying due to too many active threads).
> Although we have fixed some potential leak situations at the application 
> level, such as
> https://issues.apache.org/jira/browse/MAPREDUCE-6618 and
> https://issues.apache.org/jira/browse/MAPREDUCE-6621, we still cannot 
> guarantee that the issue is fully fixed.
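As a minimal illustration of the kind of leak being described (this is a standalone sketch, not the actual writer code): a java.util.Timer whose TimerTask is scheduled but never cancelled keeps running, so every instance that forgets the cleanup leaves a live task and timer thread behind.

{code}
import java.util.Timer;
import java.util.TimerTask;

public class TimerLeakSketch {
  private final Timer timer = new Timer("flush-timer", true);
  private TimerTask flushTask;

  public void start() {
    flushTask = new TimerTask() {
      @Override
      public void run() {
        // periodic flush work would go here
      }
    };
    timer.schedule(flushTask, 1000L, 1000L);
  }

  // Without this cleanup, every instance leaves a live TimerTask and its
  // timer thread behind, which is the kind of leak described above.
  public void close() {
    if (flushTask != null) {
      flushTask.cancel();
    }
    timer.cancel();
  }
}
{code}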



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15165651#comment-15165651
 ] 

Robert Kanter commented on YARN-4701:
-

LGTM +1 pending Jenkins

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch, yarn4701.004.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by linking to the application's specific page on the NM 
> instead of just the NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163918#comment-15163918
 ] 

Ming Ma commented on YARN-4720:
---

Thanks [~hex108] for the update. The patch looks good overall. It does change 
the following behaviors.

* When {{pendingContainerInThisCycle}} is empty, the NM will skip sending the 
{{LogAggregationReport}} with {{LogAggregationStatus.RUNNING}}. This means that 
for a long running service, a yarn client could get 
{{LogAggregationStatus.NOT_START}} when it calls 
{{ApplicationClientProtocol#getApplicationReport}} if the service doesn't 
generate any log. Without the patch, the NM sends 
{{LogAggregationStatus.RUNNING}} regardless, so it might be better to keep 
sending {{LogAggregationStatus.RUNNING}} in this case.

* When {{LogWriter}} creation throws an exception and {{appFinished}} is true, 
the NM will send a {{LogAggregationReport}} with 
{{LogAggregationStatus.SUCCEEDED}}. Without the patch, the NM won't send any 
final {{LogAggregationReport}}. It may be better to update the patch to send 
{{LogAggregationStatus.FAILED}} in such a scenario.

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}
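A minimal standalone sketch of the guard being discussed (the method here only stands in for the real upload step in AppLogAggregatorImpl; it is not the actual patch):

{code}
import java.util.Collections;
import java.util.Set;

public class SkipEmptyCycleSketch {
  // Sketch only: stands in for the real per-cycle upload logic.
  static void uploadLogsForCycle(Set<String> pendingContainerInThisCycle) {
    if (pendingContainerInThisCycle.isEmpty()) {
      // Nothing to aggregate this cycle: skip creating the writer, so no
      // create/getfileinfo/delete calls ever reach the NameNode.
      return;
    }
    // ... create the LogWriter and upload each pending container's logs ...
  }

  public static void main(String[] args) {
    uploadLogsForCycle(Collections.<String>emptySet()); // no NN traffic
  }
}
{code}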



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4729) SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE

2016-02-24 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163879#comment-15163879
 ] 

Robert Kanter commented on YARN-4729:
-

+1 LGTM

> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE
> --
>
> Key: YARN-4729
> URL: https://issues.apache.org/jira/browse/YARN-4729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.2
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4729.patch
>
>
> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE. We 
> saw this in a unit test failure. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4730) YARN preemption based on instantaneous fair share

2016-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163873#comment-15163873
 ] 

Karthik Kambatla commented on YARN-4730:


IIRC, FairScheduler preemption is based on the instantaneous fair share. The 
steady fair share is used only for WebUI purposes. 

In your case, I would think minshare preemption kicks in because you specify 
min resources for all queues. Isn't it expected that all queues get the same 
resources, the sum of which is the cluster resource? Do you expect allocations 
different from the minshare? 

> YARN preemption based on instantaneous fair share
> -
>
> Key: YARN-4730
> URL: https://issues.apache.org/jira/browse/YARN-4730
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Prabhu Joseph
>
> Consider a big cluster with a total cluster resource of 10 TB and 3000 cores, 
> and a Fair Scheduler with 230 queues and a total of 6 jobs run a day (all 
> 230 queues are very critical and hence minResources is the same for all). In 
> this case, when a Spark job runs on queue A, occupies the entire cluster 
> resource and does not release any of it, and another job is submitted to 
> queue B, preemption reclaims only the fair share, which is <10TB, 3000> / 230 
> = <45 GB, 13 cores>. That is a very small fair share for a queue shared by 
> many applications. 
> Preemption should instead reclaim the instantaneous fair share, that is 
> <10TB, 3000> / 2 (active queues) = 5 TB and 1500 cores, so that the first job 
> cannot hog the entire cluster resource and subsequent jobs run fine.
> This is only an issue when the number of queues is very high. With few 
> queues, preempting up to the fair share suffices because the fair share is 
> already high. But with a very large number of queues, preemption should try 
> to reach the instantaneous fair share.
> Note: configuring optimal maxResources for 230 queues is difficult, and 
> constraining the queues with maxResources would leave cluster resources idle 
> most of the time.
> There are thousands of Spark jobs, so asking each user to restrict the 
> number of executors is also difficult.
> Preempting up to the instantaneous fair share would help overcome the above 
> issues.
>   
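For concreteness, a small sketch of the two share computations being contrasted (memory only, using the example numbers from the description; not actual scheduler code):

{code}
public class FairShareSketch {
  public static void main(String[] args) {
    long clusterMemGb = 10L * 1024;   // 10 TB, as in the description
    int totalQueues = 230;
    int activeQueues = 2;

    // Steady fair share: every configured queue gets an equal slice.
    long steadyShareGb = clusterMemGb / totalQueues;         // ~44 GB

    // Instantaneous fair share: only queues with demand split the cluster.
    long instantaneousShareGb = clusterMemGb / activeQueues; // 5120 GB = 5 TB

    System.out.println(steadyShareGb + " GB vs " + instantaneousShareGb + " GB");
  }
}
{code}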



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4730) YARN preemption based on instantaneous fair share

2016-02-24 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4730:
---
Component/s: fairscheduler

> YARN preemption based on instantaneous fair share
> -
>
> Key: YARN-4730
> URL: https://issues.apache.org/jira/browse/YARN-4730
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Prabhu Joseph
>
> Consider a big cluster with a total cluster resource of 10 TB and 3000 cores, 
> and a Fair Scheduler with 230 queues and a total of 6 jobs run a day (all 
> 230 queues are very critical and hence minResources is the same for all). In 
> this case, when a Spark job runs on queue A, occupies the entire cluster 
> resource and does not release any of it, and another job is submitted to 
> queue B, preemption reclaims only the fair share, which is <10TB, 3000> / 230 
> = <45 GB, 13 cores>. That is a very small fair share for a queue shared by 
> many applications. 
> Preemption should instead reclaim the instantaneous fair share, that is 
> <10TB, 3000> / 2 (active queues) = 5 TB and 1500 cores, so that the first job 
> cannot hog the entire cluster resource and subsequent jobs run fine.
> This is only an issue when the number of queues is very high. With few 
> queues, preempting up to the fair share suffices because the fair share is 
> already high. But with a very large number of queues, preemption should try 
> to reach the instantaneous fair share.
> Note: configuring optimal maxResources for 230 queues is difficult, and 
> constraining the queues with maxResources would leave cluster resources idle 
> most of the time.
> There are thousands of Spark jobs, so asking each user to restrict the 
> number of executors is also difficult.
> Preempting up to the instantaneous fair share would help overcome the above 
> issues.
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163864#comment-15163864
 ] 

Arun Suresh commented on YARN-1040:
---

Thanks for the feedback [~bikassaha]

I understand we might not want to place artificial constraints on apps; I was 
just trying to scope out the bare minimum effort required specifically for long 
running container upgrades. That said, I'm all for going the whole hog (allowing 
0 or 1+ processes) if that turns out to be easier.

Some thoughts specifically with regard to container upgrade:
# If we allow multiple processes per container, we might need 
{{startProcess()}} to return a *processId* which the AM can use to address the 
process in subsequent calls like {{stopProcess()}}. This might complicate the 
AM's state, so maybe we can leave it out of the first cut.
# w.r.t. resource re-localization, as per YARN-4597, we are exploring 
localization as a service and possibly re-localization on the fly.
# I like the idea of clubbing multiple API calls into the same RPC. But should 
*upgrade* be a first class semantic, or should it be expressed as a {{localize 
v2, start v2, stop v1}} API combo? One reason to distinguish them is the case 
where both versions stay up until the new version stabilizes: in an upgrade, 
the container should probably be allowed to go to 2x its allocated resource 
limit for a period of time, whereas when we are just starting 2 processes this 
should probably not be allowed.


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4729) SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE

2016-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163860#comment-15163860
 ] 

Karthik Kambatla commented on YARN-4729:


Test failures are not related. 

> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE
> --
>
> Key: YARN-4729
> URL: https://issues.apache.org/jira/browse/YARN-4729
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.2
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4729.patch
>
>
> SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE. We 
> saw this in a unit test failure. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-24 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4723:
--
Attachment: YARN-4723.001.patch

Attaching a preliminary patch based on approach #2 by [~jlowe]. It also adds the 
change to put such a node in the inactive RMNodes map rather than the active one.

> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
> Attachments: YARN-4723.001.patch
>
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
> at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server.call(Server.java:2267)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163729#comment-15163729
 ] 

Jason Lowe commented on YARN-4723:
--

I haven't looked at it in detail, but can we simply avoid doing any node update 
processing for dummy nodes (e.g.: port == -1) when processing the decommission 
transition?
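A minimal sketch of the guard being suggested, assuming the dummy nodes can be recognized by their sentinel port; the helper class and method name below are made up for illustration:

{code}
import org.apache.hadoop.yarn.api.records.NodeId;

public class DummyNodeGuardSketch {
  // Hypothetical helper: if dummy UnknownNodeId instances carry port == -1,
  // node-update processing could simply skip them.
  static boolean isDummyNode(NodeId nodeId) {
    return nodeId != null && nodeId.getPort() == -1;
  }
}
{code}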


> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
> at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server.call(Server.java:2267)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163708#comment-15163708
 ] 

Bikas Saha commented on YARN-1040:
--

I am not sure we need to place (somewhat artificial) constraints on the app 
when it's not clear that they practically affect YARN.

1) A container with no process should be allowed. Apps could terminate all 
running tasks of version A, then start running tasks of version B when they are 
not backwards compatible.
2) A container should be allowed to run multiple processes. This is similar to 
an existing process spawning more processes; it differs in that the NM has to 
add the new process to the existing monitoring/cgroups etc.
3) startProcess should be allowed with no process actually started. This will 
allow apps to localize new resources to an existing container. Alternatively, 
we could create a new localization API that's delinked from starting the 
process. But re-localization is an important related feature that we should 
look at supporting via this work, because currently it does not work since it 
is tied to starting the process.
4) Most current apps already communicate directly with their tasks and hence 
can shut them down when they are not needed. However, as suggested above, it 
may be useful for the NM to provide a feature whereby the previous task is shut 
down when a new task request is received. Alternatively, the NM could provide a 
stopProcess API to make that explicit.

IMO all of this should be allowed. The timeline could be different with some 
being allowed earlier and some later based on implementation effort.

Thinking ahead, it may be useful for the NM to accept a series of API calls 
within the same RPC (with the current mechanism supported as a single command 
entity for backwards compatibility). Then we will not have to build a lot of 
logic into the NM. The app can get all features by composing a multi-command 
entity.
E.g.
Current start process = {acquire, localize, start} // where acquire means start 
container
Current shutdown process = {stop, release} // where release means give up 
container
Only localize = {localize}
Start another process = {localize, start}
Start another process after shutting down first process = {stop, start} or 
{stop, localize, start}
Start another process and then shutdown the first process = {start, stop}
New container shutdown = {release} // at this point there may be 0 or more 
processes running and which will be stopped
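Purely as an illustration of the composition idea sketched above (every type and name below is hypothetical, not an existing NM interface):

{code}
import java.util.Arrays;
import java.util.List;

public class CompositePrimitiveSketch {
  // Hypothetical primitives the NM could execute in order within one RPC.
  enum Primitive { ACQUIRE, LOCALIZE, START, STOP, RELEASE }

  // Today's "start container" expressed as a composed command list.
  static final List<Primitive> CURRENT_START =
      Arrays.asList(Primitive.ACQUIRE, Primitive.LOCALIZE, Primitive.START);

  // An in-place upgrade: stop the old process, localize v2, start it.
  static final List<Primitive> STOP_THEN_UPGRADE =
      Arrays.asList(Primitive.STOP, Primitive.LOCALIZE, Primitive.START);

  public static void main(String[] args) {
    System.out.println("start = " + CURRENT_START);
    System.out.println("upgrade = " + STOP_THEN_UPGRADE);
  }
}
{code}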


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-02-24 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-4732:
--

 Summary: *ProcessTree classes have too many whitespace issues
 Key: YARN-4732
 URL: https://issues.apache.org/jira/browse/YARN-4732
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Karthik Kambatla
Priority: Trivial


*ProcessTree classes have too many whitespace issues - extra newlines between 
methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-02-24 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4732:
-
Assignee: Haibo Chen

> *ProcessTree classes have too many whitespace issues
> 
>
> Key: YARN-4732
> URL: https://issues.apache.org/jira/browse/YARN-4732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: Haibo Chen
>Priority: Trivial
>  Labels: newbie
>
> *ProcessTree classes have too many whitespace issues - extra newlines between 
> methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters

2016-02-24 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163573#comment-15163573
 ] 

Karthik Kambatla commented on YARN-3304:


[~djp] - did we ever file follow up JIRAs to delete these deprecated methods 
from trunk? If not, I would like to file them and get the trunk changes in. 

> ResourceCalculatorProcessTree#getCpuUsagePercent default return value is 
> inconsistent with other getters
> 
>
> Key: YARN-3304
> URL: https://issues.apache.org/jira/browse/YARN-3304
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-3304-appendix-v2.patch, 
> YARN-3304-appendix-v3.patch, YARN-3304-appendix-v4.patch, 
> YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, 
> YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, 
> YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, 
> YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, 
> YARN-3304.patch, yarn-3304-5.patch
>
>
> Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for 
> unavailable case while other resource metrics are return 0 in the same case 
> which sounds inconsistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-24 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163564#comment-15163564
 ] 

Arun Suresh commented on YARN-1040:
---

Spent some time going thru the conversation (this one as well as YARN-1404)
Given that this has been tracked as a requirement for in-place application 
upgrades and it has been some time since any activity was posted here, 
[~bikassaha] / [~vinodkv] / [~hitesh] / [~tucu00] / [~steve_l], can you kindly 
clarify the following?
# Are we still trying to handle the case where we have > 1 processes running 
against a container *at the same time*?
# Have we decided that allowing a Container with 0 running processes is a bad 
idea?

From the context of getting application upgrades working, I guess 1) can be 
relaxed to exactly 1 process running under a container, with the AM having the 
option of explicitly starting it via the {{startProcess(containerLaunchContext)}} 
API Bikas mentioned (an additional constraint could be that startProcess has to 
be called within a timeout if no ContainerLaunchContext was provided with the 
initial {{startContainer()}}, else the NM will deem the container dead).

In addition, I was also thinking:
# If a process is already running in the container when a 
{{startProcess(ContainerLaunchContext)}} is received, the first process is 
killed and another is started using the new {{ContainerLaunchContext}}.
# Maybe we can refine the above by adding an 
{{upgradeProcess(ContainerLaunchContext)}} API that can additionally take a 
policy such as:
## auto-rollback if the new process does not start within a timeout;
## rollback could either mean keeping the old process running until the 
upgraded process is up, or, if we want to preserve the semantics of only 1 
process per container, first killing the old process, trying to start the new 
one, and on failure restarting the old version.

If everyone is ok with the above, I volunteer to either post a preliminary 
patch or, if the details get dicier during investigation, put up a doc.

Thoughts ?  
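A rough sketch of the API shape being floated here, following the names proposed in this comment; everything below is hypothetical and not an existing NM protocol:

{code}
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public interface ContainerProcessSketch {
  // Hypothetical rollback behaviors for an in-place process upgrade.
  enum RollbackPolicy { KEEP_OLD_UNTIL_NEW_IS_UP, KILL_OLD_THEN_RETRY_OLD }

  // Start a process in an already-acquired container; returns a process id
  // the AM can use in later calls.
  String startProcess(ContainerLaunchContext ctx);

  void stopProcess(String processId);

  // Replace the running process, auto-rolling back if the new one does not
  // come up within the given timeout.
  void upgradeProcess(ContainerLaunchContext newCtx,
      RollbackPolicy policy, long timeoutMs);
}
{code}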


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4699) Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to change label of a node

2016-02-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4699:
--
Attachment: 0001-YARN-4699.patch

As I see it, this issue can be fixed if we update the usedCapacity of the label 
while changing the label on a node. I tested the various cases covered in the 
patch, and with this fix the values come up correctly.

Attaching this patch for an initial review. [~leftnoteasy] thoughts?

> Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to 
> change label of a node
> 
>
> Key: YARN-4699
> URL: https://issues.apache.org/jira/browse/YARN-4699
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4699.patch, AfterAppFInish-LabelY-Metrics.png, 
> ForLabelX-AfterSwitch.png, ForLabelY-AfterSwitch.png
>
>
> Scenario is as follows:
> a. 2 nodes are available in the cluster (node1 with label "x", node2 with 
> label "y")
> b. Submit an application to node1 for label "x". 
> c. Change node1 label to "y" by using *replaceLabelsOnNode* command.
> d. Verify Scheduler UI for metrics such as "Used Capacity", "Absolute 
> Capacity" etc. "x" still shows some capacity.
> e. Change node1 label back to "x" and verify UI and REST o/p
> Output:
> 1. "Used Capacity", "Absolute Capacity" etc are not decremented once labels 
> is changed for a node.
> 2. UI tab for respective label shows wrong GREEN color in these cases.
> 3. REST o/p is wrong for each label after executing above scenario.
> Attaching screen shots also. This ticket will try to cover UI and REST o/p 
> fix when label is changed runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163441#comment-15163441
 ] 

Hadoop QA commented on YARN-4718:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 13s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 15 new + 493 unchanged - 16 fixed = 508 total (was 509) 
{color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 27s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
16s {color} | {color:green} Patch does not generate ASF License warnings. 
{colo

[jira] [Commented] (YARN-4722) AsyncDispatcher logs redundant event queue sizes

2016-02-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163406#comment-15163406
 ] 

Jason Lowe commented on YARN-4722:
--

Thanks, Sangjin!


> AsyncDispatcher logs redundant event queue sizes
> 
>
> Key: YARN-4722
> URL: https://issues.apache.org/jira/browse/YARN-4722
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.8.0, 2.7.3, 2.9.0, 2.6.5
>
> Attachments: YARN-4722.001.patch
>
>
> A fairly common occurrence in RM logs is a string of redundant event-queue 
> logs like the following which does little except bloat the logs:
> {noformat}
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> [...]
> {noformat}
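One simple way to avoid this kind of repetition, sketched below purely for illustration (not necessarily what the committed patch does): remember the last size that was logged and only log when the value actually changes.

{code}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class QueueSizeLoggingSketch {
  private final AtomicInteger queueSize = new AtomicInteger();
  // Remember the last size we logged so the same value is not repeated.
  private final AtomicLong lastLoggedSize = new AtomicLong(-1);

  void onEventEnqueued() {
    int size = queueSize.incrementAndGet();
    // Log only at multiples of 1000, and only if the size differs from the
    // value we logged last time.
    if (size % 1000 == 0 && lastLoggedSize.getAndSet(size) != size) {
      System.out.println("Size of event-queue is " + size);
    }
  }
}
{code}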



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4722) AsyncDispatcher logs redundant event queue sizes

2016-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163389#comment-15163389
 ] 

Hudson commented on YARN-4722:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9360 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9360/])
YARN-4722. AsyncDispatcher logs redundant event queue sizes (Jason Lowe (sjlee: 
rev 553b591ba06bbf0b18dca674d25a48218fed0a26)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java


> AsyncDispatcher logs redundant event queue sizes
> 
>
> Key: YARN-4722
> URL: https://issues.apache.org/jira/browse/YARN-4722
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-4722.001.patch
>
>
> A fairly common occurrence in RM logs is a string of redundant event-queue 
> logs like the following which does little except bloat the logs:
> {noformat}
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> 2016-02-23 08:00:00,948 [IPC Server handler 36 on 8030] INFO 
> event.AsyncDispatcher: Size of event-queue is 1000
> [...]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4731) Linux container executor exception on DeleteAsUser

2016-02-24 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4731:
--

 Summary: Linux container executor exception on DeleteAsUser
 Key: YARN-4731
 URL: https://issues.apache.org/jira/browse/YARN-4731
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt


Enable LCE and CGroups
Submit a mapreduce job

{noformat}
2016-02-24 18:56:46,889 INFO 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
absolute path : 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
2016-02-24 18:56:46,894 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 255. Privileged Execution Operation Output:
main : command provided 3
main : run as user is dsperf
main : requested yarn user is dsperf
failed to rmdir job.jar: Not a directory
Error while deleting 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
 20 (Not a directory)
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
dsperf, dsperf, 3, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
2016-02-24 18:56:46,894 ERROR 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: DeleteAsUser 
for 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
 returned with exit code: 255
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=255:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
at 
org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 10 more

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163327#comment-15163327
 ] 

Hadoop QA commented on YARN-4720:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
2s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 17 unchanged - 1 fixed = 18 total (was 18) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 55s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 22s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 44s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789595/YARN-4720.02.patch |
| JIRA Issue | YARN-4720 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux f0125aa6792a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchpr

[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163307#comment-15163307
 ] 

Hadoop QA commented on YARN-4696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 3s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 21s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 28s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 1s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 40s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 40s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s 
{color} | {color:red} root: patch generated 5 new + 29 unchanged - 0 fixed = 34 
total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s 
{color} | {color:red} hadoop-common-project/hadoop-common generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 23s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 57s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 44s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_72. {color} |
| 

[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-24 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163294#comment-15163294
 ] 

Kuhu Shukla commented on YARN-4723:
---

The primary reason for this failure is the {{UnknownNodeId}} object. Even if we 
do not put this dummy nodeId in the active RMNodes and instead put it in 
inactiveRMNodes, the transition from NEW to DECOMMISSIONED that makes the node 
unusable (NODE_UNUSABLE) will trigger a NODE_UPDATE, which in turn populates 
the {{updatedNodes}} in the AllocateResponse.
{code}
  @Override
  public void handle(NodesListManagerEvent event) {
    RMNode eventNode = event.getNode();
    switch (event.getType()) {
    case NODE_UNUSABLE:
      LOG.debug(eventNode + " reported unusable");
      unusableRMNodesConcurrentSet.add(eventNode);
      for (RMApp app : rmContext.getRMApps().values()) {
        if (!app.isAppFinalStateStored()) {
          this.rmContext
              .getDispatcher()
              .getEventHandler()
              .handle(
                  new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
                      RMAppNodeUpdateType.NODE_UNUSABLE));
        }
      }
{code}

That being said, we should not add the node to the active list; the way to 
solve this problem is to get rid of UnknownNodeId and use anonymous classes to 
initialize these dummy nodes.

For the unit test, I did call {{allocate}} for this scenario but that did not 
replicate the issue until I explicitly set the updatedNodes to an UnknownNodeId 
object. 

Asking [~jlowe], [~templedf] for comments and corrections.

Excerpt from a sample test:
{code}
    AllocateRequest allocateRequest =
        Records.newRecord(AllocateRequest.class);
    AllocateResponse resp = rmClient.allocate(allocateRequest);
    NodeReport report = new NodeReportPBImpl();
    report.setNodeId(new NodesListManager.UnknownNodeId("host2"));
    List<NodeReport> reports = new ArrayList<NodeReport>();
    reports.add(report);
    resp.setUpdatedNodes(reports);
    allocateRequest =
        Records.newRecord(AllocateRequest.class);
    YarnServiceProtos.AllocateResponseProto p =
        ((AllocateResponsePBImpl) resp).getProto();
{code}

Proposed change in NodesListManager.java:
{code}
  private void setDecomissionedNMs() {
    Set<String> excludeList = hostsReader.getExcludedHosts();
    for (final String host : excludeList) {
      NodeId nodeId = makeUnknownNodeId(host);
      RMNodeImpl rmNode = new RMNodeImpl(nodeId,
          rmContext, host, -1, -1, makeUnknownNode(host), null, null);
      rmContext.getInactiveRMNodes().putIfAbsent(
          rmNode.getNodeID().getHost(), rmNode);
      rmNode.handle(new RMNodeEvent(rmNode.getNodeID(),
          RMNodeEventType.DECOMMISSION));
    }
  }
{code}
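
For completeness, a minimal sketch of what {{makeUnknownNodeId}} could look like 
(an assumption for discussion, not part of the attached patch). 
{{NodeId.newInstance}} yields the protobuf-backed NodeId, so serializing such a 
dummy id in a NodeReport should avoid the ClassCastException in the stack trace 
below:
{code}
  // Hypothetical helper, not from the patch: build a dummy NodeId for an
  // excluded host. Port -1 marks a node that never registered.
  NodeId makeUnknownNodeId(final String host) {
    return NodeId.newInstance(host, -1);
  }
{code}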

{code}
  Node makeUnknownNode(final String host) {
return new Node() {
  @Override
  public String getNetworkLocation() {
return null;
  }

  @Override
  public void setNetworkLocation(String location) {

  }

  @Override
  public String getName() {
return host;
  }

  @Override
  public Node getParent() {
return null;
  }

  @Override
  public void setParent(Node parent) {

  }

  @Override
  public int getLevel() {
return 0;
  }

  @Override
  public void setLevel(int i) {

  }
};
  }
{code}

> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.n

[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163282#comment-15163282
 ] 

Jun Gong commented on YARN-4720:


Thanks [~mingma] for the review. I have attached a new patch to address the 
above problems.
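
For reference, a minimal sketch of the kind of guard the description below 
calls for (field and class names follow the snippet quoted in the description; 
this is not the actual patch):
{code}
// Sketch only: bail out before any remote-FS work when this cycle has nothing
// to upload, so the create/getfileinfo/delete NN calls never happen.
if (pendingContainerInThisCycle.isEmpty()) {
  return;
}
writer =
    new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp, this.userUgi);
{code}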

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-24 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4720:
---
Attachment: YARN-4720.02.patch

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163273#comment-15163273
 ] 

Hadoop QA commented on YARN-4696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 25s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 
29 unchanged - 0 fixed = 32 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 24s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 54s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 45s 
{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch 
passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v

[jira] [Commented] (YARN-4705) ATS 1.5 parse pipeline to consider handling open() events recoverably

2016-02-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163232#comment-15163232
 ] 

Steve Loughran commented on YARN-4705:
--

All we need to know is "does flush() write data back so that other code can 
eventually see it?"
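
A quick probe of that property might look like the following (a sketch: the 
path comes from the command line, and the answer is filesystem-specific):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FlushVisibilityProbe {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);                  // file on the FS under test
    FileSystem fs = path.getFileSystem(new Configuration());
    byte[] payload = "incomplete-entity".getBytes("UTF-8");
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.write(payload);
      out.hflush();                                 // flush without closing the writer
      try (FSDataInputStream in = fs.open(path)) {  // a second, independent reader
        byte[] buf = new byte[payload.length];
        int read = in.read(buf);
        System.out.println("bytes visible after hflush: " + read);
      }
    }
  }
}
{code}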

> ATS 1.5 parse pipeline to consider handling open() events recoverably
> -
>
> Key: YARN-4705
> URL: https://issues.apache.org/jira/browse/YARN-4705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> During one of my own timeline test runs, I've been seeing a stack trace 
> warning that the CRC check failed in Filesystem.open() file; something the FS 
> was ignoring.
> Even though its swallowed (and probably not the cause of my test failure), 
> looking at the code in {{LogInfo.parsePath()}} that it considers a failure to 
> open a file as unrecoverable. 
> on some filesystems, this may not be the case, i.e. if its open for writing 
> it may not be available for reading; checksums maybe a similar issue. 
> Perhaps a failure at open() should be viewed as recoverable while the app is 
> still running?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4705) ATS 1.5 parse pipeline to consider handling open() events recoverably

2016-02-24 Thread jay vyas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163222#comment-15163222
 ] 

jay vyas commented on YARN-4705:


Ah, the GlusterFS consistency model? From my experience it's not strongly 
consistent all the time in all cases. I'd cc [~chenh] and @childsb as well on 
this ... they are currently working on these filesystems.

> ATS 1.5 parse pipeline to consider handling open() events recoverably
> -
>
> Key: YARN-4705
> URL: https://issues.apache.org/jira/browse/YARN-4705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> During one of my own timeline test runs, I've been seeing a stack trace 
> warning that the CRC check failed in Filesystem.open() file; something the FS 
> was ignoring.
> Even though its swallowed (and probably not the cause of my test failure), 
> looking at the code in {{LogInfo.parsePath()}} that it considers a failure to 
> open a file as unrecoverable. 
> on some filesystems, this may not be the case, i.e. if its open for writing 
> it may not be available for reading; checksums maybe a similar issue. 
> Perhaps a failure at open() should be viewed as recoverable while the app is 
> still running?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-556) [Umbrella] RM Restart phase 2 - Work preserving restart

2016-02-24 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163215#comment-15163215
 ] 

Johannes Zillmann commented on YARN-556:


Hmm, I have a test cluster where the ResourceManager fails to start after a 
crash. No matter whether only the ResourceManager is started or the whole YARN 
stack, we always get the following exception:
{quote}
2016-02-24 15:37:22,474 INFO  attempt.RMAppAttemptImpl 
(RMAppAttemptImpl.java:recover(796)) - Recovering attempt: 
appattempt_1456252782760_0018_01 with final state: null
2016-02-24 15:37:22,474 INFO  security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for 
ApplicationAttempt: appattempt_1456252782760_0018_01
2016-02-24 15:37:22,474 INFO  security.AMRMTokenSecretManager 
(AMRMTokenSecretManager.java:createPassword(307)) - Creating password for 
appattempt_1456252782760_0018_01
2016-02-24 15:37:22,474 INFO  resourcemanager.ApplicationMasterService 
(ApplicationMasterService.java:registerAppAttempt(670)) - Registering app 
attempt : appattempt_1456252782760_0018_01
2016-02-24 15:37:22,475 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(594)) - Failed to load/recover state
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(Res

[jira] [Commented] (YARN-4705) ATS 1.5 parse pipeline to consider handling open() events recoverably

2016-02-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163205#comment-15163205
 ] 

Steve Loughran commented on YARN-4705:
--

YARN-4696 contains my current logic to handle failures to parse things. :

If the JSON parser fails, an info message is printed if we know the file is 
non-empty (i.e. either length > 0 or offset > 0).

I think there are some possible race conditions in the code as is; certainly 
FNFEs ought to be downgraded to info.

For other IOEs, I think they should be caught and logged per file, rather than 
stopping the entire scan loop. Otherwise bad permissions on one file would be 
enough to break the scanning.
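
Something along these lines (a sketch; {{fs}}, {{appDirPath}}, {{LOG}} and 
{{parseSummaryLogs}} are placeholders for whatever the scan loop actually uses):
{code}
// Sketch: keep scanning even if one attempt file is unreadable.
for (FileStatus status : fs.listStatus(appDirPath)) {
  try {
    parseSummaryLogs(status.getPath());        // placeholder for the per-file work
  } catch (FileNotFoundException fnfe) {
    // File vanished or not yet visible: benign while the app is still running.
    LOG.info("Log file not found, will retry on next scan: " + status.getPath());
  } catch (IOException ioe) {
    // Bad permissions/CRC on one file should not abort the whole scan.
    LOG.warn("Failed to read " + status.getPath() + ", skipping this cycle", ioe);
  }
}
{code}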


Regarding trying to work with raw vs. HDFS: I've not been able to get at the 
raw local FS, and am trying to disable caching in file://, but I am close to 
accepting defeat and spinning up a single mini YARN cluster across all my test 
cases. That, or adding a config option to turn off checksumming in the local 
FS. The logic is there, but you can only set it on an FS instance, which must 
then be used directly or propagated to the code under test via the FS cache.

The local FS does work for picking up completed work; the problem is that, 
because flush() doesn't actually flush there, it doesn't reliably read the 
updates of incomplete jobs. And when it does, unless the JSON is aligned on a 
buffer boundary, the parser is going to fail, which will lead to lots and lots 
of info messages unless the logging is tuned further to only log if the last 
operation was not a failure.

We only really need to worry about other cross-cluster filesystems for 
production use here. Single node with the local FS? Use the 1.0 APIs. 
Production: a distributed FS which is required to implement flush() (even a 
delayed/async flush) if you want to see incomplete applications. I believe 
GlusterFS supports that, as does any POSIX FS if the checksum FS doesn't get 
in the way. What does [~jayunit100] have to say about his filesystem's 
consistency model?

It will mean that the object stores, S3 and Swift, can't work as destinations 
for logs. They are dangerous anyway, as *all* data is lost if the app crashes 
before {{out.close()}} is called. If we care about that, then you'd really 
want to write to an FS (local or HDFS) and then copy to the blobstore for 
long-term histories.
 

> ATS 1.5 parse pipeline to consider handling open() events recoverably
> -
>
> Key: YARN-4705
> URL: https://issues.apache.org/jira/browse/YARN-4705
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
>
> During one of my own timeline test runs, I've been seeing a stack trace 
> warning that the CRC check failed in Filesystem.open() file; something the FS 
> was ignoring.
> Even though its swallowed (and probably not the cause of my test failure), 
> looking at the code in {{LogInfo.parsePath()}} that it considers a failure to 
> open a file as unrecoverable. 
> on some filesystems, this may not be the case, i.e. if its open for writing 
> it may not be available for reading; checksums maybe a similar issue. 
> Perhaps a failure at open() should be viewed as recoverable while the app is 
> still running?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4680) TimerTasks leak in ATS V1.5 Writer

2016-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163192#comment-15163192
 ] 

Junping Du commented on YARN-4680:
--

+1 on the latest patch. [~gtCarrera9], please go ahead and commit this patch to 
trunk, branch-2 and branch-2.8.

> TimerTasks leak in ATS V1.5 Writer
> --
>
> Key: YARN-4680
> URL: https://issues.apache.org/jira/browse/YARN-4680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4680.1.patch, YARN-4680.20160108.patch, 
> YARN-4680.20160109.patch, YARN-4680.20160222.patch
>
>
> We have seen TimerTasks leak, which could bring the application server down 
> (such as the Oozie server going down due to too many active threads).
> Although we have fixed some potential leak situations at the upper 
> application level, such as
> https://issues.apache.org/jira/browse/MAPREDUCE-6618
> https://issues.apache.org/jira/browse/MAPREDUCE-6621, we still cannot 
> guarantee that we have fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4680) TimerTasks leak in ATS V1.5 Writer

2016-02-24 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4680:
-
Target Version/s: 2.8.0

> TimerTasks leak in ATS V1.5 Writer
> --
>
> Key: YARN-4680
> URL: https://issues.apache.org/jira/browse/YARN-4680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4680.1.patch, YARN-4680.20160108.patch, 
> YARN-4680.20160109.patch, YARN-4680.20160222.patch
>
>
> We have seen TimerTasks leak, which could bring the application server down 
> (such as the Oozie server going down due to too many active threads).
> Although we have fixed some potential leak situations at the upper 
> application level, such as
> https://issues.apache.org/jira/browse/MAPREDUCE-6618
> https://issues.apache.org/jira/browse/MAPREDUCE-6621, we still cannot 
> guarantee that we have fixed the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-24 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4696:
-
Attachment: YARN-4696-008.patch

Patch -008. This removes a subclass of RawLocalFileSystem that I'd been trying 
to instantiate directly. That doesn't work... I won't go into the details.

Note also that patch -007

# has the code to remember the cache option before the 
{{FileSystemTimelineWriter}} gets a file, and restores it after
# has commented out the entire action of disabling the cache.

Why #2? It's to try to get a local FS with checksumming disabled picked up in 
test cases. I've not got that working. 
Why #1? Because some other part of the JVM may want caching, and so they won't 
want this class disabling it for them.

I'm assuming that the caching was disabled to ensure that nothing else was 
affected if this class closed the FS instance; if so, the solution there is: 
don't close the FS when the service is stopped. We can rely on Hadoop itself 
to stop all filesystems at JVM shutdown. Of course, if the concern is that 
it's other bits of code closing the FS, that's harder. In such a case, if I do 
manage to get my local FS test working, then we may need a test-time option to 
not disable the cache.
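
For the record, the remember/restore part of #1 amounts to something like this 
(a sketch using the standard per-scheme {{fs.<scheme>.impl.disable.cache}} key; 
the variable names are illustrative, not the patch itself):
{code}
// Remember the caller's cache setting, flip it only for this lookup, restore it after.
String key = String.format("fs.%s.impl.disable.cache", path.toUri().getScheme());
boolean previous = conf.getBoolean(key, false);
conf.setBoolean(key, true);
try {
  FileSystem fs = path.getFileSystem(conf);   // uncached instance for this class
  // ... hand fs to the writer ...
} finally {
  conf.setBoolean(key, previous);             // don't change the setting for the rest of the JVM
}
{code}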

> EntityGroupFSTimelineStore to work in the absence of an RM
> --
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-4696-001.patch, YARN-4696-002.patch, 
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch, 
> YARN-4696-007.patch, YARN-4696-008.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running; the 
> configuration pointing to it. This is a new change, and impacts testing where 
> you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection was 
> optional, the "unknown" codepath could be called directly, relying on age of 
> file as a metric of completion
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-24 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4696:
-
Attachment: YARN-4696-007.patch

Patch 007

Files that are stat-ed as empty are not skipped, but no attempt is made to log 
a parse problem if the length is 0 and no data has ever been read from the 
file before (i.e. offset = 0).
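
In other words, the logging condition is roughly (a sketch of the check, not 
the patch itself; {{status}}, {{offset}} and the parse exception {{e}} are 
placeholders):
{code}
// Only report a parse failure when the file plausibly has content: either the
// stat says it is non-empty, or we have already consumed bytes from it.
boolean worthLogging = status.getLen() > 0 || offset > 0;
if (!worthLogging) {
  return offset;   // empty, never-read file: stay quiet and retry on the next scan
}
LOG.warn("Error parsing " + status.getPath() + " at offset " + offset, e);
{code}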

> EntityGroupFSTimelineStore to work in the absence of an RM
> --
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-4696-001.patch, YARN-4696-002.patch, 
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch, 
> YARN-4696-007.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running; the 
> configuration pointing to it. This is a new change, and impacts testing where 
> you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection was 
> optional, the "unknown" codepath could be called directly, relying on age of 
> file as a metric of completion
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-02-24 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163144#comment-15163144
 ] 

Akira AJISAKA commented on YARN-4630:
-

Found an unnecessary duplicate call of {{ApplicationAttemptId#compareTo}}.
{code:title=ContainerId.java}
  public int compareTo(ContainerId other) {
if (this.getApplicationAttemptId().compareTo(
other.getApplicationAttemptId()) == 0) {
  return Long.compare(getContainerId(), other.getContainerId());
} else {
  return this.getApplicationAttemptId().compareTo(
  other.getApplicationAttemptId());
}
  }
{code}
Hi [~sarutak], would you keep the value of 
{{this.getApplicationAttemptId().compareTo(other.getApplicationAttemptId())}} 
and reuse it as follows?
{code}
  public int compareTo(ContainerId other) {
int result = this.getApplicationAttemptId().compareTo(
other.getApplicationAttemptId());
if (result == 0) {
  return Long.compare(getContainerId(), other.getContainerId());
} else {
  return result;
}
  }
{code}

> Remove useless boxing/unboxing code (Hadoop YARN)
> -
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: YARN-4630.0.patch
>
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-02-24 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163125#comment-15163125
 ] 

Akira AJISAKA commented on YARN-4630:
-

bq. can I check this since it seems to include the changes against ContainerId?
Okay.
{code:title=ContainerId.java}
   public int compareTo(ContainerId other) {
 if (this.getApplicationAttemptId().compareTo(
 other.getApplicationAttemptId()) == 0) {
-  return Long.valueOf(getContainerId())
-  .compareTo(Long.valueOf(other.getContainerId()));
+  return Long.compare(getContainerId(), other.getContainerId());
{code}
IMO, the change is safe since it only removes unnecessary boxing/unboxing.
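
For reference, a quick standalone check (not from the patch) that 
{{Long.compare}} agrees with the boxed comparison it replaces, including at the 
extremes:
{code}
public class CompareCheck {
  public static void main(String[] args) {
    long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
    for (long a : samples) {
      for (long b : samples) {
        int boxed = Long.valueOf(a).compareTo(Long.valueOf(b));
        int plain = Long.compare(a, b);
        // Both return the sign of the same comparison, so the signum must match.
        if (Integer.signum(boxed) != Integer.signum(plain)) {
          throw new AssertionError(a + " vs " + b);
        }
      }
    }
    System.out.println("Long.compare agrees with the boxed comparison");
  }
}
{code}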

> Remove useless boxing/unboxing code (Hadoop YARN)
> -
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: YARN-4630.0.patch
>
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-02-24 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163046#comment-15163046
 ] 

Tsuyoshi Ozawa commented on YARN-4630:
--

Hey Akira, can I check this since it seems to include the changes 
againstContainerId? It has an impact against RM-HA.

> Remove useless boxing/unboxing code (Hadoop YARN)
> -
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: YARN-4630.0.patch
>
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-02-24 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163046#comment-15163046
 ] 

Tsuyoshi Ozawa edited comment on YARN-4630 at 2/24/16 2:15 PM:
---

Hey Akira, can I check this since it seems to include the changes against 
ContainerId? It has an impact against RM-HA.


was (Author: ozawa):
Hey Akira, can I check this since it seems to include the changes 
againstContainerId? It has an impact against RM-HA.

> Remove useless boxing/unboxing code (Hadoop YARN)
> -
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Priority: Minor
> Attachments: YARN-4630.0.patch
>
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162920#comment-15162920
 ] 

Varun Saxena commented on YARN-3863:


Thanks [~sjlee0] for the comments.

bq. If I'm reading this right, the key changes seem to be in 
TimelineStorageUtils
The changes in TimelineStorageUtils would primarily be used by the FS 
implementation, because in the FS implementation filters are applied locally.
The major change from an HBase implementation perspective is the 
xxxEntityReader classes, where we create a filter list based on the filters.
However, for relation filters and event filters we cannot create an HBase 
filter to filter out rows, because of the way relations and events are stored. 
So the logic for relations and events is to fetch only the required columns 
(as required by the filters) if those fields are not to be retrieved.
I am basically trying to trim down the data brought over from the backend.
For relations and events, filters are then applied locally (even for the HBase 
storage implementation). For the other filters, in the HBase implementation we 
no longer apply filters locally; it is all handled through HBase filters.
Sorry for missing out on adding detailed comments in TimelineStorageUtils. I 
agree the code there can be refactored to make it more readable.

bq. Also, these methods seem to have similar code. Any possibility of 
refactoring the common logic?
Yes, the code is similar: we loop over a filter list and check the operator 
while processing an individual filter.
I thought about it, but the issue with moving it into a common area is that 
the data structures which hold events, configs, metrics, etc. are not the same.

We can, however, do one thing: pass the TimelineEntity object itself into a 
common function (for all filters) and also pass something, say an enum, 
indicating what kind of filter we intend to match (name it something like 
TimelineEntityFiltersType). Then, based on this enum value, get the 
appropriate item (configs, metrics, etc.) from the passed entity. This way we 
can move the common logic into a single method, which can in turn call the 
appropriate method to process each filter type (say equality filter, 
multivalue equality filter, etc.). Does this sound fine?
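
To make that concrete, a rough sketch of the shape being proposed (all names 
here, including {{TimelineEntityFiltersType}} and {{matchSingleFilter}}, are 
placeholders for discussion, not actual patch code):
{code}
// One common entry point: loop over the filter list once, honouring AND/OR,
// and let the enum decide which part of the entity is matched.
enum TimelineEntityFiltersType { CONFIG, METRIC, EVENT, IS_RELATED_TO, RELATES_TO }

static boolean matchFilters(TimelineEntity entity, TimelineFilterList filters,
    TimelineEntityFiltersType type) {
  boolean isAnd = filters.getOperator() == TimelineFilterList.Operator.AND;
  for (TimelineFilter filter : filters.getFilterList()) {
    // type selects the entity field (configs, metrics, events, relations) and
    // the per-filter comparison (equality, multi-value equality, ...).
    boolean matches = matchSingleFilter(entity, filter, type);
    if (isAnd && !matches) {
      return false;   // AND: first miss decides
    }
    if (!isAnd && matches) {
      return true;    // OR: first hit decides
    }
  }
  return isAnd;
}
{code}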

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4333) Fair scheduler should support preemption within queue

2016-02-24 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162904#comment-15162904
 ] 

Ashwin Shankar commented on YARN-4333:
--

I'm out of town until the end of this week. I will take a look when I get back. 
Thanks!

> Fair scheduler should support preemption within queue
> -
>
> Key: YARN-4333
> URL: https://issues.apache.org/jira/browse/YARN-4333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Attachments: YARN-4333.001.patch, YARN-4333.002.patch, 
> YARN-4333.003.patch
>
>
> Now each app in fair scheduler is allocated its fairshare, however  fairshare 
> resource is not ensured even if fairSharePreemption is enabled.
> Consider: 
> 1, When the cluster is idle, we submit app1 to queueA,which takes maxResource 
> of queueA.  
> 2, Then the cluster becomes busy, but app1 does not release any resource, 
> queueA resource usage is over its fairshare
> 3, Then we submit app2(maybe with higher priority) to queueA. Now app2 has 
> its own fairshare, but could not obtain any resource, since queueA is still 
> over its fairshare and resource will not assign to queueA anymore. Also, 
> preemption is not triggered in this case.
> So we should allow preemption within queue, when app is starved for fairshare.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162897#comment-15162897
 ] 

Hadoop QA commented on YARN-4630:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 36s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 3s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 0s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 3s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s 
{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 22s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 40s {color} 
| {color:red} hadoop-yar

[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2016-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162794#comment-15162794
 ] 

Junping Du commented on YARN-1489:
--

bq. That and the "Old running containers don't know where the new AM is 
running." issue is big enough that we shouldn't close this umbrella as done.
I don't think we have an open JIRA under this umbrella to track this issue. Is 
this a specific issue for MR (like we discussed on MAPREDUCE-6608) or a generic 
issue for other frameworks (Spark, etc.) too? YARN-4602 was filed to track this 
issue as a generic problem for messages passed between containers.

> [Umbrella] Work-preserving ApplicationMaster restart
> 
>
> Key: YARN-1489
> URL: https://issues.apache.org/jira/browse/YARN-1489
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Work preserving AM restart.pdf
>
>
> Today if AMs go down,
>  - RM kills all the containers of that ApplicationAttempt
>  - New ApplicationAttempt doesn't know where the previous containers are 
> running
>  - Old running containers don't know where the new AM is running.
> We need to fix this to enable work-preserving AM restart. The later two 
> potentially can be done at the app level, but it is good to have a common 
> solution for all apps where-ever possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4729) SchedulerApplicationAttempt#getTotalRequiredResources can throw an NPE

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162702#comment-15162702
 ] 

Hadoop QA commented on YARN-4729:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
47s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 11s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 36s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 150m 19s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789374/yarn-4729.patch 

[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings

2016-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15162699#comment-15162699
 ] 

Hadoop QA commented on YARN-4634:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 18s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 2 new + 200 unchanged - 0 fixed = 202 total (was 200) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 0s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 11s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 24s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
|