[jira] [Commented] (YARN-4117) End to end unit test with mini YARN cluster for AMRMProxy Service

2016-02-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168565#comment-15168565
 ] 

Jian He commented on YARN-4117:
---

[~giovanni.fumarola], thanks for working on this, a few comments and questions :
1. could you add comments at the header of each test to explain what the test 
is doing so that it can be understood more easily?
2. this test is based on timing, can we check that 
AllocateResponse#getAMRMToken actually returns a different token than previous 
one ?
{code}
  for (int i = 0; i < 5; i++) {
client.allocate(request);
request.setResponseId(request.getResponseId() + 1);

// Time slot to be sure the RM renew the token
Thread.sleep(1500);
  }
{code}
3. why are the changes in MiniYarnCluster needed ?
4. what is the purpose of testE2ETokenSwap?

> End to end unit test with mini YARN cluster for AMRMProxy Service
> -
>
> Key: YARN-4117
> URL: https://issues.apache.org/jira/browse/YARN-4117
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Kishore Chaliparambil
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-4117.v0.patch
>
>
> YARN-2884 introduces a proxy between AM and RM. This JIRA proposes an end to 
> end unit test using mini YARN cluster to the AMRMProxy service. This test 
> will validate register, allocate and finish application and token renewal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4579) Allow DefaultContainerExecutor container log directory permissions to be configurable

2016-02-25 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168562#comment-15168562
 ] 

Ray Chiang commented on YARN-4579:
--

Thanks for the review and the commit!

> Allow DefaultContainerExecutor container log directory permissions to be 
> configurable
> -
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch, 
> YARN-4579.003.patch, YARN-4579.004.patch, YARN-4579.005.patch, 
> YARN-4579.006.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired.  Make 
> this configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168548#comment-15168548
 ] 

Inigo Goiri commented on YARN-4718:
---

The latest errors seem consistent across runs.
It seems related to the tokens, any ideas?

> Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
> 
>
> Key: YARN-4718
> URL: https://issues.apache.org/jira/browse/YARN-4718
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, 
> YARN-4718-v002.patch, YARN-4718-v003.patch, YARN-4718-v004.patch
>
>
> As discussed in YARN-1011, we are planning to add a few more variables 
> regarding utilization to SchedulerNode. To ensure the new variables are 
> descriptive and don't lead to confusion, the existing variables could use 
> some renaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168544#comment-15168544
 ] 

Hadoop QA commented on YARN-4718:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 14 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
31s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 0 new + 366 unchanged - 11 fixed = 366 total (was 377) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 150m 13s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Updated] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client

2016-02-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4711:

Description: 
After YARN-3367, while testing the latest 2928 branch came across few NPEs due 
to which NM is shutting down.
{code}
2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}

{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}

On analysis found that the there was delay in processing of events, as after 
YARN-3367 all the events were getting processed by a single thread inside the 
timeline client. 

Additionally found one scenario where there is possibility of NPE:
* TimelineEntity.toString() when {{real}} is not null


  was:
While testing the latest 2928 branch came across a NPE which is shutting down 
the NM
{code}
2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}
Seems to be a race condition ...

Few other NPE's faced :
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}

* Also there is possibility of NPE in TimelineEntity.toString() when real is 
not null



> NM is going down with NPE's due to single thread processing of events by 
> Timeline client
> 
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> After YARN-3367, while testing the latest 2928 branch came across few NPEs 
> due to which NM is shutting down.
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> 

[jira] [Updated] (YARN-4711) NM is going down with NPE's due to single thread processing of events by Timeline client

2016-02-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4711:

Summary: NM is going down with NPE's due to single thread processing of 
events by Timeline client  (was: NPE in NMTimelinePublisher)

> NM is going down with NPE's due to single thread processing of events by 
> Timeline client
> 
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> While testing the latest 2928 branch came across a NPE which is shutting down 
> the NM
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to be a race condition ...
> Few other NPE's faced :
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> * Also there is possibility of NPE in TimelineEntity.toString() when real is 
> not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-02-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168524#comment-15168524
 ] 

Varun Saxena commented on YARN-4712:


[~Naganarasimha],
We had decided in YARN-4053 that we wont support doubles/floats as of now 
because of the complexities involved due to its implementation.
Moreover, we found that most of the times floats would come for some metric 
which is represented as a percentage(%) or a ratio.
In this case the denominator is fixed so you can easily normalize the value and 
store it as an integer. And divide it by 100(if its a %) when you read it back.
Moreover, in case of percentages a precision higher than 2 decimal places, wont 
be mighty important.

We thought if user/client wanted to store a value/ratio where denominator is 
not fixed, he can merely store numerator and denominator separately and do the 
necessary computation after reading it back.

We had decided until and unless there was a strong use case, we will go with 
only longs for now.
And we could not think of any use cases for current metrics as of now.
And there is a way around it as well if somebody wants to store a float by 
normalizing the value and storing numerator and denominator separately.

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> There are 2 issues with CPU usage collection 
> * I was able to observe that that many times CPU usage got from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do 
> the calculation  i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore 
> /resourceCalculatorPlugin.getNumProcessors()}} because of which UNAVAILABLE 
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not 
> encountered. so proper checks needs to be handled
> * {{EntityColumnPrefix.METRIC}} uses always LongConverter but 
> ContainerMonitor is publishing decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168517#comment-15168517
 ] 

Hadoop QA commented on YARN-4734:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 1s 
{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 45s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 2s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 9m 56s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 5m 1s 
{color} | {color:red} hadoop-yarn in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 27s 
{color} | {color:red} hadoop-yarn-ui in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 7m 40s 
{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 34s 
{color} | {color:red} root in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 34s {color} 
| {color:red} root in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 4m 45s 
{color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 4m 45s {color} 
| {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 35s 
{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvneclipse {color} | {color:red} 0m 45s 
{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} shellcheck {color} | {color:red} 0m 13s 
{color} | {color:red} The applied patch generated 669 new + 98 unchanged - 0 
fixed = 767 total (was 98) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 50 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 242 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 2s {color} | 
{color:red} The patch has 1 ill-formed XML file(s). {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 33s 
{color} | {color:red} root in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 5s 
{color} | {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 8s {color} 
| {color:red} root in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 11m 37s {color} 
| {color:red} root in the patch failed with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 39s 
{color} | {color:red} Patch 

[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168515#comment-15168515
 ] 

Varun Saxena commented on YARN-4700:


[~Naganarasimha], I remember there was a JIRA with you regarding duplicate 
events being sent from RM(issue is not specific to this branch). What happened 
to that ?

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4740) container complete msg may lost while AM restart in race condition

2016-02-25 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-4740:
---
Attachment: YARN-4740.01.patch

put containers in finishedContainersSentToAM back to justFinishedContainer if  
AM restart.  

> container complete msg may lost while AM restart in race condition
> --
>
> Key: YARN-4740
> URL: https://issues.apache.org/jira/browse/YARN-4740
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4740.01.patch
>
>
> 1, container completed, and the msg is store in 
> RMAppAttempt.justFinishedContainers
> 2,  AM allocate and before allocateResponse came to AM, AM crashed
> 3,  AM restart and couldn't get the container complete msg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4740) container complete msg may lost while AM restart in race condition

2016-02-25 Thread sandflee (JIRA)
sandflee created YARN-4740:
--

 Summary: container complete msg may lost while AM restart in race 
condition
 Key: YARN-4740
 URL: https://issues.apache.org/jira/browse/YARN-4740
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: sandflee
Assignee: sandflee


1, container completed, and the msg is store in 
RMAppAttempt.justFinishedContainers
2,  AM allocate and before allocateResponse came to AM, AM crashed
3,  AM restart and couldn't get the container complete msg.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168443#comment-15168443
 ] 

Naganarasimha G R commented on YARN-4700:
-

Oops My mistake, i dint chk the default values set in the code just went by the 
discussion we had. Yes we resend the started and stopped events on recovery but 
with the same set of data, will try to verify whats making it enter as a new 
row with diff values.

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168420#comment-15168420
 ] 

Hadoop QA commented on YARN-4359:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 15 new + 27 unchanged - 1 fixed = 42 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 21s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 44s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 44s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 152m 18s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests 

[jira] [Commented] (YARN-4671) There is no need to acquire CS lock when completing a container

2016-02-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168390#comment-15168390
 ] 

Jian He commented on YARN-4671:
---

[~mding], the patch does not apply now.. would you mind updating ?

> There is no need to acquire CS lock when completing a container
> ---
>
> Key: YARN-4671
> URL: https://issues.apache.org/jira/browse/YARN-4671
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4671.1.patch
>
>
> In YARN-4519, we discovered that there is no need to acquire CS lock in 
> CS#completedContainerInternal, because:
> * Access to critical section are already guarded by queue lock.
> * It is not essential to guard {{schedulerHealth}} with cs lock in 
> completedContainerInternal. All maps in schedulerHealth are concurrent maps. 
> Even if schedulerHealth is not consistent at the moment, it will be 
> eventually consistent.
> With this fix, we can truly claim that CS#allocate doesn't require CS lock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization

2016-02-25 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168386#comment-15168386
 ] 

Brahma Reddy Battula commented on YARN-4739:


Yes, It is dupe of YARN-4566.. Thanks.

> TestCase Failure : 
> TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization 
> 
>
> Key: YARN-4739
> URL: https://issues.apache.org/jira/browse/YARN-4739
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> {noformat}
> 1 tests failed.
> FAILED:  
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization
> Error Message:
> null
> Stack Trace:
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217)
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116)
> {noformat}
> Ref :
> https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/
> https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization

2016-02-25 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved YARN-4739.

Resolution: Duplicate

> TestCase Failure : 
> TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization 
> 
>
> Key: YARN-4739
> URL: https://issues.apache.org/jira/browse/YARN-4739
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> {noformat}
> 1 tests failed.
> FAILED:  
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization
> Error Message:
> null
> Stack Trace:
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217)
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116)
> {noformat}
> Ref :
> https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/
> https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168379#comment-15168379
 ] 

Hadoop QA commented on YARN-4720:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 patch generated 1 new + 18 unchanged - 1 fixed = 19 total (was 19) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 30s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 33m 8s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790051/YARN-4720.05.patch |
| JIRA Issue | YARN-4720 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b196f627fb17 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization

2016-02-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168371#comment-15168371
 ] 

Rohith Sharma K S commented on YARN-4739:
-

Is it dupe of YARN-4566?

> TestCase Failure : 
> TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization 
> 
>
> Key: YARN-4739
> URL: https://issues.apache.org/jira/browse/YARN-4739
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> {noformat}
> 1 tests failed.
> FAILED:  
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization
> Error Message:
> null
> Stack Trace:
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217)
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116)
> {noformat}
> Ref :
> https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/
> https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4739) TestCase Failure : TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization

2016-02-25 Thread Brahma Reddy Battula (JIRA)
Brahma Reddy Battula created YARN-4739:
--

 Summary: TestCase Failure : 
TestMiniYarnClusterNodeUtilization#testUpdateNodeUtilization 
 Key: YARN-4739
 URL: https://issues.apache.org/jira/browse/YARN-4739
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula


{noformat}
1 tests failed.
FAILED:  
org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization

Error Message:
null

Stack Trace:
java.lang.NullPointerException: null
at 
org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217)
at 
org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116)
{noformat}

Ref :

https://builds.apache.org/job/Hadoop-Yarn-trunk/1820/
https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/1109/




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168339#comment-15168339
 ] 

Hadoop QA commented on YARN-4686:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 3s {color} | 
{color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 15s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_72. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 6s {color} | 
{color:red} hadoop-yarn-server-tests in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 8s {color} 
| {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 167m 2s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 

[jira] [Updated] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-4718:
--
Attachment: YARN-4718-v004.patch

Removing all changes in the user side.

> Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
> 
>
> Key: YARN-4718
> URL: https://issues.apache.org/jira/browse/YARN-4718
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, 
> YARN-4718-v002.patch, YARN-4718-v003.patch, YARN-4718-v004.patch
>
>
> As discussed in YARN-1011, we are planning to add a few more variables 
> regarding utilization to SchedulerNode. To ensure the new variables are 
> descriptive and don't lead to confusion, the existing variables could use 
> some renaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168285#comment-15168285
 ] 

Rohith Sharma K S commented on YARN-4624:
-

Triggered jenkins manually

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure nodelables and add to cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168281#comment-15168281
 ] 

Rohith Sharma K S commented on YARN-4624:
-

+1 LGTM, pending jenkins

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure nodelables and add to cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-25 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168274#comment-15168274
 ] 

Jun Gong commented on YARN-4720:


Thanks [~mingma] for suggestions. I attached a wrong version 
patch(YARN-4720.04.patch)...

The new patch fixed the problem and added a new test.

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch, YARN-4720.04.patch, YARN-4720.05.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-25 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4720:
---
Attachment: YARN-4720.05.patch

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch, YARN-4720.04.patch, YARN-4720.05.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Attachment: YARN-4734.3.patch

Attached a WIP ver.3 patch, integrated build to existing Hadoop maven build.
Some items still pending, for example, add manual steps for user to load 
builded war package and run in Tomcat.

bq. I don't think hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui the correct 
place given this appears to be a strictly server-side component. It should be 
moved to be under hadoop-yarn-server minimally.
Since this is JS library running in client side, I would suggest to keep it 
here, it could be used to create client side tool, for example, we can create a 
tool using the framework running on client node to analysis application logs in 
local machine.

Will address rest comments in following patches.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168241#comment-15168241
 ] 

Wangda Tan commented on YARN-4465:
--

Thanks [~bibinchundatt],

Generally looks good, one minor comment:
Could you avoid show all cluster labels in exception when label doesn't exist 
in cluster?
{code}
337   if (nlm != null && !nlm.containsNodeLabel(labelExpression)) {
338 throw new InvalidLabelResourceRequestException(
339 "Invalid label resource request, cluster do not contain "
340 + ", label= " + labelExpression + " Cluster labels= "
341 + StringUtils.join(nlm.getClusterNodeLabels(), ','));
315   }
342   }
{code}

A cluster could have many labels.

And could you explain why change of TestKillApplicationWithRMHA is needed?


> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch
>
>
> Disable label from rm side yarn.nodelabel.enable=false
> Capacity scheduler label configuration for queue is available as below
> default label for queue = b1 as 3 and accessible labels as 1,3
> Submit application to queue A .
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore default label expression when label is disabled *or*
> # NormalizeResourceRequest we can set label expression to  
> when node label is not enabled *or*
> # Improve message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4579) Allow DefaultContainerExecutor container log directory permissions to be configurable

2016-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168225#comment-15168225
 ] 

Hudson commented on YARN-4579:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9372 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9372/])
YARN-4579. Allow DefaultContainerExecutor container log directory (rkanter: rev 
d7fdec1e6b465395d2faca6e404e329d20f6c3d8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt


> Allow DefaultContainerExecutor container log directory permissions to be 
> configurable
> -
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Fix For: 2.9.0
>
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch, 
> YARN-4579.003.patch, YARN-4579.004.patch, YARN-4579.005.patch, 
> YARN-4579.006.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired.  Make 
> this configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-25 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-4359:

Attachment: (was: YARN-4359.2.patch)

> Update LowCost agents logic to take advantage of YARN-4358
> --
>
> Key: YARN-4359
> URL: https://issues.apache.org/jira/browse/YARN-4359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Ishai Menache
> Attachments: YARN-4359.0.patch, YARN-4359.3.patch
>
>
> Given the improvements of YARN-4358, the LowCost agent should be improved to 
> leverage this, and operate on RLESparseResourceAllocation (ideally leveraging 
> the improvements of YARN-3454 to compute avaialable resources)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-25 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-4359:

Attachment: YARN-4359.3.patch

> Update LowCost agents logic to take advantage of YARN-4358
> --
>
> Key: YARN-4359
> URL: https://issues.apache.org/jira/browse/YARN-4359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Ishai Menache
> Attachments: YARN-4359.0.patch, YARN-4359.3.patch
>
>
> Given the improvements of YARN-4358, the LowCost agent should be improved to 
> leverage this, and operate on RLESparseResourceAllocation (ideally leveraging 
> the improvements of YARN-3454 to compute avaialable resources)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4711) NPE in NMTimelinePublisher

2016-02-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168183#comment-15168183
 ] 

Sangjin Lee commented on YARN-4711:
---

As discussed offline, probably we should re-characterize this JIRA to address 
the low throughput issue with single-threaded writes (of sync entities).

> NPE in NMTimelinePublisher
> --
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> While testing the latest 2928 branch came across a NPE which is shutting down 
> the NM
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to be a race condition ...
> Few other NPE's faced :
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> * Also there is possibility of NPE in TimelineEntity.toString() when real is 
> not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168173#comment-15168173
 ] 

Sangjin Lee commented on YARN-4700:
---

I see. This is regarding the flow activity table, right? Writes to the flow 
activity table are done when an APPLICATION_CREATED_EVENT is handled. Does a RM 
restart trigger a duplicate APPLICATION_CREATED_EVENT for existing apps?

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168159#comment-15168159
 ] 

Sangjin Lee commented on YARN-4736:
---

This could be a bug in HBase. It seems like the HBase cluster was already shut 
down, but the flush operation took a long time to finally error out (36 
minutes):
{noformat}
2016-02-26 00:02:28,270 INFO 
org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager:
 The collector service for application_1456425026132_0001 was removed
2016-02-26 00:39:03,879 ERROR org.apache.hadoop.hbase.client.AsyncProcess: 
Failed to get region location 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
attempts=36, exceptions:
Fri Feb 26 00:39:03 IST 2016, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=68065: row 
'timelineservice.entity,naga!yarn_cluster!flow_1456425026132_1!���!M�����!YARN_CONTAINER!container_1456425026132_0001_01_01,99'
 on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=localhost,16201,1456365764939, seqNum=0

at 
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:264)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:215)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at 
org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
at 
org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200)
at 
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1109)
at 
org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
at 
org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
at 
org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.BufferedMutatorDelegator.flush(BufferedMutatorDelegator.java:66)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.flush(HBaseTimelineWriterImpl.java:457)
at 
org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager$WriterFlushTask.run(TimelineCollectorManager.java:230)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: callTimeout=6, 
callDuration=68065: row 
'timelineservice.entity,naga!yarn_cluster!flow_1456425026132_1!���!M�����!YARN_CONTAINER!container_1456425026132_0001_01_01,99'
 on table 'hbase:meta' at region=hbase:meta,,1.1588230740, 
hostname=localhost,16201,1456365764939, seqNum=0
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:310)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:291)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
Caused by: java.net.ConnectException: Connection refused
{noformat}

But this seems to have put the HBase client stuck in the flush state. That 
thread seems to be stuck:
{noformat}
"pool-14-thread-1" prio=10 tid=0x7f4215268000 nid=0x46e6 waiting on 
condition [0x7f41fe75d000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xeeb5a010> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 

[jira] [Updated] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started

2016-02-25 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-4686:
--
Attachment: YARN-4686.001.patch

Transition 1 RM to active in MiniYarnCluster start. Also improve 
MiniYarnCluster start by waiting for Scheduler to see all NMs. 

> MiniYARNCluster.start() returns before cluster is completely started
> 
>
> Key: YARN-4686
> URL: https://issues.apache.org/jira/browse/YARN-4686
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Rohith Sharma K S
>Assignee: Eric Badger
> Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch
>
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 
> sec  <<< FAILURE!
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but 
> was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168145#comment-15168145
 ] 

Hadoop QA commented on YARN-4718:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 17 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 23s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 7 new + 488 unchanged - 21 fixed = 495 total (was 509) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 18s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 32s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 32s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 150m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices |
|   | 

[jira] [Created] (YARN-4738) Notify the RM about the status of OPPORTUNISTIC containers

2016-02-25 Thread Konstantinos Karanasos (JIRA)
Konstantinos Karanasos created YARN-4738:


 Summary: Notify the RM about the status of OPPORTUNISTIC containers
 Key: YARN-4738
 URL: https://issues.apache.org/jira/browse/YARN-4738
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Konstantinos Karanasos
Assignee: Konstantinos Karanasos


When an OPPORTUNISTIC container finishes its execution (either successfully or 
because it failed/got killed), the RM needs to be notified.
This way the AM also gets notified in turn about the successfully completed 
tasks, as well as for rescheduling failed/killed tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2883) Queuing of container requests in the NM

2016-02-25 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-
Attachment: YARN-2883-yarn-2877.001.patch

I am attaching an initial version of the patch for this JIRA.
Please let me know how it looks.

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-yarn-2877.001.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168104#comment-15168104
 ] 

Li Lu commented on YARN-4700:
-

[~sjlee0] I checked the flowrun activity table and I can see the row keys for 
the same flow is like this:
{code}
Current count: 1, row: 
yarn_cluster!\x7F\xFF\xFE\xAC\xEFl\xBB\xFF!llu!flow_1445894691726_1
...
Current count: 10, row: 
yarn_cluster!\x7F\xFF\xFE\xAD7\x85\xC3\xFF!llu!flow_1445894691726_1
...
Current count: 19, row: 
yarn_cluster!\x7F\xFF\xFE\xAD<\xAC\x1F\xFF!llu!flow_1445894691726_1
...
{code}

>From FlowActivityTable, I can see the row key contains inv_top_of_day, which 
>is the reason we have duplicated flows. However, for each of the flows, I only 
>ran it for once, and never touch them again. By design this should not 
>regenerate any new flow activities? Is this related to RM restart? 

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168096#comment-15168096
 ] 

Hadoop QA commented on YARN-4723:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
45s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
23s {color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s 
{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} branch-2.7 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 9s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in branch-2.7 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} branch-2.7 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} branch-2.7 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 1486 line(s) that end in whitespace. Use 
git apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 41s 
{color} | {color:red} The patch has 99 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 4s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 44m 56s 
{color} | {color:red} Patch generated 77 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 163m 19s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168092#comment-15168092
 ] 

Kuhu Shukla commented on YARN-4723:
---

{{TestSystemMetricsPublisher}} and {{TestZKRMStateStore}} run fine locally and 
I don't see correlation between the change and the test failure. Requesting 
[~jlowe] for more comments/review. Thank you very much!

> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
> Attachments: YARN-4723-branch-2.7.001.patch, YARN-4723.001.patch, 
> YARN-4723.002.patch
>
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
> at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server.call(Server.java:2267)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928

2016-02-25 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4712:
--
Labels: yarn-2928-1st-milestone  (was: )

> CPU Usage Metric is not captured properly in YARN-2928
> --
>
> Key: YARN-4712
> URL: https://issues.apache.org/jira/browse/YARN-4712
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> There are 2 issues with CPU usage collection 
> * I was able to observe that that many times CPU usage got from 
> {{pTree.getCpuUsagePercent()}} is 
> ResourceCalculatorProcessTree.UNAVAILABLE(i.e. -1) but ContainersMonitor do 
> the calculation  i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore 
> /resourceCalculatorPlugin.getNumProcessors()}} because of which UNAVAILABLE 
> check in {{NMTimelinePublisher.reportContainerResourceUsage}} is not 
> encountered. so proper checks needs to be handled
> * {{EntityColumnPrefix.METRIC}} uses always LongConverter but 
> ContainerMonitor is publishing decimal values for the CPU usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-25 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4736:
--
Labels: yarn-2928-1st-milestone  (was: )

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 in single node Hadoop cluster and in 
> the same node had launched Hbase with embedded zookeeper.
> # Due to some NPE issues i was able to see NM was trying to shutdown, but the 
> NM daemon process was not completed due to the locks.
> # Got some exception related to Hbase after application finished execution 
> successfully. 
> will attach logs and the trace for the same



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4700:
--
Labels: yarn-2928-1st-milestone  (was: )

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4711) NPE in NMTimelinePublisher

2016-02-25 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4711:
--
Labels: yarn-2928-1st-milestone  (was: )

> NPE in NMTimelinePublisher
> --
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
>
> While testing the latest 2928 branch came across a NPE which is shutting down 
> the NM
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to be a race condition ...
> Few other NPE's faced :
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> * Also there is possibility of NPE in TimelineEntity.toString() when real is 
> not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4579) Allow DefaultContainerExecutor container log directory permissions to be configurable

2016-02-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168087#comment-15168087
 ] 

Robert Kanter commented on YARN-4579:
-

+1

> Allow DefaultContainerExecutor container log directory permissions to be 
> configurable
> -
>
> Key: YARN-4579
> URL: https://issues.apache.org/jira/browse/YARN-4579
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: supportability
> Attachments: YARN-4579.001.patch, YARN-4579.002.patch, 
> YARN-4579.003.patch, YARN-4579.004.patch, YARN-4579.005.patch, 
> YARN-4579.006.patch
>
>
> By default, container directory permissions are hardcoded to this member in 
> DefaultContainerExecutor:
>   static final short LOGDIR_PERM = (short)0710;
> There are some cases where less restrictive permissions are desired.  Make 
> this configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168080#comment-15168080
 ] 

Sangjin Lee commented on YARN-4700:
---

These are the current default values that are being set as the context info 
(which is then used as part of the row key):
{code:title=AppLevelTimelineCollector.java}
context.setClusterId(conf.get(YarnConfiguration.RM_CLUSTER_ID,
YarnConfiguration.DEFAULT_RM_CLUSTER_ID));
// Set the default values, which will be updated with an RPC call to get the
// context info from NM.
// Current user usually is not the app user, but keep this field non-null
context.setUserId(UserGroupInformation.getCurrentUser().getShortUserName());
// Use app ID to generate a default flow name for orphan app
context.setFlowName(
TimelineUtils.generateDefaultFlowNameBasedOnAppId(appId));
// Set the flow version to string 1 if it's an orphan app
context.setFlowVersion("1");
// Set the flow run ID to 1 if it's an orphan app
context.setFlowRunId(1L);
context.setAppId(appId.toString());
{code}

The flow name, version, and the run id may be overridden if the application 
sets the YARN tag.

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168069#comment-15168069
 ] 

Hadoop QA commented on YARN-4723:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 30s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 55s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168063#comment-15168063
 ] 

Sangjin Lee commented on YARN-4700:
---

[~gtCarrera9], can you provide the exact row keys in this problem scenario? 
From what I can see in the code, if you're not setting the cluster id in the 
configuration (under key "yarn.resourcemanager.cluster-id"), it should be the 
default value ("yarn_cluster"). I'm not sure where the duplicate keys are 
coming from. Once we see the duplicate row keys for the same app, it should 
become clearer.

Are you seeing this with a DS app or a MR app?

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168055#comment-15168055
 ] 

Hadoop QA commented on YARN-4359:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 13s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 16s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 15 new + 25 unchanged - 1 fixed = 40 total (was 26) {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 17s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 47s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_72
 with JDK v1.8.0_72 generated 3 new + 100 unchanged - 0 fixed = 103 total (was 
100) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 1m 25s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 13s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 15s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed 

[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168029#comment-15168029
 ] 

Allen Wittenauer commented on YARN-4734:


Also:

bq. Since now we haven't integrated web UI building to Hadoop's build system, 
it is not required for now.

Why hasn't this been integrated?  It makes me extremely uneasy that you're 
proposing to merge a branch where that branch's JIRAs are still mostly open and 
has had zero integration work done, especially given that a vote was called on 
an obviously broken code base.  

Something else:

I don't think hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui the correct place 
given this appears to be a strictly server-side component.  It should be moved 
to be under hadoop-yarn-server minimally.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168008#comment-15168008
 ] 

Allen Wittenauer commented on YARN-4734:


bq. license and notice issues on dependencies

Specifically, hadoop/LICENSE and hadoop/NOTICE need to have the appropriate 
updates made in order to comply with ASF and your new dependencies' legal 
requirements. 

bq.  My understanding is Dockerfile will be used by Yetus to run build, 
correct? Since now we haven't integrated web UI building to Hadoop's build 
system, it is not required for now.

The Dockerfile's primary purpose in life is to support start-build-env.sh as 
part of HADOOP-11843 which predates Yetus by months.  Anyway, the Dockerfile 
should have *every* dependency needed for someone to build _and develop_ Apache 
Hadoop.  Given the content of the README.MD, this is most definitely required, 
otherwise HADOOP-11843 is no longer true.

Speaking of, why is this a separate README.MD here and not part of BUILDING.txt 
or something less hidden?

bq. Not entirely sure about it, I have found there're a couple of .gitignore 
files under different source folders.

Argh. 

bq. We can easier manage sub-project's ignore files by placing .gitignore file, 
could you let me know why you don't want it?

Because it becomes a management nightmare when one is looking at the project in 
it's entirety.  Is this file or dir ignored here?  Well, first we have to find 
the appropriate .gitignore





> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168004#comment-15168004
 ] 

Hadoop QA commented on YARN-4734:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue} 0m 4s 
{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 23s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 9m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green} 0m 
8s {color} | {color:green} There were no new shellcheck issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 50 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 242 line(s) with tabs. {color} |
| {color:red}-1{color} | {color:red} xml {color} | {color:red} 0m 0s {color} | 
{color:red} The patch has 1 ill-formed XML file(s). {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 8 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 14s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790008/YARN-4734.2.patch |
| JIRA Issue | YARN-4734 |
| Optional Tests |  asflicense  mvnsite  shellcheck  shelldocs  xml  |
| uname | Linux 3dfc29ba8ba4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c4d4df8 |
| shellcheck | v0.4.3 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/whitespace-eol.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/whitespace-tabs.txt
 |
| xml | 
https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/xml.txt
 |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/10640/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C:  hadoop-yarn-project/hadoop-yarn   .  U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10640/console |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4359) Update LowCost agents logic to take advantage of YARN-4358

2016-02-25 Thread Ishai Menache (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishai Menache updated YARN-4359:

Attachment: YARN-4359.2.patch

> Update LowCost agents logic to take advantage of YARN-4358
> --
>
> Key: YARN-4359
> URL: https://issues.apache.org/jira/browse/YARN-4359
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Carlo Curino
>Assignee: Ishai Menache
> Attachments: YARN-4359.0.patch, YARN-4359.2.patch
>
>
> Given the improvements of YARN-4358, the LowCost agent should be improved to 
> leverage this, and operate on RLESparseResourceAllocation (ideally leveraging 
> the improvements of YARN-3454 to compute avaialable resources)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-02-25 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167973#comment-15167973
 ] 

Jonathan Maron commented on YARN-4737:
--

Thank you!

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167969#comment-15167969
 ] 

Wangda Tan commented on YARN-4737:
--

[~jmaron] added you to contributor and assigned ticket to you.

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN

2016-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4737:
-
Assignee: Jonathan Maron

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-02-25 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167965#comment-15167965
 ] 

Jonathan Maron commented on YARN-4737:
--

Could this be reassigned to me ([~jmaron]?

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167943#comment-15167943
 ] 

Wangda Tan commented on YARN-4734:
--

Attached ver.2 patch.
- Fixed ASF license issues
- Removed unnecessary files (such as .gitkeep)

[~aw], for your comments, I've addressed some of them, but some questions:
bq. license and notice issues on dependencies
Not sure what is it, could you explain? 

bq. Dockerfile support for dependencies
My understanding is Dockerfile will be used by Yetus to run build, correct? 
Since now we haven't integrated web UI building to Hadoop's build system, it is 
not required for now.

bq. dot-files in a subdir whose content should be in root (e.g., .gitignore)
Not entirely sure about it, I have found there're a couple of .gitignore files 
under different source folders. We can easier manage sub-project's ignore files 
by placing .gitignore file, could you let me know why you don't want it?

bq. dot-files that shouldn't be there at all
Removed some of them, only left dot-files which are configuration files for 
package management.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4737) Use CSRF Filter in YARN

2016-02-25 Thread Jonathan Maron (JIRA)
Jonathan Maron created YARN-4737:


 Summary: Use CSRF Filter in YARN
 Key: YARN-4737
 URL: https://issues.apache.org/jira/browse/YARN-4737
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager, webapp
Reporter: Jonathan Maron


A CSRF filter was added to hadoop common 
(https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA is 
to come up with a mechanism to integrate this filter into the webapps for which 
it is applicable (web apps that may establish an authenticated identity).  That 
includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4734:
-
Attachment: YARN-4734.2.patch

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4731:
-
Affects Version/s: 2.9.0
 Target Version/s: 2.9.0
 Priority: Blocker  (was: Critical)
 Hadoop Flags: Reviewed

Thanks for the report, Bibin, and the patch, Varun!

This was triggered by YARN-4594.  My apologies for missing it during that 
review, as I had forgotten about the symlinks in the container directory.

+1 patch looks good to me.  Pinging [~cmccabe] in case he has time to take a 
look as well.

bq. Signal to container is throwing throwing exception in LCE

I don't believe that is related since YARN-4594 didn't modify the signal path 
IIRC.  I think we should address that as a separate JIRA.  Is the signal issue 
something happening in 2.8 or earlier?

> Linux container executor fails to delete nmlocal folders
> 
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: YARN-4731.001.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens

[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167889#comment-15167889
 ] 

Bikas Saha commented on YARN-1040:
--

Vinod, the plan you are suggesting has merits. But my initial impression is 
that reworking allocations and containers is a much bigger change than whats 
proposed earlier in this jira. Not only internally in YARN but also externally 
in terms of thinking about the whole larger flow of allocations and containers 
for users of YARN.
The proposal discussed earlier is of much smaller scope and I believe 
sufficient to take us where we need to go. And it does not need reworking the 
RM related flow of allocations and containers. E.g. it may not be necessary for 
the RM to understand single use allocations vs multi-use vs concurrent use 
allocations. But for the RM level changes you are suggesting we may be on the 
path of convergence.

At this point, the discussion is complex enough that we may want to gather 
interested people and do it as a group outside jira comments and then post it 
back.

> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-4718:
--
Attachment: YARN-4718-v003.patch

Fixed javadocs and checkstyles

> Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
> 
>
> Key: YARN-4718
> URL: https://issues.apache.org/jira/browse/YARN-4718
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, 
> YARN-4718-v002.patch, YARN-4718-v003.patch
>
>
> As discussed in YARN-1011, we are planning to add a few more variables 
> regarding utilization to SchedulerNode. To ensure the new variables are 
> descriptive and don't lead to confusion, the existing variables could use 
> some renaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167858#comment-15167858
 ] 

Hudson commented on YARN-4701:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9370 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9370/])
YARN-4701. When task logs are not available, port 8041 is referenced (rkanter: 
rev c4d4df8de09ee0c89ea8176bd8149900becd3c0c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/dao/AMAttemptInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/webapp/TestHsWebServicesJobs.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogsBlock.java


> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch, yarn4701.004.patch
>
>
> Accessing logs for an task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167837#comment-15167837
 ] 

Hadoop QA commented on YARN-4701:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 28s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 48s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
55s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
28s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 41s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 47s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 31s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 32s 
{color} | {color:red} Patch generated 14 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 103m 13s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| 

[jira] [Updated] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4723:
--
Attachment: YARN-4723-branch-2.7.001.patch

Attaching patch for branch-2.7.

> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
> Attachments: YARN-4723-branch-2.7.001.patch, YARN-4723.001.patch, 
> YARN-4723.002.patch
>
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
> at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server.call(Server.java:2267)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-25 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167794#comment-15167794
 ] 

Vinod Kumar Vavilapalli commented on YARN-1040:
---

Still catching up on some of your discussion, but quick comments on a few 
things that I care

We should really call them instead as {{Allocation}} and {{Container}}. That's 
the nomenclature I used at YARN-4726. Till now, YARN has combined the notion of 
Allocation and Container, which is the main de-linking that we need to do here. 
Process has an OS level connotation, and doesn't work well with more things in 
the picture like process-trees / multiple-processes / Docker (YARN-2466).

Taking this further, here's how the overall picture can look like the following

*ResourceManager*
 - RM only does allocations in the scheduling path. ResourceManager does all 
scheduling based on AllocationRequests and tracks Allocations..
 - RM receives AllocationRequests and returns fulfilled Allocations (and 
AllocationTokens) to AMs.

*Applications*
 - AM can in turn use the Allocations (and AllocationTokens) to launch multiple 
Containers on the NM.
-- Simple case: AM only launches containers one-after-another. It's up to 
the app to do this.
-- General case: AM launches multiple containers at the same time. This is 
essentially container-groups - we should keep this option open.
 - AMs can specify *single-use* AllocationRequests, at which point RM can 
simply return Containers and Container-Tokens (today's code-path).
 - Each Container exits when the process-tree / linux-container exits.
 - Each Container has an Identifier.
-- For single-use allocation-requests, RM generates ContainerIDs
-- For multi-use allocation-requests, apps could optionally specify a 
container-name that is scoped under the allocation. NM always returns a 
(generated or app-specified) ContainerID based off the allocation-ID. 
Essentially, allocationID + containerID is unique

*NodeManagers*
 - NodeManager also understands incoming Allocations and ties them to Container 
groups: it deals with Allocation activation/deactivation and Container 
start/stop. but does
-- the following *decoupled from both allocations and containers*: 
localizations / re-localizations. This means local-resources should now have 
more scopes: container, allocation, application etc.
-- *per allocation*: enforcement of resource-limits
-- all of the following *per container*: (a) process/OS-container 
activation / deactivation, (b) process/OS-container auto-restart (YARN-4725)  
log-aggregation

> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-25 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4723:
--
Attachment: YARN-4723.002.patch

Updated patch with checkstyle and test comment rectified.

> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
> Attachments: YARN-4723.001.patch, YARN-4723.002.patch
>
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
> at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server.call(Server.java:2267)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4723) NodesListManager$UnknownNodeId ClassCastException

2016-02-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167784#comment-15167784
 ] 

Jason Lowe commented on YARN-4723:
--

Thanks, Kuhu!  Patch looks pretty good, just a couple of nits:
- In the test NODE_USABLE should be NODE_UNUSABLE for the assert message
- Please look into the checkstyle issues.


> NodesListManager$UnknownNodeId ClassCastException
> -
>
> Key: YARN-4723
> URL: https://issues.apache.org/jira/browse/YARN-4723
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Critical
> Attachments: YARN-4723.001.patch
>
>
> Saw the following in an RM log:
> {noformat}
> 2016-02-16 22:55:35,207 [IPC Server handler 5 on 8030] WARN ipc.Server: IPC 
> Server handler 5 on 8030, call 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server@6c403aff
> java.lang.ClassCastException: 
> org.apache.hadoop.yarn.server.resourcemanager.NodesListManager$UnknownNodeId 
> cannot be cast to org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToBuilder(NodeReportPBImpl.java:247)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.mergeLocalToProto(NodeReportPBImpl.java:271)
> at 
> org.apache.hadoop.yarn.api.records.impl.pb.NodeReportPBImpl.getProto(NodeReportPBImpl.java:220)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.convertToProtoFormat(AllocateResponsePBImpl.java:712)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.access$500(AllocateResponsePBImpl.java:68)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:658)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl$6$1.next(AllocateResponsePBImpl.java:647)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.checkForNullValues(AbstractMessageLite.java:336)
> at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:323)
> at 
> org.apache.hadoop.yarn.proto.YarnServiceProtos$AllocateResponseProto$Builder.addAllUpdatedNodes(YarnServiceProtos.java:9335)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToBuilder(AllocateResponsePBImpl.java:144)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.mergeLocalToProto(AllocateResponsePBImpl.java:175)
> at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.AllocateResponsePBImpl.getProto(AllocateResponsePBImpl.java:96)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:61)
> at 
> org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:608)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server.call(Server.java:2267)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:648)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:615)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2217)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4711) NPE in NMTimelinePublisher

2016-02-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167776#comment-15167776
 ] 

Naganarasimha G R commented on YARN-4711:
-

As per initial analysis i am able to reproduce these issues when the containers 
run for little longer duration (due to which NM can report more container 
resource usages). So possible cause would be that YARN-3367 is making all the 
timeline entity Publishes in a single threaded and in NM all are published as 
Sync events. Tried changing  container resource usages as asyncs events and was 
able to avoid these exceptions. Will test more and confirm.

> NPE in NMTimelinePublisher
> --
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>
> While testing the latest 2928 branch came across a NPE which is shutting down 
> the NM
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to be a race condition ...
> Few other NPE's faced :
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> * Also there is possibility of NPE in TimelineEntity.toString() when real is 
> not null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167771#comment-15167771
 ] 

Allen Wittenauer commented on YARN-4734:


This is so not ready to be merged. A very strong -1 from me.

* ASF license issues on files
* license and notice issues on dependencies
* Dockerfile support for dependencies
* dot-files in a subdir whose content should be in root (e.g., .gitignore)
* dot-files that shouldn't be there at all (e.g., travis-ci support)
* files that will result in packaging cruft (e.g., all those .gitkeep files)


> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4711) NPE in NMTimelinePublisher

2016-02-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4711:

Description: 
While testing the latest 2928 branch came across a NPE which is shutting down 
the NM
{code}
2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}
Seems to be a race condition ...

Few other NPE's faced :
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}

* Also there is possibility of NPE in TimelineEntity.toString() when real is 
not null


  was:
While testing the latest 2928 branch came across a NPE which is shutting down 
the NM
{code}
2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
Error in dispatcher thread
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{code}
Seems to be a race condition ...


> NPE in NMTimelinePublisher
> --
>
> Key: YARN-4711
> URL: https://issues.apache.org/jira/browse/YARN-4711
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Critical
>
> While testing the latest 2928 branch came across a NPE which is shutting down 
> the NM
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: 
> Error in dispatcher thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Seems to be a race condition ...
> Few other NPE's faced :
> {code}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
> at 
> org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
> at 
> 

[jira] [Assigned] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-25 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C reassigned YARN-4736:


Assignee: Vrushali C

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 in single node Hadoop cluster and in 
> the same node had launched Hbase with embedded zookeeper.
> # Due to some NPE issues i was able to see NM was trying to shutdown, but the 
> NM daemon process was not completed due to the locks.
> # Got some exception related to Hbase after application finished execution 
> successfully. 
> will attach logs and the trace for the same



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-25 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167741#comment-15167741
 ] 

Naganarasimha G R commented on YARN-4700:
-

Any suggestions for the above approach [~gtCarrera9]/[~sjlee0] ?

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still hold in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-25 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4736:

Attachment: hbaseException.log
threaddump.log

For Issue 1 : threaddump.log
For Issue 2 : hbaseException.log
cc/ [~vrushalic]. 
Not sure its a setup issue, please check once ...

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Priority: Critical
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 in single node Hadoop cluster and in 
> the same node had launched Hbase with embedded zookeeper.
> # Due to some NPE issues i was able to see NM was trying to shutdown, but the 
> NM daemon process was not completed due to the locks.
> # Got some exception related to Hbase after application finished execution 
> successfully. 
> will attach logs and the trace for the same



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-25 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167738#comment-15167738
 ] 

Bikas Saha commented on YARN-1040:
--

My guess is that YARN-4725 may be redundant after we do this work because then 
we would have exposed primitives to apps to make that happen. The arguments for 
YARN not doing it by itself would be the same. If it can be done easily by the 
app and is very likely app dependent without one-size-fits-all then let the app 
do it.

Coming back to this jira. Yes, lets please track any first-class support of the 
notion of upgrades separately which can be done as a follow up.

Perhaps we can put the design in a document and look at the next level of 
details. We can send email to the dev list after adding a more detailed 
document to this jira. Then, based on +ve feedback, we could go ahead with 
jiras/code. The devil is in the details :) This would be a significant change 
and we could use more eyes for reviews.

For startProcess identifier, it may be useful for the app to provide the 
identifier in startProcess and then use it later to refer to the process. E.g. 
stopProcess. vs YARN trying to come up with identifiers. This may make the apps 
life easier because it could use meaningful terms based on its own logic. We 
can discuss such details in the design document.


> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-25 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4736:
---

 Summary: Issues with HBaseTimelineWriterImpl
 Key: YARN-4736
 URL: https://issues.apache.org/jira/browse/YARN-4736
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Naganarasimha G R
Priority: Critical


Faced some issues while running ATSv2 in single node Hadoop cluster and in the 
same node had launched Hbase with embedded zookeeper.
# Due to some NPE issues i was able to see NM was trying to shutdown, but the 
NM daemon process was not completed due to the locks.
# Got some exception related to Hbase after application finished execution 
successfully. 
will attach logs and the trace for the same




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167722#comment-15167722
 ] 

Wangda Tan commented on YARN-4734:
--

Thanks [~djp],

[~aw] mentioned the same thing, will update the patch soon.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-25 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167718#comment-15167718
 ] 

Ming Ma commented on YARN-4720:
---

Even though all tests passed by jenkins, but if you run the new 
{{testSkipUnnecessaryNNOperations}} individually, it failed. Based on how the 
test is written, {{this.context.getLogAggregationStatusForApps().size()}} 
should be 4. The reason it passes if all tests run together could be due to the 
fact that second {{doLogAggregationOutOfBand()}} and the 
{{LogHandlerAppFinishedEvent}} right after that only cause one notification. 
After fixing the size check to 4, additional sleep after the second 
{{doLogAggregationOutOfBand()}} fixes the issue, or you can wait and verify 
{{getLogAggregationStatusForApps.size()}} after each 
{{doLogAggregationOutOfBand()}} call.

In addition, it might be useful to add another test case without long running 
service but with FailedOrKilledContainerLogAggregationPolicy; For that case, no 
log aggregation should happen. That will help to verify the other scenario. 
What do you think, [~hex108]?

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch, YARN-4720.04.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167716#comment-15167716
 ] 

Varun Saxena commented on YARN-3863:


Sorry, added the comment by mistake.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167712#comment-15167712
 ] 

Varun Saxena commented on YARN-3863:


To aid in review, I will jot down what has been done in this JIRA.
# The intention is to convert filters which were represented as maps or sets to 
TimelineFilterList which would help in complex filters being supported.
i.e. Let me take example of config filters in the format {{cfg1=value1, 
cfg2=value2, cfg3=value3, cfg4=value4}} which means all the key value pairs 
should be matched for an entity. With work in this JIRA,we can support complex 
filters such as {{(cfg1=value1 OR cfg2=value2) AND (cfg3 !=value3 AND 
cfg4!=value4)}}.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167700#comment-15167700
 ] 

Hadoop QA commented on YARN-4718:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 17 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 15 new + 492 unchanged - 16 fixed = 507 total (was 508) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 19s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 28s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_95
 with JDK v1.7.0_95 generated 2 new + 2 unchanged - 0 fixed = 4 total (was 2) 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 12s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 51s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 150m 50s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | 

[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167685#comment-15167685
 ] 

Junping Du commented on YARN-4734:
--

Hi [~leftnoteasy], it sounds like the code doesn't include Apache license 
header. I think we should fix this.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch
>
>
> YARN-2928 branch is planned to merge back to trunk shortly, it depends on 
> changes of YARN-3368. This JIRA is to track the merging task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2016-02-25 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167674#comment-15167674
 ] 

Arun Suresh commented on YARN-1040:
---

Thanks for clarifying [~bikassaha]

I propose we break this down into sub-jiras :
# New APIs specific to delinking container life-cycle from the process : This 
will include 4 of new APIs specified above (excluding localization) but will 
assume single process (and this the startProcess does not need to return a 
processId)
# Add support for clubbing APIs into a single RPC
** Might have to think a bit about validating the order and multiplicity of the 
API calls in each command (which I expect might be different for single process 
/ multiple processes) 
# Add support for localize API
# Add support for Multiple processes
** A processId will be returned for a startProcess. Might have to think thru 
this further. for eg. how does this integrate with YARN-4725

For the purpose of Application Upgrades (for which this JIRA is marked as a 
sub-task of... also why im calling it out specifically) : Add support for 
Container Upgrades
# Expose a canned NMCommand that has the list of APIs to upgrade based on some 
policy

If folks are fine with this, I will ahead and open JIRAs and link this issue to 
each of the above JIRAs (Since I don't think I can create subtasks for this 
JIRA) so that we can start work on the same..

> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167598#comment-15167598
 ] 

Robert Kanter commented on YARN-4701:
-

Kicked off Jenkins because it doesn't seem to have noticed this JIRA.

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch, yarn4701.004.patch
>
>
> Accessing logs for an task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167556#comment-15167556
 ] 

Varun Saxena commented on YARN-3863:


bq. l.532: This is an interesting point. Should we categorically disallow any 
multi-entity reads without a filter? Is it an obvious requirement? I understand 
we already set some default values (e.g. created time, etc.) so this might be a 
moot point, but do we need to check for it when some defaults are set anyway?
It is more like that we expect a filter list for filters from REST layer, even 
if empty. Otherwise I will have to introduce null checks everywhere. We can do 
that as well. I agree this might indicate that filters are necessary for multi 
entity reads. That was not the intention though.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167520#comment-15167520
 ] 

Hadoop QA commented on YARN-4731:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red} 8m 58s {color} | 
{color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_72
 with JDK v1.8.0_72 generated 1 new + 2 unchanged - 1 fixed = 3 total (was 3) 
{color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 39s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 13s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 28m 23s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12789913/YARN-4731.001.patch |
| JIRA Issue | YARN-4731 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux a92ee9de4749 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 6979cbf |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_72 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| cc | 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_72:
 
https://builds.apache.org/job/PreCommit-YARN-Build/10635/artifact/patchprocess/diff-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_72.txt
 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/10635/testReport/ |
| modules | C: 

[jira] [Commented] (YARN-4735) Remove stale LogAggregationReport from NM's context

2016-02-25 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167456#comment-15167456
 ] 

Ming Ma commented on YARN-4735:
---

I just read the code again. {{NodeStatusUpdaterImpl}} 's 
{{getLogAggregationReportsForApps}} actually calls 
{{lastestLogAggregationStatus.poll()}} and add it to another newly created list 
to be sent to RM. So this isn't an issue?

> Remove stale LogAggregationReport from NM's context
> ---
>
> Key: YARN-4735
> URL: https://issues.apache.org/jira/browse/YARN-4735
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> {quote}
> All LogAggregationReport(current and previous) are only added to 
> *context.getLogAggregationStatusForApps*, and never removed.
> So for long running service, the LogAggregationReport list NM sends to RM 
> will grow over time.
> {quote}
> Per discussion in YARN-4720, we need remove stale LogAggregationReport from 
> NM's context.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4720) Skip unnecessary NN operations in log aggregation

2016-02-25 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167435#comment-15167435
 ] 

Ming Ma commented on YARN-4720:
---

+1 on the latest patch. I will wait until tomorrow to commit it in case others 
have more input.

> Skip unnecessary NN operations in log aggregation
> -
>
> Key: YARN-4720
> URL: https://issues.apache.org/jira/browse/YARN-4720
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Jun Gong
> Attachments: YARN-4720.01.patch, YARN-4720.02.patch, 
> YARN-4720.03.patch, YARN-4720.04.patch
>
>
> Log aggregation service could have unnecessary NN operations in the following 
> scenarios:
> * No new local log has been created since the last upload for the long 
> running service scenario.
> * NM uses {{ContainerLogAggregationPolicy}} that skips log aggregation for 
> certain containers.
> In the following code snippet, even though {{pendingContainerInThisCycle}} is 
> empty, it still creates the writer and then removes the file later. Thus it 
> introduces unnecessary create/getfileinfo/delete NN calls when NM doesn't 
> aggregate logs for an app.
>   
> {noformat}
> AppLogAggregatorImpl.java
> ..
> writer =
> new LogWriter(this.conf, this.remoteNodeTmpLogFileForApp,
> this.userUgi);
> ..
>   for (ContainerId container : pendingContainerInThisCycle) {
> ..
>   }
> ..
> if (remoteFS.exists(remoteNodeTmpLogFileForApp)) {
>   if (rename) {
> remoteFS.rename(remoteNodeTmpLogFileForApp, renamedPath);
>   } else {
> remoteFS.delete(remoteNodeTmpLogFileForApp, false);
>   }
> }
> ..
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2016-02-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167132#comment-15167132
 ] 

Junping Du commented on YARN-1489:
--

Another big problem is we don't actually notify a restarted AM the 
"finished-in-the-interim" containers. In YARN-1041, RM only report the running 
containers to AM new attempt, but if container get finished during this time, 
new AM attempt has no ways to know it - that make different behaviors for AMs - 
MR will rely on job history log to recover tasks, while Distributed Shell will 
launch these (finished) containers again. I think we should enhance this.
Will file a separated JIRA if we don't have it yet.


> [Umbrella] Work-preserving ApplicationMaster restart
> 
>
> Key: YARN-1489
> URL: https://issues.apache.org/jira/browse/YARN-1489
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Work preserving AM restart.pdf
>
>
> Today if AMs go down,
>  - RM kills all the containers of that ApplicationAttempt
>  - New ApplicationAttempt doesn't know where the previous containers are 
> running
>  - Old running containers don't know where the new AM is running.
> We need to fix this to enable work-preserving AM restart. The later two 
> potentially can be done at the app level, but it is good to have a common 
> solution for all apps where-ever possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4624) NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI

2016-02-25 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167113#comment-15167113
 ] 

Brahma Reddy Battula commented on YARN-4624:


As doing null check at  {{capacities.getMaxAMLimitPercentage() == null}} 
changed to wrapper which was not used anywhere. and couldn't be checked null in 
{{PartitionQueueCapacitiesInfo#getMaxAMLimitPercentage}} (REST API call will 
fail). 

> NPE in PartitionQueueCapacitiesInfo while accessing Schduler UI
> ---
>
> Key: YARN-4624
> URL: https://issues.apache.org/jira/browse/YARN-4624
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Attachments: SchedulerUIWithOutLabelMapping.png, YARN-2674-002.patch, 
> YARN-4624-003.patch, YARN-4624.patch
>
>
> Scenario:
> ===
> Configure nodelables and add to cluster
> Start the cluster
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.PartitionQueueCapacitiesInfo.getMaxAMLimitPercentage(PartitionQueueCapacitiesInfo.java:114)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:105)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:94)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:293)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
>   at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
>   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$LI._(Hamlet.java:7702)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:447)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
>   at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
>   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167106#comment-15167106
 ] 

Bibin A Chundatt commented on YARN-4731:


[~vvasudev]
 I have check the patch attached

*Issue 1*

# Container localization files are getting deleted properly

*Issues that still exists*

# Signal to container is throwing throwing exception in LCE

{noformat}
2016-02-25 13:08:20,442 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2016-02-25 13:08:20,447 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is yarn
main : requested yarn user is yarn
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 2, 23524, 9]
2016-02-25 13:08:20,447 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
at 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor$DelayedProcessKiller.run(ContainerExecutor.java:532)
Caused by: ExitCodeException exitCode=9:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 4 more

{noformat}

# Container initalization error was thrown 

{noformat}
2016-02-25 13:08:20,183 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2016-02-25 13:08:20,191 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 143. Privileged Execution Operation Output:
main : command provided 1
main : run as user is yarn
main : requested yarn user is yarn
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/container_1456385661741_0001_01_03.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
Full command array for failed execution:
[nice, -n, 0, 
/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 1, application_1456385661741_0001, 
container_1456385661741_0001_01_03, 
/opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/yarn/appcache/application_1456385661741_0001/container_1456385661741_0001_01_03,
 
/opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/launch_container.sh,
 
/opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/container_1456385661741_0001_01_03.tokens,
 
/opt/bibin/dsperf/HAINSTALL/nmlocal/nmPrivate/application_1456385661741_0001/container_1456385661741_0001_01_03/container_1456385661741_0001_01_03.pid,
 /opt/bibin/dsperf/HAINSTALL/nmlocal, /opt/bibin/dsperf/HAINSTALL/nmlog, 
cgroups=/cgroups/cpu/hadoop-yarn/container_1456385661741_0001_01_03/tasks]
2016-02-25 13:08:20,191 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Launch container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=143:
at 

[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167078#comment-15167078
 ] 

Bibin A Chundatt commented on YARN-4731:


[~vvasudev]

Thank you for attaching patch for the same. Will try patch attached

> Linux container executor fails to delete nmlocal folders
> 
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4731.001.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.xml -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/
> -rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 tmp/
> {noformat}



--
This message was sent by Atlassian JIRA

[jira] [Updated] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4731:

Attachment: YARN-4731.001.patch

Root cause here is that we are using fstat on an open fd. The open call follows 
the symlink and we stat the directory pointed to by the symlink instead of the 
actual symlink. As a result rmdir fails because it doesn't delete symlinks.

The other problem with following symlinks in our case is that we end up 
deleting public resources because we use symlinks in the container work dir to 
point to the actual resources.

I've attached a patch to not follow symlinks and just call unlink on the 
symlink itself.

[~jlowe] - can you please take a look?

> Linux container executor fails to delete nmlocal folders
> 
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-4731.001.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
> 

[jira] [Assigned] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-4731:
---

Assignee: Varun Vasudev

> Linux container executor fails to delete nmlocal folders
> 
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Varun Vasudev
>Priority: Critical
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.xml -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/
> -rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 tmp/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4731) Linux container executor fails to delete nmlocal folders

2016-02-25 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166956#comment-15166956
 ] 

Bibin A Chundatt commented on YARN-4731:


*On Signal operation for container* also the below exception is thrown
{noformat}
2016-02-25 10:49:57,704 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2016-02-25 10:49:57,709 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is hdfs
main : requested yarn user is hdfs
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
hdfs, hdfs, 2, 4850, 15]
2016-02-25 10:49:57,710 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=9:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
{noformat}

> Linux container executor fails to delete nmlocal folders
> 
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
>

[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg

2016-02-25 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166923#comment-15166923
 ] 

sandflee commented on YARN-4673:


Hi, [~ozawa], in ResourceTrackService we may concurrently process 
nodeHeartBeat() with same nodeId and responseId, they may both pass the 
lastResonseId check,  this will cause the lost of RM message. With the 
Nodelock, we could process one by one, and the above exception could be catched.
{code}
  if (remoteNodeStatus.getResponseId() + 1 == lastNodeHeartbeatResponse
  .getResponseId()) {
LOG.info("Received duplicate heartbeat from node " +
rmNode.getNodeAddress() + " responseId=" +
remoteNodeStatus.getResponseId());
return lastNodeHeartbeatResponse;
  }
{code}

actually I have not encounter the bug caused by this, but this may be a risk.

> race condition in ResourceTrackerService#nodeHeartBeat while processing 
> deduplicated msg
> 
>
> Key: YARN-4673
> URL: https://issues.apache.org/jira/browse/YARN-4673
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: sandflee
>Assignee: sandflee
> Attachments: YARN-4673.01.patch
>
>
> we could add a lock like ApplicationMasterService#allocate



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4718) Rename variables in SchedulerNode to reduce ambiguity post YARN-1011

2016-02-25 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-4718:
--
Attachment: YARN-4718-v002.patch

Taking latest changes in trunk.

> Rename variables in SchedulerNode to reduce ambiguity post YARN-1011
> 
>
> Key: YARN-4718
> URL: https://issues.apache.org/jira/browse/YARN-4718
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Inigo Goiri
> Attachments: YARN-4718-v000.patch, YARN-4718-v001.patch, 
> YARN-4718-v002.patch
>
>
> As discussed in YARN-1011, we are planning to add a few more variables 
> regarding utilization to SchedulerNode. To ensure the new variables are 
> descriptive and don't lead to confusion, the existing variables could use 
> some renaming. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)