[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173396#comment-15173396
 ] 

Hadoop QA commented on YARN-4062:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 3m 50s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
3s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 56s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
29s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 52s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} hadoop-yarn-project/hadoop-yarn: patch generated 0 
new + 210 unchanged - 1 fixed = 210 total (was 211) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 29s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 41s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
20s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (YARN-4741) RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue

2016-02-29 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173395#comment-15173395
 ] 

sandflee commented on YARN-4741:


Without the fixes from YARN-3990 and YARN-3896, our RM was flooded by node 
up/down events while the node kept resyncing, and we saw the same output in 
the NM:
{quote}
2016-02-18 01:39:43,217 WARN 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node is out of 
sync with ResourceManager, hence resyncing.
2016-02-18 01:39:43,217 WARN 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from 
ResourceManager: Too far behind rm response id:100314 nm response id:0
{quote}

Things probably happened like this:
1. The NM restarted; ResourceTrackerService sent a NodeReconnectEvent to reset 
the response id to 0.
2. A node heartbeat was processed before the NodeReconnectEvent was handled 
(the dispatcher was flooded by RMAppNodeUpdateEvents), so the RM sent a resync 
command to the NM because of the response id mismatch.
3. The RMNode transitioned to the REBOOTED state and was removed from 
rmContext.activeNodes.
4. The NM registered again; a new RMNode was created, added to 
rmContext.activeNodes, and a NodeStartEvent was sent.
5. The scheduler completed the containers that had been running on the node 
and reported them to the AM, which triggered FINISHED_CONTAINERS_PULLED_BY_AM 
events to the RMNode; but the RMNode was in the NEW state and could not handle 
FINISHED_CONTAINERS_PULLED_BY_AM (see the sketch below).
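To make step 5 concrete, here is a minimal, self-contained sketch of the 
invalid-event behavior (hypothetical and heavily simplified; the real 
RMNodeImpl is built on YARN's StateMachineFactory and has many more states and 
events). A state with no registered transition for an incoming event just logs 
the two ERROR lines and drops the event, yet the event has already taken up 
space in the dispatcher queue:

{code:java}
// Simplified, hypothetical sketch: NOT the real RMNodeImpl state machine.
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

public class NodeStateSketch {
  enum NodeState { NEW, RUNNING, REBOOTED }
  enum NodeEvent { STARTED, FINISHED_CONTAINERS_PULLED_BY_AM, REBOOTING }

  // For each state, the set of events it has a transition for.
  private static final Map<NodeState, EnumSet<NodeEvent>> TRANSITIONS =
      new EnumMap<>(NodeState.class);
  static {
    // A NEW node only handles STARTED: there is no transition for
    // FINISHED_CONTAINERS_PULLED_BY_AM, which is the root of the log flood.
    TRANSITIONS.put(NodeState.NEW, EnumSet.of(NodeEvent.STARTED));
    TRANSITIONS.put(NodeState.RUNNING, EnumSet.of(
        NodeEvent.FINISHED_CONTAINERS_PULLED_BY_AM, NodeEvent.REBOOTING));
    TRANSITIONS.put(NodeState.REBOOTED, EnumSet.noneOf(NodeEvent.class));
  }

  private NodeState state = NodeState.NEW;

  public void handle(NodeEvent event) {
    if (!TRANSITIONS.get(state).contains(event)) {
      // Mirrors the two ERROR lines seen in the RM log.
      System.err.println("Invalid event " + event + " on Node in state " + state);
      System.err.println("Can't handle this event at current state");
      return; // the event is dropped, but it already sat in the queue
    }
    // ... apply the transition (elided in this sketch) ...
  }
}
{code}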

> RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async 
> dispatcher event queue
> ---
>
> Key: YARN-4741
> URL: https://issues.apache.org/jira/browse/YARN-4741
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sangjin Lee
>Priority: Critical
> Attachments: nm.log
>
>
> We had a pretty major incident with the RM where it was continually flooded 
> with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event 
> queue.
> In our setup, we had the RM HA or stateful restart *disabled*, but NM 
> work-preserving restart *enabled*. Due to other issues, we did a cluster-wide 
> NM restart.
> Some time during the restart (which took multiple hours), we started seeing 
> the async dispatcher event queue building up. Normally it would stay around 
> 1,000. In this case, it climbed all the way up to tens of millions of events.
> When we looked at the RM log, it was full of the following messages:
> {noformat}
> 2016-02-18 01:47:29,530 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> 2016-02-18 01:47:29,535 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle 
> this event at current state
> 2016-02-18 01:47:29,535 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> 2016-02-18 01:47:29,538 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle 
> this event at current state
> 2016-02-18 01:47:29,538 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> {noformat}
> And that node in question was restarted a few minutes earlier.
> When we inspected the RM heap, it was full of 
> RMNodeFinishedContainersPulledByAMEvents.
> Suspecting the NM work-preserving restart, we disabled it and did another 
> cluster-wide rolling restart. Initially that seemed to have helped reduce the 
> queue size, but the queue built back up to several millions and continued for 
> an extended period. We had to restart the RM to resolve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4750) App metrics may not be correct when an app is recovered

2016-02-29 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-4750:
--
Description: 
App metrics (more precisely, app attempt metrics) such as vcore-seconds and 
MB-seconds are saved in the state store when there is an attempt state 
transition. Values for running attempts are kept in memory and are not saved 
when there is an RM restart/failover. For a recovered app, the metric values 
are reset. In that case, these values will be incomplete.

Was this intentional, or have we just not found a correct way to fix it?


  was:
App metrics(rather app attempt metrics) like Vcore-seconds and MB-seconds are 
saved in the state store when there is an attempt state transition. Values for 
running attempts will be in memory and will not be saved when there is an RM 
restart/failover. For recovered app metrics value will be reset. In that case, 
the value will be incomplete. 

Was this intentional or have we not found a correct way to fix it ?



> App metrics may not be correct when an app is recovered
> ---
>
> Key: YARN-4750
> URL: https://issues.apache.org/jira/browse/YARN-4750
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> App metrics (more precisely, app attempt metrics) such as vcore-seconds and 
> MB-seconds are saved in the state store when there is an attempt state 
> transition. Values for running attempts are kept in memory and are not saved 
> when there is an RM restart/failover. For a recovered app, the metric values 
> are reset. In that case, these values will be incomplete. 
> Was this intentional, or have we just not found a correct way to fix it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4750) App metrics may not be correct when an app is recovered

2016-02-29 Thread Lavkesh Lahngir (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lavkesh Lahngir updated YARN-4750:
--
Summary: App metrics may not be correct when an app is recovered  (was: App 
metrics may not be correct when and app is recovered)

> App metrics may not be correct when an app is recovered
> ---
>
> Key: YARN-4750
> URL: https://issues.apache.org/jira/browse/YARN-4750
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>
> App metrics (more precisely, app attempt metrics) such as vcore-seconds and 
> MB-seconds are saved in the state store when there is an attempt state 
> transition. Values for running attempts are kept in memory and are not saved 
> when there is an RM restart/failover. For a recovered app, the metric values 
> are reset. In that case, the value will be incomplete. 
> Was this intentional, or have we just not found a correct way to fix it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4750) App metrics may not be correct when and app is recovered

2016-02-29 Thread Lavkesh Lahngir (JIRA)
Lavkesh Lahngir created YARN-4750:
-

 Summary: App metrics may not be correct when and app is recovered
 Key: YARN-4750
 URL: https://issues.apache.org/jira/browse/YARN-4750
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir


App metrics (more precisely, app attempt metrics) such as vcore-seconds and 
MB-seconds are saved in the state store when there is an attempt state 
transition. Values for running attempts are kept in memory and are not saved 
when there is an RM restart/failover. For a recovered app, the metric values 
are reset. In that case, the value will be incomplete. 

Was this intentional, or have we just not found a correct way to fix it?
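As a rough illustration of the arithmetic involved (a hedged sketch; none of 
these names are the RM's actual fields or methods): vcore-seconds accumulate 
as vcores × elapsed seconds, and only the portion persisted at the last 
attempt state transition survives a restart.

{code:java}
// Hypothetical illustration of how vcore-seconds accumulate and what is
// lost across an RM restart; all names are invented for this sketch.
public class AppMetricsSketch {
  long persistedVcoreSeconds; // written to the state store on attempt transitions
  long inMemoryVcoreSeconds;  // running attempt's share, kept in memory only

  // Called periodically while containers run (MB-seconds work the same way).
  void accumulate(int vcores, long elapsedSeconds) {
    inMemoryVcoreSeconds += (long) vcores * elapsedSeconds;
  }

  // On an attempt state transition, the running total is persisted.
  void onAttemptStateTransition() {
    persistedVcoreSeconds += inMemoryVcoreSeconds;
    inMemoryVcoreSeconds = 0;
  }

  // After an RM restart/failover, only the persisted part is recovered;
  // whatever was still in memory for running attempts is reset, so the
  // recovered totals undercount.
  long recoveredTotal() {
    return persistedVcoreSeconds;
  }
}
{code}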




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173332#comment-15173332
 ] 

Wangda Tan commented on YARN-4734:
--

Thanks for looking at this, [~chris.douglas].

Yes, this is a WIP patch; I was pulled into other work, so I haven't finished 
the whole patch yet. I will keep this JIRA updated.

Regarding merging it at the top level, did you mean LICENSE.txt and 
BUILDING.txt? Are there any other files I need to change?

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch
>
>
> The YARN-2928 branch is planned to be merged back to trunk shortly, and it 
> depends on the changes from YARN-3368. This JIRA tracks the merge task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4715) Add support to read resource types from a config file

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173327#comment-15173327
 ] 

Hadoop QA commented on YARN-4715:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 24s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
3s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
38s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 21s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
228 unchanged - 3 fixed = 229 total (was 231) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 5s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflic

[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-02-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173319#comment-15173319
 ] 

Wangda Tan commented on YARN-4719:
--

Hi [~kasha],

Thanks for working on this patch; it is very useful.

I took a very quick look at the patch; a few comments:
- For ClusterNodeTracker#nodes, can we use a lock-free data structure to avoid 
copying the whole set? (A rough sketch follows below.)
- We'd better not add addBlacklistedNodeIdsToList to ClusterNodeTracker, since 
it invokes application logic; ClusterNodeTracker should only contain 
node-related functionality.
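On the first point, here is a minimal, hypothetical sketch of a lock-free node 
map (not the actual ClusterNodeTracker API; the NodeInfo type parameter and 
method names are invented for illustration). A ConcurrentHashMap gives point 
lookups and weakly consistent iteration without locking or copying the whole 
set, and a comparator-sorted snapshot is only materialized when a query asks 
for one, which also matches the sorted-list use case in the issue description:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; NodeInfo stands in for the real SchedulerNode type.
public class NodeTrackerSketch<NodeInfo> {
  private final ConcurrentMap<String, NodeInfo> nodes =
      new ConcurrentHashMap<>();

  public void addNode(String nodeId, NodeInfo node) { nodes.put(nodeId, node); }
  public void removeNode(String nodeId) { nodes.remove(nodeId); }
  public NodeInfo getNode(String nodeId) { return nodes.get(nodeId); }

  // Reads iterate the live map (weakly consistent) with no lock and no copy;
  // a snapshot is built only for queries that actually need a sorted view.
  public List<NodeInfo> sortedNodes(Comparator<NodeInfo> comparator) {
    List<NodeInfo> snapshot = new ArrayList<>(nodes.values());
    Collections.sort(snapshot, comparator);
    return snapshot;
  }
}
{code}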

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-02-29 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-4062:
-
Attachment: YARN-4062-YARN-2928.07.patch

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.04.patch, 
> YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, 
> YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch, 
> YARN-4062-feature-YARN-2928.03.patch
>
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing 
> into the flow_run table. We also need flush & compaction processing in the 
> coprocessor, and perhaps a new scanner to deal with the data during the 
> flushing and compaction stages. 
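For background on the mechanism the description refers to (a hedged sketch 
against the HBase 1.x coprocessor API; the class and the wrap() helper are 
invented names, not the patch's actual code): a RegionObserver can substitute 
its own InternalScanner when a flush or compaction runs, and that substituted 
scanner is where the flow_run aggregation logic would hook in.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.Store;

// Hedged sketch: shows where flush/compaction processing hooks into an HBase
// region coprocessor; it is not the patch's implementation.
public class FlowRunCoprocessorSketch extends BaseRegionObserver {

  @Override
  public InternalScanner preFlush(ObserverContext<RegionCoprocessorEnvironment> c,
      Store store, InternalScanner scanner) throws IOException {
    // Substitute a scanner that post-processes cells as the memstore flushes.
    return wrap(scanner, ScanType.COMPACT_RETAIN_DELETES);
  }

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> c,
      Store store, InternalScanner scanner, ScanType scanType) throws IOException {
    // Same idea at compaction time; scanType distinguishes minor vs. major.
    return wrap(scanner, scanType);
  }

  private InternalScanner wrap(InternalScanner delegate, ScanType scanType) {
    // Placeholder: a real implementation would return a custom InternalScanner
    // that coalesces/aggregates flow run cells. Returning the delegate keeps
    // the default behavior.
    return delegate;
  }
}
{code}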



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173303#comment-15173303
 ] 

Hadoop QA commented on YARN-4062:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 30s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
30s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
38s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 23s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 29s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 38s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 2 new + 
210 unchanged - 1 fixed = 212 total (was 211) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 36s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 26s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173294#comment-15173294
 ] 

Sidharta Seethana commented on YARN-4744:
-

Thanks, I'll assign the issue to myself - I hope to be able to get to this soon.

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the server as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> Too many "signal to container" failures occur.
> When the job is submitted as this user, the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> 2014-03-02 09:20:43,113 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn 
> OPERATION=Container Finished - Succeeded TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1393731146548_0001

[jira] [Assigned] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana reassigned YARN-4744:
---

Assignee: Sidharta Seethana

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Sidharta Seethana
>
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the server as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> Too many "signal to container" failures occur.
> When the job is submitted as this user, the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> 2014-03-02 09:20:43,113 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn 
> OPERATION=Container Finished - Succeeded TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1393731146548_0001
> CONTAINERID=container_e02_1393731146548_0001_01_09

[jira] [Updated] (YARN-4715) Add support to read resource types from a config file

2016-02-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4715:

Attachment: YARN-4715-YARN-3926.005.patch

Uploaded an earlier patch by mistake. Attaching the right version.

> Add support to read resource types from a config file
> -
>
> Key: YARN-4715
> URL: https://issues.apache.org/jira/browse/YARN-4715
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4715-YARN-3926.001.patch, 
> YARN-4715-YARN-3926.002.patch, YARN-4715-YARN-3926.003.patch, 
> YARN-4715-YARN-3926.004.patch, YARN-4715-YARN-3926.005.patch
>
>
> This ticket is to add support to allow the RM to read the resource types to 
> be used for scheduling from a config file. I'll file follow-up tickets to add 
> similar support in the NM as well as to handle the RM-NM handshake protocol 
> issues.
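As a rough illustration of the idea (a sketch only; the yarn.resource-types 
key and the helper below are invented for this example, not the patch's actual 
configuration surface), the RM-side read could amount to parsing a 
comma-separated list from a Hadoop Configuration:

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: parse extra resource type names from configuration.
// The property key is invented for illustration.
public class ResourceTypesConfigSketch {
  private static final String RESOURCE_TYPES_KEY = "yarn.resource-types";

  public static List<String> readResourceTypes(Configuration conf) {
    List<String> types = new ArrayList<>();
    // Memory and vcores are always present; extra types come from the config.
    types.add("memory-mb");
    types.add("vcores");
    for (String name : conf.getTrimmedStrings(RESOURCE_TYPES_KEY)) {
      if (!types.contains(name)) {
        types.add(name);
      }
    }
    return types;
  }
}
{code}

The Configuration would be loaded from whatever file the patch designates; the 
point is just that the set of schedulable resource types becomes data rather 
than code.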



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4715) Add support to read resource types from a config file

2016-02-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4715:

Attachment: (was: YARN-4715-YARN-3926.004.patch)

> Add support to read resource types from a config file
> -
>
> Key: YARN-4715
> URL: https://issues.apache.org/jira/browse/YARN-4715
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4715-YARN-3926.001.patch, 
> YARN-4715-YARN-3926.002.patch, YARN-4715-YARN-3926.003.patch, 
> YARN-4715-YARN-3926.004.patch
>
>
> This ticket is to add support to allow the RM to read the resource types to 
> be used for scheduling from a config file. I'll file follow-up tickets to add 
> similar support in the NM as well as to handle the RM-NM handshake protocol 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4715) Add support to read resource types from a config file

2016-02-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4715:

Attachment: YARN-4715-YARN-3926.004.patch

Uploaded a new patch with checkstyle fixes.

> Add support to read resource types from a config file
> -
>
> Key: YARN-4715
> URL: https://issues.apache.org/jira/browse/YARN-4715
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4715-YARN-3926.001.patch, 
> YARN-4715-YARN-3926.002.patch, YARN-4715-YARN-3926.003.patch, 
> YARN-4715-YARN-3926.004.patch
>
>
> This ticket is to add support to allow the RM to read the resource types to 
> be used for scheduling from a config file. I'll file follow-up tickets to add 
> similar support in the NM as well as to handle the RM-NM handshake protocol 
> issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-02-29 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-4062:
-
Attachment: YARN-4062-YARN-2928.06.patch

Uploading one more patch to fix the checkstyle warnings.

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.04.patch, 
> YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, 
> YARN-4062-YARN-2928.1.patch, YARN-4062-feature-YARN-2928.01.patch, 
> YARN-4062-feature-YARN-2928.02.patch, YARN-4062-feature-YARN-2928.03.patch
>
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing 
> into the flow_run table. We also need flush & compaction processing in the 
> coprocessor, and perhaps a new scanner to deal with the data during the 
> flushing and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4734) Merge branch:YARN-3368 to trunk

2016-02-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173255#comment-15173255
 ] 

Chris Douglas commented on YARN-4734:
-

{{LICENSE.txt}} looks like it is based on, or copied from, Apache Tez. Could 
you double-check the set of modules to ensure it's correct for Hadoop? We'll 
also need to merge it at the top level.

> Merge branch:YARN-3368 to trunk
> ---
>
> Key: YARN-4734
> URL: https://issues.apache.org/jira/browse/YARN-4734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4734.1.patch, YARN-4734.2.patch, YARN-4734.3.patch
>
>
> The YARN-2928 branch is planned to be merged back to trunk shortly, and it 
> depends on the changes from YARN-3368. This JIRA tracks the merge task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173251#comment-15173251
 ] 

Bibin A Chundatt commented on YARN-4744:


Hi [~sidharta-s],
Thank you for looking into the issue.
# Is security enabled? Yes.
# Is this problem reproducible? Yes, always; submit a MapReduce job from the 
CLI.

{noformat}
2014-03-02 09:20:43,073 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_e02_1393731146548_0001_01_09
{noformat}
Container cleanup is also getting called for containers that have already 
reached EXITED_WITH_SUCCESS.

I have updated the logs in the description as well.
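A hedged sketch of the kind of guard this observation suggests (simplified 
stand-ins, not the NM's real classes or signatures): skip signaling containers 
that have already exited, and treat exit code 9 from container-executor, which 
these logs suggest means the target process is already gone, as benign rather 
than as a failure.

{code:java}
// Hypothetical sketch of guarding container cleanup so already-exited
// containers are not signaled again; names are simplified stand-ins.
public class CleanupGuardSketch {
  enum ContainerState { RUNNING, EXITED_WITH_SUCCESS, EXITED_WITH_FAILURE, DONE }

  interface Executor {
    // Returns the container-executor exit code; in the logs above, 9 shows
    // up when the target process is already gone.
    int signalContainer(String user, String pid, int signal);
  }

  void cleanupContainer(String user, String pid, ContainerState state,
      Executor exec) {
    if (state == ContainerState.EXITED_WITH_SUCCESS
        || state == ContainerState.EXITED_WITH_FAILURE
        || state == ContainerState.DONE) {
      return; // process already gone; signaling would fail as in the logs
    }
    int rc = exec.signalContainer(user, pid, 15 /* SIGTERM */);
    if (rc == 9) {
      // Race: the container exited between the state check and the signal.
      // Treat as benign instead of logging a failure.
    }
  }
}
{code}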

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>
> Install an HA cluster in secure mode.
> Enable LCE with cgroups.
> Start the server as the dsperf user.
> Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
> Too many "signal to container" failures occur.
> When the job is submitted as this user, the following exception is thrown:
> {noformat}
> 2014-03-02 09:20:38,689 INFO 
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
>  Authorization successful for testing (auth:TOKEN) for protocol=interface 
> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-03-02 09:20:40,158 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Event EventType: KILL_CONTAINER sent to absent container 
> container_e02_1393731146548_0001_01_13
> 2014-03-02 09:20:43,071 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Container container_e02_1393731146548_0001_01_09 succeeded
> 2014-03-02 09:20:43,072 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e02_1393731146548_0001_01_09 transitioned from 
> RUNNING to EXITED_WITH_SUCCESS
> 2014-03-02 09:20:43,073 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_e02_1393731146548_0001_01_09
> 2014-03-02 09:20:43,075 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 9370, 15]
> 2014-03-02 09:20:43,081 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)

[jira] [Updated] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4744:
---
Description: 
Install an HA cluster in secure mode.
Enable LCE with cgroups.
Start the server as the dsperf user.
Submit a MapReduce application (terasort/teragen) as the yarn/dsperf user.
Too many "signal to container" failures occur.

When the job is submitted as this user, the following exception is thrown:

{noformat}
2014-03-02 09:20:38,689 INFO 
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
 Authorization successful for testing (auth:TOKEN) for protocol=interface 
org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-03-02 09:20:40,158 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Event EventType: KILL_CONTAINER sent to absent container 
container_e02_1393731146548_0001_01_13
2014-03-02 09:20:43,071 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Container container_e02_1393731146548_0001_01_09 succeeded
2014-03-02 09:20:43,072 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_e02_1393731146548_0001_01_09 transitioned from RUNNING 
to EXITED_WITH_SUCCESS
2014-03-02 09:20:43,073 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
 Cleaning up container container_e02_1393731146548_0001_01_09
2014-03-02 09:20:43,075 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2014-03-02 09:20:43,081 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is yarn
main : requested yarn user is yarn
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 2, 9370, 15]
2014-03-02 09:20:43,081 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=9:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more
2014-03-02 09:20:43,113 INFO 
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=yarn 
OPERATION=Container Finished - Succeeded TARGET=ContainerImpl
RESULT=SUCCESS  APPID=application_1393731146548_0001
CONTAINERID=container_e02_1393731146548_0001_01_09
2014-03-02 09:20:43,115 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_e02_1393731146548_0001_01_09 transitioned from 
EXITED_WITH_SUCCESS to DONE
2014-03-02 09:20:43,115 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Removing container_e02_1393731146548_0001_01_09 from application 
application_1393731146548_0001

{noformat}


Checked the same scenario in version 2.7.2 (the issue is not present there).



  

[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173238#comment-15173238
 ] 

Hadoop QA commented on YARN-2883:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 39s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
37s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
41s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 6s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
58s {color} | {color:green} yarn-2877 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 46s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
yarn-2877 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 11s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 in yarn-2877 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 9s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 36 new + 
232 unchanged - 2 fixed = 268 total (was 234) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
44s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 33 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 5s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s 
{color} | {color:green} hadoop-yarn-api in 

[jira] [Updated] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4744:
---
Description: 
Install HA cluster in secure mode
Enable LCE with cgroups
Start server with dsperf user
Submit application with user yarn
Too many signal to container failure 


{noformat}
2014-03-01 14:10:32,223 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is yarn
main : requested yarn user is yarn
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 2, 28575, 15]
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=9:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more

{noformat}


Checked the same scenario in version 2.7.2 (not available).



  was:
Enable LCE with cgroups
Start server with dsperf user
Submit application with user yarn
Too many signal to container failure 

{noformat}
2014-03-01 14:10:32,223 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is yarn
main : requested yarn user is yarn
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 2, 28575, 15]
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.c

[jira] [Updated] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4744:
---
Description: 
Install HA cluster in secure mode
Enable LCE with cgroups
Start server with dsperf user
Submit mapreduce application terasort/teragen with user yarn/dsperf 
Too many signal to container failure 

When the application is submitted with this user, the following exception is thrown:

{noformat}
2014-03-01 14:10:32,223 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is yarn
main : requested yarn user is yarn
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 2, 28575, 15]
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=9:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
at org.apache.hadoop.util.Shell.run(Shell.java:838)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
... 9 more

{noformat}


Checked the same scenario in version 2.7.2 (not available).



  was:
Install HA cluster in secure mode
Enable LCE with cgroups
Start server with dsperf user
Submit application with user yarn
Too many signal to container failure 


{noformat}
2014-03-01 14:10:32,223 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
 Using container runtime: DefaultLinuxContainerRuntime
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
 Shell execution returned exit code: 9. Privileged Execution Operation Output:
main : command provided 2
main : run as user is yarn
main : requested yarn user is yarn
Full command array for failed execution:
[/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor, 
yarn, yarn, 2, 28575, 15]
2014-03-01 14:10:32,228 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
 Signal container failed. Exception:
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
 ExitCodeException exitCode=9:
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecu

[jira] [Commented] (YARN-4749) Generalize config file handling in container-executor

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173176#comment-15173176
 ] 

Hadoop QA commented on YARN-4749:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
51s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 9s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 45s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 12s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790649/YARN-4749.001.patch |
| JIRA Issue | YARN-4749 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux b793f64df541 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / d93c22e |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_72 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/10671/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10671/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Generalize config file handling in container-executor
> -
>
> Key: YARN-4749
> URL: https://issues.apache.org/jira/browse/YARN-4749
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>   

[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-02-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173162#comment-15173162
 ] 

Karthik Kambatla commented on YARN-4743:


[~gzh1992n] - thanks for reporting and working on this. I haven't had a chance 
to look at it closely enough; it will take me a couple of days to do so. 

On the surface, it seems benign to sort a snapshot of the Schedulables. The 
other way would be to use a ReadWriteLock in FSQueue: the getters would all 
acquire the read lock while the sort holds the write lock? 
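
For illustration, a minimal sketch of that ReadWriteLock idea; the class and 
field names below are stand-ins, not the actual FSQueue code:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in for an FSQueue-like class; illustrative only.
class QueueSortSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final List<String> runnableApps = new ArrayList<>();

  // Getters take the read lock, and therefore block while a sort is in
  // flight instead of observing (or handing out) state mid-sort.
  List<String> getRunnableApps() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(runnableApps);
    } finally {
      lock.readLock().unlock();
    }
  }

  // The sort holds the write lock, excluding all readers for its duration.
  void sortApps() {
    lock.writeLock().lock();
    try {
      Collections.sort(runnableApps);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}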

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue was found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return the current Resource. The current Resource 
> is unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ..
> Resources.addTo(currentConsumption, rmContainer.getContainer()
>   .getResource());
> ..
>   }
> {code}
> I suggest using a stable Resource in the comparator.
> Is there something wrong in my reasoning?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173158#comment-15173158
 ] 

Hadoop QA commented on YARN-4517:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 34s 
{color} | {color:red} Patch generated 95 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 21s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790648/YARN-4517-YARN-3368.02.patch
 |
| JIRA Issue | YARN-4517 |
| Optional Tests |  asflicense  |
| uname | Linux 4b7d3a22c9be 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / 37455e7 |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/10672/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hadoop-yarn-project/hadoop-yarn U: 
hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10672/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

2016-02-29 Thread Zephyr Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173146#comment-15173146
 ] 

Zephyr Guo commented on YARN-4743:
--

{quote}
I think that DRF comparator is not transitive with my intuition.
{quote}
I think that's right, [~ozawa].

FairShareComparator uses {{getResourceUsage()}}, {{getDemand()}}, and 
{{getMinShare()}} to implement {{compare(Schedulable s1, Schedulable s2)}}. 
These three methods must return the same Resource for the whole duration of a 
sort; otherwise transitivity is broken.

How about adding a snapshot feature to Schedulable? We would snapshot each 
Schedulable before sorting, then sort using the snapshotted Resource in the 
comparator. The result of the sort would still be very close to the real 
situation, because sorting is very fast.
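
For what it's worth, a minimal sketch of that snapshot idea, assuming a 
hypothetical {{snapshot()}} hook on Schedulable; the names are illustrative and 
the comparator is reduced to memory only (the real FairShareComparator 
considers much more):

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical: each schedulable can freeze the values the comparator reads.
interface SnapshottableSchedulable {
  void snapshot();               // copy current usage/demand/minShare aside
  long getSnapshotMemoryUsage(); // the comparator reads only frozen values
}

class SnapshotSortSketch {
  // Freeze every element first, then sort on the frozen values. Concurrent
  // updates to the live Resource objects can no longer change the ordering
  // mid-sort, so the comparator stays transitive for the whole sort.
  static void sortOnSnapshot(List<SnapshottableSchedulable> apps) {
    for (SnapshottableSchedulable app : apps) {
      app.snapshot();
    }
    Collections.sort(apps, Comparator
        .comparingLong(SnapshottableSchedulable::getSnapshotMemoryUsage));
  }
}
{code}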

> ResourceManager crash because TimSort
> -
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.4
>Reporter: Zephyr Guo
>
> {code}
> 2016-02-26 14:08:50,821 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general 
> contract!
>  at java.util.TimSort.mergeHi(TimSort.java:868)
>  at java.util.TimSort.mergeAt(TimSort.java:485)
>  at java.util.TimSort.mergeCollapse(TimSort.java:410)
>  at java.util.TimSort.sort(TimSort.java:214)
>  at java.util.TimSort.sort(TimSort.java:173)
>  at java.util.Arrays.sort(Arrays.java:659)
>  at java.util.Collections.sort(Collections.java:217)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>  at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue was found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting 
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ..
>   s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>   s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare1, 
> ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>   / Resources.max(RESOURCE_CALCULATOR, null, minShare2, 
> ONE).getMemory();
> ..
> {code}
> {{getResourceUsage}} will return the current Resource. The current Resource 
> is unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), 
> getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
> return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ..
> Resources.addTo(currentConsumption, rmContainer.getContainer()
>   .getResource());
> ..
>   }
> {code}
> I suggest using a stable Resource in the comparator.
> Is there something wrong in my reasoning?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-02-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173137#comment-15173137
 ] 

Karthik Kambatla commented on YARN-4719:


bq. May also be useful to expose functionality in the ClusterNodeTracker to 
give the list of nodes in a rack, nodes that match a label expression, etc. 
(This can possibly be another JIRA too.)
Absolutely. I wanted to move all the existing common functionality into this 
class in this JIRA, so we can add other helper functionality in the future. 

bq. I see that you are triggering the update thread on nodeRemoval too. I 
understand this might generally be useful (since the node removal might change 
the node ordering), but given this is a refactoring patch, maybe address that 
separately?
removeNode does a triggerUpdate today too; I just moved it a little. 

Will fix the import and the test failures here in the next iteration. 
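
For illustration only, a rough sketch of the kind of helper API under 
discussion; every name below is hypothetical and not taken from the attached 
patches:

{code}
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical node-state helper: one place to keep node state plus the
// matching and sorting queries that every scheduler currently reimplements.
final class NodeTrackerSketch<N> {
  private final ConcurrentHashMap<String, N> nodes = new ConcurrentHashMap<>();

  void addNode(String nodeId, N node) { nodes.put(nodeId, node); }
  void removeNode(String nodeId) { nodes.remove(nodeId); }

  // Filtering API: e.g. nodes in a rack, or nodes matching a label expression.
  List<N> getNodes(Predicate<N> matcher) {
    return nodes.values().stream().filter(matcher)
        .collect(Collectors.toList());
  }

  // Sorting API: e.g. order by allocation or utilization (the YARN-1011 case).
  List<N> getSortedNodes(Comparator<N> comparator) {
    return nodes.values().stream().sorted(comparator)
        .collect(Collectors.toList());
  }
}
{code}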

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4749) Generalize config file handling in container-executor

2016-02-29 Thread Sidharta Seethana (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-4749:

Attachment: YARN-4749.001.patch

Uploaded a patch that makes config parsing reusable. 

> Generalize config file handling in container-executor
> -
>
> Key: YARN-4749
> URL: https://issues.apache.org/jira/browse/YARN-4749
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-4749.001.patch
>
>
> The current implementation of container-executor already supports parsing of 
> key value pairs from a config file. However, it is currently restricted to 
> {{container-executor.cfg}} and cannot be reused for parsing additional 
> config/command files. Generalizing this is a required step for YARN-4245.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4517:
---
Attachment: (was: YARN-4517-YARN-3368.02.patch)

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4517:
---
Attachment: YARN-4517-YARN-3368.02.patch

Sorry, I had left my local host and port configurations in the patch. Updating 
the patch after removing them.

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-02-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173124#comment-15173124
 ] 

Varun Saxena commented on YARN-4746:


[~bibinchundatt], although it's not mentioned as part of this JIRA, I think we 
can extend this check to app attempt ids and container ids as well. 
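
A minimal sketch of the 400 conversion described in the issue below, assuming 
YARN's {{BadRequestException}} (which maps to HTTP 400); the method shown is 
illustrative, not the attached patch:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.webapp.BadRequestException;

final class AppIdParsingSketch {
  // Translate a malformed appId into a 400 instead of letting the
  // IllegalArgumentException escape and surface as a 500.
  static ApplicationId parseApplicationId(String appId) {
    if (appId == null || appId.isEmpty()) {
      throw new BadRequestException("appId is empty or null");
    }
    try {
      return ConverterUtils.toApplicationId(appId);
    } catch (IllegalArgumentException e) {
      throw new BadRequestException("appId is invalid: " + appId);
    }
  }
}
{code}

The same catch-and-convert pattern would apply to app attempt ids and container 
ids.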

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch
>
>
> I'm seeing somewhere in my WS API tests an error with exception 
> conversion of a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised exception and explicitly converting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4749) Generalize config file handling in container-executor

2016-02-29 Thread Sidharta Seethana (JIRA)
Sidharta Seethana created YARN-4749:
---

 Summary: Generalize config file handling in container-executor
 Key: YARN-4749
 URL: https://issues.apache.org/jira/browse/YARN-4749
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Sidharta Seethana
Assignee: Sidharta Seethana


The current implementation of container-executor already supports parsing of 
key value pairs from a config file. However, it is currently restricted to 
{{container-executor.cfg}} and cannot be reused for parsing additional 
config/command files. Generalizing this is a required step for YARN-4245.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4748) ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport

2016-02-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173114#comment-15173114
 ] 

Hudson commented on YARN-4748:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9397 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9397/])
YARN-4748. ApplicationHistoryManagerOnTimelineStore should not swallow (jianhe: 
rev d93c22ec274b1a0f29609217039b80732886fed7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


> ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on 
> generateApplicationReport
> ---
>
> Key: YARN-4748
> URL: https://issues.apache.org/jira/browse/YARN-4748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Fix For: 2.8.0
>
> Attachments: YARN-4748-trunk.001.patch
>
>
> We're directly swallowing AuthorizationExceptions and 
> ApplicationAttemptNotFoundExceptions when generating application reports. We 
> should at least log the exception before proceeding with default values 
> (which will assign the app attempt id to -1). 
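
A sketch of the kind of change described, with stand-in names (the actual patch 
may differ):

{code}
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

final class ReportGenerationSketch {
  private static final Log LOG =
      LogFactory.getLog(ReportGenerationSketch.class);

  // Instead of silently swallowing the failure, record why the attempt
  // info could not be fetched (e.g. an authorization problem or a missing
  // attempt) before falling back to the default value.
  static String getAttemptIdOrDefault(String appId) {
    try {
      return fetchLatestAttemptId(appId);
    } catch (IOException e) {
      LOG.info("Failed to fetch attempt info for " + appId
          + "; proceeding with default values", e);
      return "-1"; // the default the issue description mentions
    }
  }

  // Stub standing in for the real timeline-store lookup.
  private static String fetchLatestAttemptId(String appId) throws IOException {
    throw new IOException("stub for illustration");
  }
}
{code}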



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173112#comment-15173112
 ] 

Sangjin Lee commented on YARN-3863:
---

Thanks for the detailed explanation of the changes, [~varun_saxena]! It's 
tremendously helpful. I'll go over the latest patch and get back to you with 
comments.

{quote}
I could not quite get below comment. I did not make any change on line 448. 
Sangjin, can you elaborate. Maybe you meant some other line.
(HBaseTimelineWriterImpl.java)
l.448: it should simply be a else if
{quote}

Sorry, I had meant {{TimelineStorageUtils.java}}. I see it has been 
significantly refactored in the latest version, so I'm pretty sure the comment 
no longer applies.


> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently, filters in the timeline reader will return an entity only if all 
> the filter conditions hold true, i.e., only the AND operation is supported. 
> We can support the OR operation for the filters as well. Additionally, as the 
> primary backend implementation is HBase, we can design our filters in a 
> manner where they closely resemble HBase Filters.
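
To make the AND/OR idea concrete, a small sketch of what composable filters 
might look like, loosely mirroring the MUST_PASS_ALL/MUST_PASS_ONE semantics of 
HBase's FilterList; the types below are hypothetical, not the API in the 
patches:

{code}
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical composable filter list: AND means every child filter must
// pass, OR means at least one must pass. Lists can nest, so mixed
// conditions such as (A AND (B OR C)) fall out naturally.
final class EntityFilterListSketch<T> implements Predicate<T> {
  enum Operator { AND, OR }

  private final Operator op;
  private final List<Predicate<T>> filters;

  @SafeVarargs
  EntityFilterListSketch(Operator op, Predicate<T>... filters) {
    this.op = op;
    this.filters = Arrays.asList(filters);
  }

  @Override
  public boolean test(T entity) {
    return op == Operator.AND
        ? filters.stream().allMatch(f -> f.test(entity))
        : filters.stream().anyMatch(f -> f.test(entity));
  }
}
{code}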



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4748) ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport

2016-02-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173109#comment-15173109
 ] 

Jian He commented on YARN-4748:
---

+1

> ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on 
> generateApplicationReport
> ---
>
> Key: YARN-4748
> URL: https://issues.apache.org/jira/browse/YARN-4748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Fix For: 2.8.0
>
> Attachments: YARN-4748-trunk.001.patch
>
>
> We're directly swallowing AuthorizationExceptions and 
> ApplicationAttemptNotFoundExceptions when generating application reports. We 
> should at least log the exception before proceeding with default values 
> (which will assign the app attempt id to -1). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-02-29 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4746:
---
Attachment: 0001-YARN-4746.patch

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: 0001-YARN-4746.patch
>
>
> I'm seeing somewhere in my WS API tests an error with exception 
> conversion of a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised exception and explicitly converting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4719) Add a helper library to maintain node state and allows common queries

2016-02-29 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173101#comment-15173101
 ] 

Arun Suresh commented on YARN-4719:
---

Much-needed patch, [~kasha]!

Took a quick look; some comments:
# You have an unused import in {{FairScheduler}}.
# I see that you are triggering the update thread on nodeRemoval too. I 
understand this might generally be useful (since the node removal might change 
the node ordering), but given this is a refactoring patch, maybe address that 
separately?
# May also be useful to expose functionality in the {{ClusterNodeTracker}} to 
give the list of nodes in a rack, nodes that match a label expression, etc. 
(This can possibly be another JIRA too.)

> Add a helper library to maintain node state and allows common queries
> -
>
> Key: YARN-4719
> URL: https://issues.apache.org/jira/browse/YARN-4719
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4719-1.patch, yarn-4719-2.patch
>
>
> The scheduler could use a helper library to maintain node state and allow 
> matching/sorting queries. Several reasons for this:
> # Today, a lot of the node state management is done separately in each 
> scheduler. Having a single library will take us that much closer to reducing 
> duplication among schedulers.
> # Adding a filtering/matching API would simplify node labels and locality 
> significantly. 
> # An API that returns a sorted list for a custom comparator would help 
> YARN-1011 where we want to sort by allocation and utilization for 
> continuous/asynchronous and opportunistic scheduling respectively. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173066#comment-15173066
 ] 

Hadoop QA commented on YARN-4517:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 26s 
{color} | {color:red} Patch generated 95 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 1m 12s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790627/YARN-4517-YARN-3368.02.patch
 |
| JIRA Issue | YARN-4517 |
| Optional Tests |  asflicense  |
| uname | Linux 024acc4740d3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-3368 / 37455e7 |
| asflicense | 
https://builds.apache.org/job/PreCommit-YARN-Build/10669/artifact/patchprocess/patch-asflicense-problems.txt
 |
| modules | C: hadoop-yarn-project/hadoop-yarn U: 
hadoop-yarn-project/hadoop-yarn |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10669/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2883) Queuing of container requests in the NM

2016-02-29 Thread Konstantinos Karanasos (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Karanasos updated YARN-2883:
-
Attachment: YARN-2883-yarn-2877.002.patch

Thanks [~asuresh] for the feedback.

I am attaching a new version of the patch in which I have applied all your 
comments.
Most notable among them (a rough sketch of the resulting structure follows the 
list): 
* Added {{QueuingContainerManagerImpl}} and {{QueuingContainersMonitorImpl}} as 
subclasses of {{ContainerManagerImpl}} and {{ContainersMonitorImpl}}, 
respectively.
* Created a {{QueuingNMContext}} within the {{NMContext}} of the 
{{NodeManager}} to hold the queuedContainers and the killedQueuedContainers.
* Used "allocated" instead of "logical" in all class/field/method names.
* Also implemented the methods that send the number of queued containers from 
the NM to the RM through the {{NodeStatusUpdaterImpl}}.
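
A bare-bones sketch of the subclassing structure described above; the stand-in 
classes are illustrative only and omit nearly everything the real NM classes do:

{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Stand-in for the existing manager, for illustration only.
class ContainerManagerImpl {
  void startContainer(String containerId) { /* launch immediately */ }
}

// Holds the queuing-specific state, alongside the regular NMContext.
class QueuingNMContext {
  final Queue<String> queuedContainers = new ArrayDeque<>();
  final Queue<String> killedQueuedContainers = new ArrayDeque<>();
}

// Queuing behavior layered on top of the existing manager via subclassing.
class QueuingContainerManagerImpl extends ContainerManagerImpl {
  private final QueuingNMContext queuingContext = new QueuingNMContext();

  @Override
  void startContainer(String containerId) {
    if (hasAvailableResources()) {
      super.startContainer(containerId); // start right away
    } else {
      queuingContext.queuedContainers.add(containerId); // hold for later
    }
  }

  private boolean hasAvailableResources() {
    return true; // placeholder; the real check consults the containers monitor
  }
}
{code}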

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-yarn-2877.001.patch, 
> YARN-2883-yarn-2877.002.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173026#comment-15173026
 ] 

Varun Saxena commented on YARN-4517:


Screenshots have been attached too...

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173022#comment-15173022
 ] 

Varun Saxena commented on YARN-4517:


Attached a new patch. 
[~leftnoteasy], kindly review.
The open points mentioned above have been fixed (except adding additional 
graphs, which can be done in another JIRA).
As YARN-4709 has gone in, the changes have been updated accordingly; this patch 
is hence on top of the changes in YARN-4709.
The overflow of the left-hand-side menu bar on screen resizing has been fixed 
as well.
Moreover, I have added multiple unit test cases.

Refer to 
https://issues.apache.org/jira/browse/YARN-4517?focusedCommentId=15155987&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15155987
 to check what has been done in this JIRA.

[~leftnoteasy], do you want me to raise multiple JIRAs and break this patch up?

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4517:
---
Attachment: YARN-4517-YARN-3368.02.patch

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch, YARN-4517-YARN-3368.02.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler

2016-02-29 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173014#comment-15173014
 ] 

Yufei Gu commented on YARN-4704:


Hi [~kasha], thank you very much for the code review. 

> TestResourceManager#testResourceAllocation() fails when using FairScheduler
> ---
>
> Key: YARN-4704
> URL: https://issues.apache.org/jira/browse/YARN-4704
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: fairscheduler, test
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-4704.001.patch
>
>
> When using FairScheduler, TestResourceManager#testResourceAllocation() fails 
> with the following error:
> java.lang.IllegalStateException: Trying to stop a non-running task: 1 of 
> application application_1455833410011_0001
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Task.stop(Task.java:117)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Application.finishTask(Application.java:266)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager.testResourceAllocation(TestResourceManager.java:167)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4517) [YARN-3368] Add nodes page

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4517:
---
Attachment: Screenshot_after_4709_1.png
Screenshot_after_4709.png

> [YARN-3368] Add nodes page
> --
>
> Key: YARN-4517
> URL: https://issues.apache.org/jira/browse/YARN-4517
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Wangda Tan
>Assignee: Varun Saxena
>  Labels: webui
> Attachments: (21-Feb-2016)yarn-ui-screenshots.zip, 
> Screenshot_after_4709.png, Screenshot_after_4709_1.png, 
> YARN-4517-YARN-3368.01.patch
>
>
> We need a nodes page added to the next generation web UI, similar to the 
> existing RM/nodes page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler

2016-02-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172997#comment-15172997
 ] 

Hudson commented on YARN-4704:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9396 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9396/])
YARN-4704. TestResourceManager#testResourceAllocation() fails when using 
(kasha: rev 9dafaaaf0de68ce7f5e495ea4b8e0ce036dc35a2)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* hadoop-yarn-project/CHANGES.txt


> TestResourceManager#testResourceAllocation() fails when using FairScheduler
> ---
>
> Key: YARN-4704
> URL: https://issues.apache.org/jira/browse/YARN-4704
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: fairscheduler, test
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Yufei Gu
> Fix For: 2.9.0
>
> Attachments: YARN-4704.001.patch
>
>
> When using FairScheduler, TestResourceManager#testResourceAllocation() fails 
> with the following error:
> java.lang.IllegalStateException: Trying to stop a non-running task: 1 of 
> application application_1455833410011_0001
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Task.stop(Task.java:117)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Application.finishTask(Application.java:266)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager.testResourceAllocation(TestResourceManager.java:167)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler

2016-02-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172933#comment-15172933
 ] 

Karthik Kambatla commented on YARN-4704:


Verified that the patch fixes the test with FairScheduler. The test failures 
reported by Jenkins are known and unrelated. +1, checking this in. 

> TestResourceManager#testResourceAllocation() fails when using FairScheduler
> ---
>
> Key: YARN-4704
> URL: https://issues.apache.org/jira/browse/YARN-4704
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: fairscheduler, test
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Yufei Gu
> Attachments: YARN-4704.001.patch
>
>
> When using FairScheduler, TestResourceManager#testResourceAllocation() fails 
> with the following error:
> java.lang.IllegalStateException: Trying to stop a non-running task: 1 of 
> application application_1455833410011_0001
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Task.stop(Task.java:117)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Application.finishTask(Application.java:266)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager.testResourceAllocation(TestResourceManager.java:167)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4747) AHS error 500 due to NPE when container start event is missing

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-4747:
--

Assignee: Varun Saxena

> AHS error 500 due to NPE when container start event is missing
> --
>
> Key: YARN-4747
> URL: https://issues.apache.org/jira/browse/YARN-4747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>
> Saw an error 500 due to a NullPointerException caused by a missing host for 
> an AM container.  Stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4748) ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172901#comment-15172901
 ] 

Hadoop QA commented on YARN-4748:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
7s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 46s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 54s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 56s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790597/YARN-4748-trunk.001.patch
 |
| JIRA Issue | YARN-4748 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 47095768758f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |

[jira] [Commented] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172861#comment-15172861
 ] 

Hadoop QA commented on YARN-4704:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 2s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 20s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 19s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790557/YARN-4704.001.patch |
| JIRA Issue | YARN-4704 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  c

[jira] [Updated] (YARN-4748) ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport

2016-02-29 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-4748:

Attachment: YARN-4748-trunk.001.patch

Quick patch to log exceptions when generating application reports. 
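For illustration, the pattern is roughly the following; the surrounding method 
and logger names here are hypothetical, not the attached patch itself:

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.security.authorize.AuthorizationException;
import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException;

// Hypothetical sketch: stop swallowing the exception silently and mark it
// down before falling back to default report values (attempt id -1).
public class ReportSketch {
  private static final Log LOG = LogFactory.getLog(ReportSketch.class);

  void fillAttemptInfo(String appAttemptId) {
    try {
      lookupAttempt(appAttemptId); // hypothetical timeline-store lookup
    } catch (AuthorizationException | ApplicationAttemptNotFoundException e) {
      // Previously swallowed; now at least log before using defaults.
      LOG.info("Failed to fetch info for attempt " + appAttemptId
          + ", proceeding with default values", e);
    }
  }

  private void lookupAttempt(String id)
      throws AuthorizationException, ApplicationAttemptNotFoundException {
    // stand-in for the real lookup that may throw either exception
  }
}
{code}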

> ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on 
> generateApplicationReport
> ---
>
> Key: YARN-4748
> URL: https://issues.apache.org/jira/browse/YARN-4748
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: YARN-4748-trunk.001.patch
>
>
> We're directly swallowing AuthorizationExceptions and 
> ApplicationAttemptNotFoundExceptions when generating application reports. We 
> should at least log the exception before proceeding with default values 
> (which will set the app attempt id to -1). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172800#comment-15172800
 ] 

Hadoop QA commented on YARN-4696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 17s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 27s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage
 in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 34s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 
29 unchanged - 0 fixed = 32 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 25s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 3s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 9s 
{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-yarn-server-timeline-pluginstorage in the patch 
passed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 17s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 

[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172795#comment-15172795
 ] 

Ted Yu commented on YARN-4736:
--

bq. so planning to test with hbase-1.0.3 tar. 

There have been more releases since 1.0.3; e.g., you can try out the 1.2.0 
release.

BufferedMutatorImpl#flush() appeared in the stack trace. However, if the HBase 
cluster was shut down, the flush wouldn't succeed.

I haven't seen the above issue happen on a live 1.x cluster.
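For context, a minimal sketch of the HBase 1.x client calls involved; the 
table name is illustrative, and with the cluster down the final flush() keeps 
retrying rather than succeeding:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator =
             conn.getBufferedMutator(TableName.valueOf("timelineservice.entity"))) {
      Put put = new Put(Bytes.toBytes("rowkey"));
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      mutator.mutate(put); // buffered client-side, no RPC yet
      mutator.flush();     // pushes buffered mutations; blocks in retry
                           // loops if the cluster has been shut down
    }
  }
}
{code}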

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 in a single-node Hadoop cluster, where 
> the same node also ran HBase with an embedded ZooKeeper.
> # Due to some NPE issues, the NM was trying to shut down, but the NM daemon 
> process could not complete the shutdown due to locks.
> # Got some exceptions related to HBase after the application finished 
> execution successfully. 
> Will attach logs and the trace for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172779#comment-15172779
 ] 

Sangjin Lee commented on YARN-4736:
---

{quote}
Was checking with our hbase team from the logs and the trace they were 
informing that the issue might be due to bad connectivity with the zookeeper, 
but it strange to see that in the local node setup. So i suspect that there is 
some issue with my hbase setup, so planning to test with hbase-1.0.3 tar.
{quote}

Regardless of what happens on the cluster, the client (NM in this case) should 
not lock up. So in that sense, I think the "deadlock" we're seeing should be 
looked at from the client-side. [~te...@apache.org], do you recall seeing any 
issues like this?

{quote}
But also would like to know whether you guys are facing the same issue or its 
only me who is facing it ?
{quote}

I haven't had a chance to try to reproduce it yet. I'll try it as soon as 
feasible. Does this happen every time you run the timeline service performance 
test with the latest branch?

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 in a single-node Hadoop cluster, where 
> the same node also ran HBase with an embedded ZooKeeper.
> # Due to some NPE issues, the NM was trying to shut down, but the NM daemon 
> process could not complete the shutdown due to locks.
> # Got some exceptions related to HBase after the application finished 
> execution successfully. 
> Will attach logs and the trace for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4748) ApplicationHistoryManagerOnTimelineStore should not swallow exceptions on generateApplicationReport

2016-02-29 Thread Li Lu (JIRA)
Li Lu created YARN-4748:
---

 Summary: ApplicationHistoryManagerOnTimelineStore should not 
swallow exceptions on generateApplicationReport
 Key: YARN-4748
 URL: https://issues.apache.org/jira/browse/YARN-4748
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Li Lu
Assignee: Li Lu


We're directly swallowing AuthorizationExceptions and 
ApplicationAttemptNotFoundExceptions when generating application reports. We 
should at least log the exception before proceeding with default values 
(which will set the app attempt id to -1). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4696) EntityGroupFSTimelineStore to work in the absence of an RM

2016-02-29 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4696:
-
Attachment: YARN-4696-009.patch

This is the 009 patch; the difference from 008 is that it correctly converts 
IllegalArgumentException to a BadRequestException with the nested stack trace.

With this patch applied together with the current YARN-4545 patch, I now 
successfully have
# all tests against completed jobs working with file://
# tests needing to track incomplete jobs working with an HDFS minicluster.

LocalFS isn't going to work as a destination for incomplete jobs, as it doesn't 
flush(). Nor will things like S3. That'll need documenting.
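The conversion described is roughly of the following shape; this is a sketch, 
not the actual patch, and it assumes a cause-taking BadRequestException 
constructor is available (the patch is what wires the nested stack trace 
through):

{code:java}
import org.apache.hadoop.yarn.webapp.BadRequestException;

public class BadRequestSketch {
  // Sketch only: surface bad caller input as an HTTP 400 instead of a 500,
  // keeping the original exception as the nested cause.
  static void handle(String entityType) {
    try {
      validateAndQuery(entityType); // hypothetical REST-layer call
    } catch (IllegalArgumentException e) {
      throw new BadRequestException(e);
    }
  }

  static void validateAndQuery(String entityType) {
    if (entityType == null || entityType.isEmpty()) {
      throw new IllegalArgumentException("entity type must be set");
    }
  }
}
{code}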


> EntityGroupFSTimelineStore to work in the absence of an RM
> --
>
> Key: YARN-4696
> URL: https://issues.apache.org/jira/browse/YARN-4696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-4696-001.patch, YARN-4696-002.patch, 
> YARN-4696-003.patch, YARN-4696-005.patch, YARN-4696-006.patch, 
> YARN-4696-007.patch, YARN-4696-008.patch, YARN-4696-009.patch
>
>
> {{EntityGroupFSTimelineStore}} now depends on an RM being up and running, with 
> the configuration pointing to it. This is a new change, and it impacts testing, 
> where you have historically been able to test without an RM running.
> The sole purpose of the probe is to automatically determine if an app is 
> running; it falls back to "unknown" if not. If the RM connection were 
> optional, the "unknown" codepath could be called directly, relying on the age 
> of the file as a metric of completion.
> Options
> # add a flag to disable RM connect
> # skip automatically if RM not defined/set to 0.0.0.0
> # disable retries on yarn client IPC; if it fails, tag app as unknown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4747) AHS error 500 due to NPE when container start event is missing

2016-02-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172671#comment-15172671
 ] 

Jason Lowe commented on YARN-4747:
--

I believe this was triggered by a missing container start event for a given 
container finish event.  When an application runs for a long time there will be 
a corresponding long window between the container start event and container 
finish event for the AM container.  The timelineserver performs retention based 
on entity timestamp, so there will be a long window where the container start 
event has been deleted but the container finish event is still present.  The 
application history code is not prepared to handle that, as only the container 
start event has the node hostname and port number information.  It blindly 
assumes that if a container entity is present in the database then we know both 
the host and the port.

Minimally the application history server needs to be hardened to avoid the NPE, 
but we may want to add the host and port information to the finish event as 
well to allow the history page to continue to provide logs as long as there is 
either a container start or container finish event in the database.
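Minimally, the hardening could be a null guard of the following shape; the 
variable names here are illustrative, not the actual 
ApplicationHistoryManagerOnTimelineStore code:

{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;

public class ContainerReportSketch {
  // Sketch only: skip building the NodeId when the start event (and hence
  // the host) is missing; NodeId.newInstance(null, port) is what NPEs today.
  static NodeId safeNodeId(String allocatedHost, int allocatedPort) {
    if (allocatedHost == null) {
      return null; // caller can render "N/A" instead of throwing a 500
    }
    return NodeId.newInstance(allocatedHost, allocatedPort);
  }
}
{code}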

> AHS error 500 due to NPE when container start event is missing
> --
>
> Key: YARN-4747
> URL: https://issues.apache.org/jira/browse/YARN-4747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>
> Saw an error 500 due to a NullPointerException caused by a missing host for 
> an AM container.  Stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4747) AHS error 500 due to NPE when container start event is missing

2016-02-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172665#comment-15172665
 ] 

Jason Lowe commented on YARN-4747:
--

Stacktrace:
{noformat}
2016-02-29 16:50:19,465 [1866296659@qtp-46415544-16798] ERROR webapp.AppBlock: 
Failed to read the AM container of the application attempt 
appattempt_1455753632268_408876_01.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.proto.YarnProtos$NodeIdProto$Builder.setHost(YarnProtos.java:19772)
at 
org.apache.hadoop.yarn.api.records.impl.pb.NodeIdPBImpl.setHost(NodeIdPBImpl.java:56)
at org.apache.hadoop.yarn.api.records.NodeId.newInstance(NodeId.java:42)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.convertToContainerReport(ApplicationHistoryManagerOnTimelineStore.java:529)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getContainer(ApplicationHistoryManagerOnTimelineStore.java:200)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:200)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:249)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:243)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:242)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock.render(AppBlock.java:217)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.AHSController.app(AHSController.java:38)
at sun.reflect.GeneratedMethodAccessor90.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
{noformat}

> AHS error 500 due to NPE when container start event is missing
> --
>
> Key: YARN-4747
> URL: https://issues.apache.org/jira/browse/YARN-4747
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.2
>Reporter: Jason Lowe
>
> Saw an error 500 due to a NullPointerException caused by a missing host for 
> an AM container.  Stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4747) AHS error 500 due to NPE when container start event is missing

2016-02-29 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4747:


 Summary: AHS error 500 due to NPE when container start event is 
missing
 Key: YARN-4747
 URL: https://issues.apache.org/jira/browse/YARN-4747
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.7.2
Reporter: Jason Lowe


Saw an error 500 due to a NullPointerException caused by a missing host for an 
AM container.  Stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-02-29 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172652#comment-15172652
 ] 

Steve Loughran commented on YARN-4593:
--

No tests. I'd have to think of a way to recreate the deadlock and then make 
sure it was gone; someone will need to look at the code instead.
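For anyone reading the code, the hazard is classic lock-order inversion around 
a synchronized getter; a minimal illustration (not Slider's or YARN's actual 
code) is below. The usual fix is to stop synchronizing the getter (e.g., make 
the field volatile), which breaks the cycle.

{code:java}
// Illustration only: two threads taking the same pair of monitors in
// opposite orders. AbstractService.getConfig() plays the role of the
// synchronized getter that completes the cycle.
public class GetConfigDeadlock {
  private final Object otherLock = new Object();
  private Object config = new Object();

  // Thread A: owns 'this' via the synchronized method, then wants otherLock.
  public synchronized void serviceStop() {
    synchronized (otherLock) {
      // cleanup ...
    }
  }

  // Thread B: owns otherLock, then calls the synchronized getter,
  // which needs 'this' -> deadlock if thread A is inside serviceStop().
  public void monitorLoop() {
    synchronized (otherLock) {
      Object c = getConfig();
    }
  }

  public synchronized Object getConfig() {
    return config;
  }
}
{code}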

> Deadlock in AbstractService.getConfig()
> ---
>
> Key: YARN-4593
> URL: https://issues.apache.org/jira/browse/YARN-4593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.2
> Environment: AM restarting on kerberized cluster
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-4593-001.patch
>
>
> SLIDER-1052 has found a deadlock which can arise in it during AM restart. 
> Looking at the thread trace, one of the blockages is actually 
> {{AbstractService.getConfig()}} —this is synchronized and so blocked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172651#comment-15172651
 ] 

Vrushali C commented on YARN-4700:
--

Hi [~Naganarasimha]

Thanks for the patch. I believe the constructor for FlowActivityRowKey should 
change to correctly calculate the top-of-the-day timestamp given the input 
timestamp. That is the reason the unit test is failing, I think, since the 
FlowActivityRowKey is constructed with 
FlowActivityRowKey.getRowKey(clusterStop, appCreatedTime, user, flow). 

Also, I think we can remove the function FlowActivityRowKey#getRowKey(String 
clusterId, String userId, String flowName) and only keep 
FlowActivityRowKey#getRowKey(String clusterId, long dayTs, String userId, 
String flowName). That way it's easier to clean up the unit tests as well.

And I think you can change the unit test to use different timestamps (but keep 
the same semantics, i.e. the min start time should actually be the lowest one, 
etc.); that way it will be easier to refactor the unit test. Right now the unit 
test checks in the flow activity table that one entry has been made for all 4 
of these application entities, so you can use timestamps that belong to exactly 
the same day. Or, if you use timestamps belonging to different days, change the 
test to look for that many entries.

Another thing: it looks like the event timestamp being used is 
timelineEvents.next().getTimestamp(). It might be more explicit to fetch the 
exact created (or finished) event from the TimelineEntity and use the timestamp 
that belongs to either ApplicationMetricsConstants.CREATED_EVENT_TYPE or 
ApplicationMetricsConstants.FINISHED_EVENT_TYPE. That way, we are using the 
accurate event time to make an entry into the flow activity table. You can use 
the TimelineStorageUtils#getApplicationFinishedTime() function to get the 
timestamp of the FINISHED event. You would have to write a new function to do a 
similar thing for fetching the CREATED event timestamp (or refactor further and 
use the same function to get the right event's timestamp); a sketch of both 
ideas follows below.

Hope this helps. Let me know.
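A minimal sketch of both suggestions (rounding the event time to the top of 
the day for the row key, and selecting the exact CREATED event); treat the 
signatures as illustrative of the YARN-2928 branch, not the final code:

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEvent;
import org.apache.hadoop.yarn.server.metrics.ApplicationMetricsConstants;

public class FlowActivitySketch {
  // 1. Round the event time down to the top of the day for the row key.
  static long topOfDay(long eventTs) {
    return eventTs - (eventTs % TimeUnit.DAYS.toMillis(1));
  }

  // 2. Use the timestamp of the exact CREATED event, not whatever event
  //    the iterator happens to return first.
  static long createdEventTimestamp(TimelineEntity entity) {
    for (TimelineEvent event : entity.getEvents()) {
      if (ApplicationMetricsConstants.CREATED_EVENT_TYPE.equals(event.getId())) {
        return event.getTimestamp();
      }
    }
    return -1; // no CREATED event present
  }
}
{code}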

thanks
Vrushali

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (which is still held in the RM 
> state store) each time the RM got restarted. It's quite possible that we add 
> the cluster start timestamp into the default cluster id, so each time we're 
> creating a new record for one application (the cluster id is a part of the 
> row key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4741) RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue

2016-02-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172650#comment-15172650
 ] 

Sangjin Lee commented on YARN-4741:
---

I attached the node manager log. It's pretty much the entirety of the log, from 
the start until past the point where these events occurred for this node in the 
RM. The only thing I removed is a section early in the log that lists all the 
files the localization service recovered.

Unfortunately I no longer have the RM log for this episode.

We do not have YARN-3990 or YARN-3896 applied. Although we should get them in 
any case, I'm not sure if those are related to the issue we're seeing.

> RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async 
> dispatcher event queue
> ---
>
> Key: YARN-4741
> URL: https://issues.apache.org/jira/browse/YARN-4741
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sangjin Lee
>Priority: Critical
> Attachments: nm.log
>
>
> We had a pretty major incident with the RM where it was continually flooded 
> with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event 
> queue.
> In our setup, we had the RM HA or stateful restart *disabled*, but NM 
> work-preserving restart *enabled*. Due to other issues, we did a cluster-wide 
> NM restart.
> Some time during the restart (which took multiple hours), we started seeing 
> the async dispatcher event queue building. Normally it would log 1,000. In 
> this case, it climbed all the way up to tens of millions of events.
> When we looked at the RM log, it was full of the following messages:
> {noformat}
> 2016-02-18 01:47:29,530 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> 2016-02-18 01:47:29,535 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle 
> this event at current state
> 2016-02-18 01:47:29,535 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> 2016-02-18 01:47:29,538 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle 
> this event at current state
> 2016-02-18 01:47:29,538 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> {noformat}
> And that node in question was restarted a few minutes earlier.
> When we inspected the RM heap, it was full of 
> RMNodeFinishedContainersPulledByAMEvents.
> Suspecting the NM work-preserving restart, we disabled it and did another 
> cluster-wide rolling restart. Initially that seemed to have helped reduce the 
> queue size, but the queue built back up to several millions and continued for 
> an extended period. We had to restart the RM to resolve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4741) RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event queue

2016-02-29 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4741:
--
Attachment: nm.log

> RM is flooded with RMNodeFinishedContainersPulledByAMEvents in the async 
> dispatcher event queue
> ---
>
> Key: YARN-4741
> URL: https://issues.apache.org/jira/browse/YARN-4741
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sangjin Lee
>Priority: Critical
> Attachments: nm.log
>
>
> We had a pretty major incident with the RM where it was continually flooded 
> with RMNodeFinishedContainersPulledByAMEvents in the async dispatcher event 
> queue.
> In our setup, we had the RM HA or stateful restart *disabled*, but NM 
> work-preserving restart *enabled*. Due to other issues, we did a cluster-wide 
> NM restart.
> Some time during the restart (which took multiple hours), we started seeing 
> the async dispatcher event queue building. Normally it would log 1,000. In 
> this case, it climbed all the way up to tens of millions of events.
> When we looked at the RM log, it was full of the following messages:
> {noformat}
> 2016-02-18 01:47:29,530 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> 2016-02-18 01:47:29,535 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle 
> this event at current state
> 2016-02-18 01:47:29,535 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> 2016-02-18 01:47:29,538 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Can't handle 
> this event at current state
> 2016-02-18 01:47:29,538 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Invalid 
> event FINISHED_CONTAINERS_PULLED_BY_AM on Node  worker-node-foo.bar.net:8041
> {noformat}
> And that node in question was restarted a few minutes earlier.
> When we inspected the RM heap, it was full of 
> RMNodeFinishedContainersPulledByAMEvents.
> Suspecting the NM work-preserving restart, we disabled it and did another 
> cluster-wide rolling restart. Initially that seemed to have helped reduce the 
> queue size, but the queue built back up to several millions and continued for 
> an extended period. We had to restart the RM to resolve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172629#comment-15172629
 ] 

Sidharta Seethana commented on YARN-4744:
-

Hi [~bibinchundatt],

Could you provide some additional information here: Is security enabled? Is 
this problem reproducible with the included apps, e.g. distributed shell? Is it 
possible the container exited before the signal was delivered (exit code 9 is 
possible in this scenario)?

thanks,
-Sidharta

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>
> Enable LCE with cgroups
> Start server with dsperf user
> Submit application with user yarn
> Too many signal to container failure 
> {noformat}
> 2014-03-01 14:10:32,223 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-01 14:10:32,228 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 28575, 15]
> 2014-03-01 14:10:32,228 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> {noformat}
> Checked the same scenario in 2.7.2 version (not available)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172617#comment-15172617
 ] 

Hadoop QA commented on YARN-4700:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
56s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
38s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 4s {color} | 
{color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 3m 2s {color} | 
{color:red} hadoop-yarn-server-timelineservice in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
18s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 34s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
 |
|   | 
hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage
 |
|   | 
hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790555/YAR

[jira] [Updated] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler

2016-02-29 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-4704:
---
Attachment: YARN-4704.001.patch

> TestResourceManager#testResourceAllocation() fails when using FairScheduler
> ---
>
> Key: YARN-4704
> URL: https://issues.apache.org/jira/browse/YARN-4704
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: fairscheduler, test
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Yufei Gu
> Attachments: YARN-4704.001.patch
>
>
> When using FairScheduler, TestResourceManager#testResourceAllocation() fails 
> with the following error:
> java.lang.IllegalStateException: Trying to stop a non-running task: 1 of 
> application application_1455833410011_0001
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Task.stop(Task.java:117)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Application.finishTask(Application.java:266)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager.testResourceAllocation(TestResourceManager.java:167)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4700:

Attachment: YARN-4700-YARN-2928.v1.001.patch

Hi [~vrushalic], [~sjlee0] & [~varun_saxena],
Please find the attached patch, which tries to consider both the app created 
and the app finished events when updating the *FlowActivityTable*.
My only concern is that maybe I did not get the complete essence of the 
{{TestHBaseStorageFlowActivity.testWriteFlowRunMinMax}} testcase, hence it is 
failing. IIUC there should be 4 entries in the FlowActivity table, as these are 
4 different apps of the same flow, right? Correct me if I am wrong.



> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.v1.001.patch, 
> YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (which is still held in the RM 
> state store) each time the RM got restarted. It's quite possible that we add 
> the cluster start timestamp into the default cluster id, so each time we're 
> creating a new record for one application (the cluster id is a part of the 
> row key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4593) Deadlock in AbstractService.getConfig()

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172472#comment-15172472
 ] 

Hadoop QA commented on YARN-4593:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 1s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 55s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 56s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
48s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 10s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 35s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
22s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 53s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782366/YARN-4593-001.patch |
| JIRA Issue | YARN-4593 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux be0b5e7d5f34 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provid

[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172453#comment-15172453
 ] 

Naganarasimha G R commented on YARN-4700:
-

Maybe a small correction in the patch: as {{storeInFlowActivityTable}} is used 
for both app created and app finished, I need to find the time from the first 
event of the entity?

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (which is still held in the RM 
> state store) each time the RM got restarted. It's quite possible that we add 
> the cluster start timestamp into the default cluster id, so each time we're 
> creating a new record for one application (the cluster id is a part of the 
> row key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4700:

Attachment: YARN-4700-YARN-2928.wip.patch

[~sjlee0] & [~vrushalic], 
I meant to do the change as per the attached patch; hope it addresses all 
concerns.

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4700-YARN-2928.wip.patch
>
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4634) Scheduler UI/Metrics need to consider cases like non-queue label mappings

2016-02-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172403#comment-15172403
 ] 

Wangda Tan commented on YARN-4634:
--

Hi [~sunilg],

Thanks for updating. However, I would hesitate to add a new state 
{{labelToQueueMappingAvailable}} to RMNodeLabelsManager, since AbstractCSQueue 
would need to update that state as well.

I suggest showing the labels hierarchy only if:
- there's a label other than DEFAULT_LABEL that has >0 active NMs.

And let's keep the queues page as simple as possible.

You can check RMNodeLabelsManager#pullRMNodeLabelsInfo for details (and 
NodeLabelsPage as an example).
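
Roughly, the check could look like the following sketch; the accessors on 
{{RMNodeLabel}} and the DEFAULT_LABEL constant are assumptions for 
illustration:

{code}
// Show the labels hierarchy only if some label other than the default
// label currently has active NMs.
private boolean shouldShowLabelsHierarchy(RMNodeLabelsManager mgr) {
  for (RMNodeLabel info : mgr.pullRMNodeLabelsInfo()) {
    if (!info.getLabelName().equals(DEFAULT_LABEL)
        && info.getNumActiveNMs() > 0) {
      return true;
    }
  }
  return false;
}
{code}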

Thoughts?

> Scheduler UI/Metrics need to consider cases like non-queue label mappings
> -
>
> Key: YARN-4634
> URL: https://issues.apache.org/jira/browse/YARN-4634
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4634.patch, 0002-YARN-4634.patch
>
>
> Currently when label-queue mappings are not available, there are few 
> assumptions taken in UI and in metrics.
> In above case where labels are enabled and available in cluster but without 
> any queue mappings, UI displays queues under labels. This is not correct.
> Currently  labels enabled check and availability of labels are considered to 
> render scheduler UI. Henceforth we also need to check whether 
> - queue-mappings are available
> - nodes are mapped with labels with proper exclusivity flags on
> This ticket also will try to see the default configurations in queue when 
> labels are not mapped. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4704) TestResourceManager#testResourceAllocation() fails when using FairScheduler

2016-02-29 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu reassigned YARN-4704:
--

Assignee: Yufei Gu  (was: Ray Chiang)

> TestResourceManager#testResourceAllocation() fails when using FairScheduler
> ---
>
> Key: YARN-4704
> URL: https://issues.apache.org/jira/browse/YARN-4704
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: fairscheduler, test
>Affects Versions: 2.7.2
>Reporter: Ray Chiang
>Assignee: Yufei Gu
>
> When using FairScheduler, TestResourceManager#testResourceAllocation() fails 
> with the following error:
> java.lang.IllegalStateException: Trying to stop a non-running task: 1 of 
> application application_1455833410011_0001
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Task.stop(Task.java:117)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.Application.finishTask(Application.java:266)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager.testResourceAllocation(TestResourceManager.java:167)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-02-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172390#comment-15172390
 ] 

Naganarasimha G R commented on YARN-4736:
-

Hi [~vrushalic],
I was checking with our HBase team; from the logs and the trace, they were 
suggesting that the issue might be due to bad connectivity with ZooKeeper, but 
it is strange to see that in a local-node setup. So I suspect there is some 
issue with my HBase setup, and I am planning to test with the hbase-1.0.3 tar. 
I would also like to know whether you folks are facing the same issue, or 
whether it is only me.

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: hbaseException.log, threaddump.log
>
>
> Faced some issues while running ATSv2 in a single-node Hadoop cluster, on 
> which HBase had also been launched with an embedded ZooKeeper.
> # Due to some NPE issues, I was able to see that the NM was trying to shut 
> down, but the NM daemon process did not complete due to the locks.
> # Got some exceptions related to HBase after the application finished 
> execution successfully. 
> Will attach the logs and the trace for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172364#comment-15172364
 ] 

Naganarasimha G R commented on YARN-4700:
-

Oops, *TimelineStorageUtils.getTopOfTheDayTimestamp()* is not called; we need 
to push that call into {{FlowActivityRowKey.getRowKey(clusterId, 
te.getCreatedTime(), userId, flowName)}}.

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172357#comment-15172357
 ] 

Naganarasimha G R commented on YARN-4700:
-

Hi [~sjlee0],
Based on the points from [~vrushalic] and [~varun_saxena], I was creating a 
patch such that {{HBaseTimelineWriterImpl.storeInFlowActivityTable}} uses 
{{FlowActivityRowKey.getRowKey(clusterId, te.getCreatedTime(), userId, 
flowName)}} instead of the other overloaded method, which doesn't take the 
timestamp.
This would take care of calling 
{{TimelineStorageUtils.getTopOfTheDayTimestamp()}}, right?
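
For clarity, the intended writer-side call would be along these lines (a 
sketch only; names as referenced above):

{code}
// In HBaseTimelineWriterImpl.storeInFlowActivityTable(): pass the entity's
// creation time so the row lands in the day the flow actually started,
// instead of letting the timestamp-less overload fall back to the current time.
byte[] rowKey = FlowActivityRowKey.getRowKey(
    clusterId, te.getCreatedTime(), userId, flowName);
{code}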

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4465) SchedulerUtils#validateRequest for Label check should happen only when nodelabel enabled

2016-02-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172342#comment-15172342
 ] 

Wangda Tan commented on YARN-4465:
--

Thanks [~bibinchundatt], +1 to the latest patch. Will commit tomorrow if there 
are no objections.

> SchedulerUtils#validateRequest for Label check should happen only when 
> nodelabel enabled
> 
>
> Key: YARN-4465
> URL: https://issues.apache.org/jira/browse/YARN-4465
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4465.patch, 0002-YARN-4465.patch, 
> 0003-YARN-4465.patch, 0004-YARN-4465.patch, 0006-YARN-4465.patch, 
> 0007-YARN-4465.patch
>
>
> Disable labels from the RM side: yarn.nodelabel.enable=false
> The capacity scheduler label configuration for the queue is as below:
> the default label for queue b1 is 3, and the accessible labels are 1,3
> Submit an application to queue A.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
>  Invalid resource request, queue=b1 doesn't have permission to access all 
> labels in resource request. labelExpression of resource request=3. Queue 
> labels=1,3
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:401)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:340)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:602)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:247)
> {noformat}
> # Ignore the default label expression when labels are disabled, *or*
> # in NormalizeResourceRequest we can set the label expression to  
> when node labels are not enabled, *or*
> # improve the message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172335#comment-15172335
 ] 

Vrushali C commented on YARN-4700:
--

Hi [~sjlee0],
Yes, the flow activity table's row key always needs to be the top-of-the-day 
timestamp. But the event timestamp should be used to find the top of that day.

bq. If they meant that we would use the actual event timestamps as is for the 
row key, I'm not as sure.
No, we can't use the event timestamp as is. It needs to be the top of the day 
of that timestamp, which is what I said in the previous comment: "the entry for 
that flow should go into THAT older day's row, hence we should use the event 
timestamp."

You are right, the code in FlowActivityRowKey#getRowKey() needs to change to 
take the event timestamp, not the current time. I thought we were sending in 
null for the timestamp and hence using the current time, but it looks like it 
is directly using the current time here:

{code}
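  // Problem: the day is derived from the current time, not the event timestamp.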
  long dayTs = TimelineStorageUtils.getTopOfTheDayTimestamp(
      System.currentTimeMillis());
{code}



> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172332#comment-15172332
 ] 

Naganarasimha G R commented on YARN-4700:
-

It might mostly be a case of asynchronous handling; also, it's not necessary 
that ATS is running initially, as it may start up only after the RM fails over. 
But in any case, would it not be better to link this up with the asynchronous 
and synchronous events for v2?

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4700) ATS storage has one extra record each time the RM got restarted

2016-02-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172304#comment-15172304
 ] 

Sangjin Lee commented on YARN-4700:
---

I may have misread the comments in haste last Friday. If the comments meant 
that we would use the event timestamps instead of the current time and 
calculate the top-of-the-day timestamps from them, then I concur. If they meant 
that we would use the actual event timestamps *as is* for the row key, I'm not 
as sure.

My main concern there is it might make some of the queries we want to do 
against this table in the future harder or make them perform more poorly. For 
example, we could do a query like "return all flow activities in the last 7 
days". With a top-of-the-day timestamps, it would be a simple partial row key 
matching. With variable timestamps, it would become more of a range query. Are 
my concerns overblown?

If the solution we're discussing is the former, then I think it's quite 
straightforward. We need a little bit of change in 
{{FlowActivityRowKey.getRowKey()}} where we should apply 
{{TimelineStorageUtils.getTopOfTheDayTimestamp()}} on the provided timestamp.
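
A minimal sketch of that change, where {{assembleRowKey}} is a hypothetical 
stand-in for the existing row-key assembly:

{code}
public static byte[] getRowKey(String clusterId, long eventTs, String userId,
    String flowName) {
  // Apply the top-of-the-day conversion to the caller-supplied event
  // timestamp instead of System.currentTimeMillis(), so a late-arriving
  // event still lands in the row for the day it occurred.
  long dayTs = TimelineStorageUtils.getTopOfTheDayTimestamp(eventTs);
  return assembleRowKey(clusterId, dayTs, userId, flowName);
}
{code}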

> ATS storage has one extra record each time the RM got restarted
> ---
>
> Key: YARN-4700
> URL: https://issues.apache.org/jira/browse/YARN-4700
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Li Lu
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> When testing the new web UI for ATS v2, I noticed that we're creating one 
> extra record for each finished application (but still held in the RM state 
> store) each time the RM got restarted. It's quite possible that we add the 
> cluster start timestamp into the default cluster id, thus each time we're 
> creating a new record for one application (cluster id is a part of the row 
> key). We need to fix this behavior, probably by having a better default 
> cluster id. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-02-29 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172264#comment-15172264
 ] 

Arun Suresh commented on YARN-2883:
---

On going through the patch again, I feel a better (and safer) way to introduce 
the changes is:
* to extend both the {{ContainerManagerImpl}} and the {{ContainersMonitorImpl}} 
classes into two new subclasses, {{QueuedContainerManager}} and 
{{QueuedContainersMonitor}}.
* Then we can move all the required data structures (all the new collections) 
as well as the event handlers into the new classes.
* We can also get rid of all the {{if 
context.isDistributedSchedulingEnabled()}} checks, since we would need to do 
that only once in the {{NodeManager}} when we create the instance of the 
{{ContainerManager}}; a rough sketch follows this list.
* We can also reason about the code flow better, since the changes would be 
isolated to the new classes.
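
For illustration, a minimal sketch, assuming the tentative subclass name above 
and the current {{NodeManager}} factory signature:

{code}
// Choose the queuing-aware ContainerManager once, at creation time, so that
// no per-call isDistributedSchedulingEnabled() checks are needed elsewhere.
protected ContainerManagerImpl createContainerManager(Context context,
    ContainerExecutor exec, DeletionService del,
    NodeStatusUpdater nodeStatusUpdater, ApplicationACLsManager aclsManager,
    LocalDirsHandlerService dirsHandler) {
  if (context.isDistributedSchedulingEnabled()) {
    return new QueuedContainerManager(context, exec, del, nodeStatusUpdater,
        aclsManager, dirsHandler);
  }
  return new ContainerManagerImpl(context, exec, del, nodeStatusUpdater,
      aclsManager, dirsHandler);
}
{code}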

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-yarn-2877.001.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2883) Queuing of container requests in the NM

2016-02-29 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172226#comment-15172226
 ] 

Arun Suresh commented on YARN-2883:
---

*ContainersMonitorImpl*
* In {{startPendingContainers()}}, it looks like the synchronized block itself 
can be refactored into another function, which you can then call first with 
{{queueGuarRequests}} and then with {{queueOpportRequests}} (see the sketch 
below).
* W.r.t. the TODO before {{updateNMTokenIdentifier}}: yup, it needs to be there 
for all containers.
* W.r.t. the TODO inside CHANGE_MONITORING_CONTAINER_RESOURCE: yup, I think we 
might have to update the available resource.
* It looks like the {{setQueuedContainerStatus}} method is not being used.

Also, it looks like a lot of tests have failed. Can you please verify that 
these failures are not due to the changes in this patch?
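
A minimal sketch of that refactoring, with the queue element type and the 
{{resourcesAvailable}}/{{startContainer}} helpers as placeholders:

{code}
private void startPendingContainers() {
  synchronized (this) {
    // Drain guaranteed requests before opportunistic ones.
    startContainersFromQueue(queueGuarRequests);
    startContainersFromQueue(queueOpportRequests);
  }
}

private void startContainersFromQueue(Queue<Container> queue) {
  Container container;
  // Start queued containers for as long as the node has spare resources.
  while ((container = queue.peek()) != null && resourcesAvailable(container)) {
    queue.remove();
    startContainer(container);
  }
}
{code}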

> Queuing of container requests in the NM
> ---
>
> Key: YARN-2883
> URL: https://issues.apache.org/jira/browse/YARN-2883
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2883-yarn-2877.001.patch
>
>
> We propose to add a queue in each NM, where queueable container requests can 
> be held.
> Based on the available resources in the node and the containers in the queue, 
> the NM will decide when to allow the execution of a queued container.
> In order to ensure the instantaneous start of a guaranteed-start container, 
> the NM may decide to pre-empt/kill running queueable containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4745) TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing in trunk

2016-02-29 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172210#comment-15172210
 ] 

Daniel Templeton commented on YARN-4745:


In fact, I see it failing in branch-2.7 and branch-2.8 as well.

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing 
> in trunk
> --
>
> Key: YARN-4745
> URL: https://issues.apache.org/jira/browse/YARN-4745
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>
> I am consistently seeing this:
> {noformat}
> ---
>  T E S T S
> ---
> Running 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.284 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 1.842 sec  <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked: 
> Wanted but not invoked:
> localFs.mkdir(
> 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
> rwxr-xr-x,
> true
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.fs.FileContext.(FileContext.java:249)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> Results :
> Failed tests: 
>   TestResourceLocalizationService.testPublicResourceInitializesLocalDir:1476 
> Wanted but not invoked:
> localFs.mkdir(
> 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
> rwxr-xr-x,
> true
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.fs.FileContext.(FileContext.java:249)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-02-29 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-4746:


 Summary: yarn web services should convert parse failures of appId 
to 400
 Key: YARN-4746
 URL: https://issues.apache.org/jira/browse/YARN-4746
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.8.0
Reporter: Steve Loughran
Priority: Minor


I'm seeing, somewhere in my WS API tests, an error with exception conversion 
of a bad app ID sent in as an argument to a GET. I know it's in ATS, but a 
scan of the core RM web services implies the same problem.


{{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} to 
convert an argument; this throws IllegalArgumentException, which is then 
handled somewhere by Jetty as a 500 error.

In fact, it's a bad argument, which should be handled by returning a 400. This 
can be done by catching the raised exception and explicitly converting it.
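
For example, a minimal sketch of that conversion:

{code}
// Sketch: surface an unparseable application id as a 400 (bad request)
// rather than letting the IllegalArgumentException bubble up as a 500.
private static ApplicationId parseApplicationId(String appId) {
  try {
    return ConverterUtils.toApplicationId(appId);
  } catch (IllegalArgumentException e) {
    throw new BadRequestException("Invalid application id: " + appId);
  }
}
{code}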



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4731) container-executor should not follow symlinks in recursive_unlink_children

2016-02-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172101#comment-15172101
 ] 

Colin Patrick McCabe commented on YARN-4731:


Thanks for the reviews, guys.

> container-executor should not follow symlinks in recursive_unlink_children
> --
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Fix For: 2.9.0
>
> Attachments: YARN-4731.001.patch, YARN-4731.002.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.xml -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/
> -rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh*
> drwxr-s--- 2 hdfs hado

[jira] [Updated] (YARN-4745) TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing in trunk

2016-02-29 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4745:
---
Issue Type: Sub-task  (was: Bug)
Parent: YARN-4478

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing 
> in trunk
> --
>
> Key: YARN-4745
> URL: https://issues.apache.org/jira/browse/YARN-4745
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>
> I am consistently seeing this:
> {noformat}
> ---
>  T E S T S
> ---
> Running 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.284 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 1.842 sec  <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked: 
> Wanted but not invoked:
> localFs.mkdir(
> 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
> rwxr-xr-x,
> true
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.fs.FileContext.(FileContext.java:249)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> Results :
> Failed tests: 
>   TestResourceLocalizationService.testPublicResourceInitializesLocalDir:1476 
> Wanted but not invoked:
> localFs.mkdir(
> 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
> rwxr-xr-x,
> true
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.fs.FileContext.(FileContext.java:249)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4737) Use CSRF Filter in YARN

2016-02-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172067#comment-15172067
 ] 

Varun Vasudev commented on YARN-4737:
-

[~jmaron] - to my knowledge, the only web UI that invokes the web services via 
javascript is the Tez UI. However, there is a branch to change the RM UI to use 
javascript and web services as well.

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.patch.001
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-4737) Use CSRF Filter in YARN

2016-02-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172067#comment-15172067
 ] 

Varun Vasudev edited comment on YARN-4737 at 2/29/16 4:11 PM:
--

[~jmaron] - to my knowledge, the only web UI that invokes the web services via 
javascript is the Tez UI. There is a branch to change the RM UI to use 
javascript and web services as well. 

However, all of these should be using GET calls only, so I suspect they won't 
be affected by this change.


was (Author: vvasudev):
[~jmaron] - to my knowledge the only web UI that uses the web services call via 
javascript is the Tez UI. However there is a branch to change the RM UI to use 
javascript and web services as well.

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.patch.001
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172011#comment-15172011
 ] 

Varun Saxena commented on YARN-3863:


The checkstyle issues are related to imports that were added for javadoc 
references.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently, filters in the timeline reader will return an entity only if all 
> the filter conditions hold true, i.e. only the AND operation is supported. We 
> can support the OR operation for the filters as well. Additionally, as the 
> primary backend implementation is HBase, we can design our filters in a 
> manner where they closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4731) container-executor should not follow symlinks in recursive_unlink_children

2016-02-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172016#comment-15172016
 ] 

Hudson commented on YARN-4731:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #9392 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9392/])
YARN-4731. container-executor should not follow symlinks in (jlowe: rev 
c58a6d53c58209a8f78ff64e04e9112933489fb5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> container-executor should not follow symlinks in recursive_unlink_children
> --
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Fix For: 2.9.0
>
> Attachments: YARN-4731.001.patch, YARN-4731.002.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 contain

[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172005#comment-15172005
 ] 

Varun Saxena commented on YARN-3863:


The latest patch addresses the comments given by Sangjin.

# I have split TestHBaseTimelineStorage into TestHBaseTimelineStorageApps and 
TestHBaseTimelineStorageEntities. Also moved the part related to loading apps 
and entities to a separate class.
# Methods to match filters in TimelineStorageUtils have been refactored for 
better readability and to club common code together. I have passed an enum to 
decide which filter we are trying to match. Any suggestions for a better name 
for this enum?
# For separation of logic between Generic and Application entity reader, I have 
refactored methods which were exclusively in GenericEntityReader but were being 
used by ApplicationEntityReader. Now the relevant logic will exist in both the 
classes (with EntityColumnPrefix used in GenericEntityReader and 
ApplicationColumnPrefix used in ApplicationEntityReader). I have tried to move 
some of the common code used in these methods to utils classes.
# I have also moved the previously written methods in GenericEntityReader that 
read relations, events, etc. to TimelineStorageUtils. These methods were being 
used by ApplicationEntityReader as well, in addition to GenericEntityReader.
# Refactored createSingleColValueFiltersByRange() and 
createHBaseSingleColValueFilter() so that createSingleColValueFiltersByRange() 
can call createHBaseSingleColValueFilter().
# Fixed javadoc related comments and made members final in the classes pointed 
out.
# Removed the preconditions check for filters not being null. Now, if filters 
are null, I create a TimelineEntityFilters object with default values in 
augmentParams.
# Used == instead of equals to match enums.
# Changed name of TimelineEqualityFilter and TimelineMultiValEqualityFilter to 
TimelineKeyValueFilter and TimelineKeyValuesFilter respectively.
# There was a comment on why getCompoundColQualBytes is being used. I had 
missed using it in the previous patch. It is to be used for events. If, say, 
the event to be fetched is UPDATE_APP, with associated info having info1 as 
the key, ts as the timestamp and val1 as the value, then the column is of the 
form {{e!UPDATE_APP=ts=info1}}. This kind of column is referred to within the 
code as a compound column. The part after the prefix {{e!}}, i.e. the part 
with {{=}} as the separator, is what we want to construct with 
getCompoundColQualBytes.
So, if we have to match event filters, the qualifier filter will have to be 
matched against the prefix {{e!UPDATE_APP=}}. That is where the 
getCompoundColQualBytes bit for event filters comes in.
# For the comments on TimelineReaderWebServicesUtils.java, kindly refer to the 
explanation in the comment in which I have detailed what I have done. The 
changes here are required.

I could not quite get the comment below; I did not make any change on line 
448. Sangjin, can you elaborate? Maybe you meant some other line.
{quote}
(HBaseTimelineWriterImpl.java)
l.448: it should simply be a else if
{quote}

cc [~sjlee0], [~djp]
As the patch is quite big, refer to comment 
https://issues.apache.org/jira/browse/YARN-3863?focusedCommentId=15169833&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15169833
 to get details of what has been implemented.

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently, filters in the timeline reader will return an entity only if all 
> the filter conditions hold true, i.e. only the AND operation is supported. We 
> can support the OR operation for the filters as well. Additionally, as the 
> primary backend implementation is HBase, we can design our filters in a 
> manner where they closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4737) Use CSRF Filter in YARN

2016-02-29 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-4737:
-
Attachment: YARN-4737.patch.001

The key elements of the uploaded patch:

- Provides a CSRF enabling call to WebApps.Builder, taking the configuration 
prefix as an argument.
- Adds the call to the web apps currently capable of SPNEGO authentication 
(and thus susceptible to CSRF) - RM, NM, and Job History
- Defines the properties associated with configuring the filter for these 
given web apps
- Tests added based on TestRMWebServices (used the test as an example of 
client invocations of the RM web endpoint)

NOTE:  Could use some assistance in ascertaining whether web apps currently 
have javascript invocations of the exposed REST services.  Those calls will 
fail if CSRF is enabled.

> Use CSRF Filter in YARN
> ---
>
> Key: YARN-4737
> URL: https://issues.apache.org/jira/browse/YARN-4737
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, webapp
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
> Attachments: YARN-4737.patch.001
>
>
> A CSRF filter was added to hadoop common 
> (https://issues.apache.org/jira/browse/HADOOP-12691).  The aim of this JIRA 
> is to come up with a mechanism to integrate this filter into the webapps for 
> which it is applicable (web apps that may establish an authenticated 
> identity).  That includes the RM, NM, and mapreduce jobhistory web app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4731) container-executor should not follow symlinks in recursive_unlink_children

2016-02-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171975#comment-15171975
 ] 

Jason Lowe commented on YARN-4731:
--

Thanks for catching the vulnerability, Colin!

+1 lgtm.  Committing this.


> container-executor should not follow symlinks in recursive_unlink_children
> --
>
> Key: YARN-4731
> URL: https://issues.apache.org/jira/browse/YARN-4731
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>Assignee: Colin Patrick McCabe
>Priority: Blocker
> Attachments: YARN-4731.001.patch, YARN-4731.002.patch
>
>
> Enable LCE and CGroups
> Submit a mapreduce job
> {noformat}
> 2016-02-24 18:56:46,889 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Deleting 
> absolute path : 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
> 2016-02-24 18:56:46,894 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 255. Privileged Execution Operation 
> Output:
> main : command provided 3
> main : run as user is dsperf
> main : requested yarn user is dsperf
> failed to rmdir job.jar: Not a directory
> Error while deleting 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01:
>  20 (Not a directory)
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  dsperf, dsperf, 3, 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01]
> 2016-02-24 18:56:46,894 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: 
> DeleteAsUser for 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/dsperf/appcache/application_1456319010019_0003/container_e02_1456319010019_0003_01_01
>  returned with exit code: 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.deleteAsUser(LinuxContainerExecutor.java:569)
> at 
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$FileDeletionTask.run(DeletionService.java:265)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 10 more
> {noformat}
> As a result nodemanager-local directory are not getting deleted for each 
> application
> {noformat}
> total 36
> drwxr-s--- 4 hdfs hadoop 4096 Feb 25 08:25 ./
> drwxr-s--- 7 hdfs hadoop 4096 Feb 25 08:25 ../
> -rw--- 1 hdfs hadoop  340 Feb 25 08:25 container_tokens
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.jar -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/11/job.jar/
> lrwxrwxrwx 1 hdfs hadoop  111 Feb 25 08:25 job.xml -> 
> /opt/bibin/dsperf/HAINSTALL/nmlocal/usercache/hdfs/appcache/application_1456364845478_0004/filecache/13/job.xml*
> drwxr-s--- 2 hdfs hadoop 4096 Feb 25 08:25 jobSubmitDir/
> -rwx-- 1 hdfs hadoop 5348 Feb 25 08:25 launch_container.sh*
> drwxr-s--- 2 hdfs hadoop 409

[jira] [Commented] (YARN-4745) TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing in trunk

2016-02-29 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171951#comment-15171951
 ] 

Daniel Templeton commented on YARN-4745:


Also failing in branch-2.

> TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing 
> in trunk
> --
>
> Key: YARN-4745
> URL: https://issues.apache.org/jira/browse/YARN-4745
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>
> I am consistently seeing this:
> {noformat}
> ---
>  T E S T S
> ---
> Running 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.284 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
> testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
>   Time elapsed: 1.842 sec  <<< FAILURE!
> org.mockito.exceptions.verification.WantedButNotInvoked: 
> Wanted but not invoked:
> localFs.mkdir(
> 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
> rwxr-xr-x,
> true
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.fs.FileContext.(FileContext.java:249)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> Results :
> Failed tests: 
>   TestResourceLocalizationService.testPublicResourceInitializesLocalDir:1476 
> Wanted but not invoked:
> localFs.mkdir(
> 
> /Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
> rwxr-xr-x,
> true
> );
> -> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)
> However, there were other interactions with this mock:
> -> at org.apache.hadoop.fs.FileContext.<init>(FileContext.java:249)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> -> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4745) TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing in trunk

2016-02-29 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-4745:
--

 Summary: 
TestResourceLocalizationService.testPublicResourceInitializesLocalDir failing 
in trunk
 Key: YARN-4745
 URL: https://issues.apache.org/jira/browse/YARN-4745
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.9.0
Reporter: Daniel Templeton


I am consistently seeing this:

{noformat}
---
 T E S T S
---
Running 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.284 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
testPublicResourceInitializesLocalDir(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
  Time elapsed: 1.842 sec  <<< FAILURE!
org.mockito.exceptions.verification.WantedButNotInvoked: 
Wanted but not invoked:
localFs.mkdir(

/Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
rwxr-xr-x,
true
);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)

However, there were other interactions with this mock:
-> at org.apache.hadoop.fs.FileContext.<init>(FileContext.java:249)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)

at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)


Results :

Failed tests: 
  TestResourceLocalizationService.testPublicResourceInitializesLocalDir:1476 
Wanted but not invoked:
localFs.mkdir(

/Users/daniel/NetBeansProjects/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/filecache,
rwxr-xr-x,
true
);
-> at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testPublicResourceInitializesLocalDir(TestResourceLocalizationService.java:1476)

However, there were other interactions with this mock:
-> at org.apache.hadoop.fs.FileContext.<init>(FileContext.java:249)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)
-> at org.apache.hadoop.fs.FileContext.makeQualified(FileContext.java:611)


Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
{noformat}
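
For readers unfamiliar with this Mockito failure mode: "Wanted but not invoked" 
means the verify() expectation never matched any call recorded on the mock, and 
Mockito then lists the interactions it did see (here, the repeated 
makeQualified calls). A minimal, hedged reproduction follows; the interface and 
names are illustrative stand-ins, not the actual test's types.

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

public class WantedButNotInvokedDemo {
  // Illustrative stand-in for the mocked FileContext.
  interface FileOps {
    void mkdir(String path, String perm, boolean createParent);
    String makeQualified(String path);
  }

  public static void main(String[] args) {
    FileOps localFs = mock(FileOps.class);

    // The code under test only qualifies the path and never calls mkdir.
    localFs.makeQualified("/tmp/0/filecache");

    // This verification fails with WantedButNotInvoked, and the report lists
    // the makeQualified call under "other interactions with this mock".
    verify(localFs).mkdir("/tmp/0/filecache", "rwxr-xr-x", true);
  }
}
{code}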



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3863) Support complex filters in TimelineReader

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3863:
---
Attachment: (was: YARN-3863-YARN-2928.v2.03.patch)

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently, filters in the timeline reader return an entity only if all the 
> filter conditions hold true, i.e., only the AND operation is supported. We can 
> support the OR operation for the filters as well. Additionally, as the primary 
> backend implementation is HBase, we can design our filters so that they 
> closely resemble HBase filters.
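
A hedged sketch of the nesting the description proposes, using HBase's 
FilterList: MUST_PASS_ONE gives OR semantics, MUST_PASS_ALL gives AND 
semantics, and lists nest inside one another. The column family and qualifier 
names below are illustrative assumptions, not taken from any attached patch.

{code}
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class TimelineFilterSketch {
  public static FilterList buildFilter() {
    // (cfg:paramA == 1) OR (cfg:paramB == 2)
    FilterList or = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    or.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cfg"), Bytes.toBytes("paramA"),
        CompareOp.EQUAL, Bytes.toBytes("1")));
    or.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("cfg"), Bytes.toBytes("paramB"),
        CompareOp.EQUAL, Bytes.toBytes("2")));

    // ... AND (info:flowName == "test_flow"): the OR list nests under AND.
    FilterList and = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    and.addFilter(or);
    and.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("info"), Bytes.toBytes("flowName"),
        CompareOp.EQUAL, Bytes.toBytes("test_flow")));
    return and;
  }
}
{code}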



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Support complex filters in TimelineReader

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171910#comment-15171910
 ] 

Hadoop QA commented on YARN-3863:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
52s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s 
{color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
35s {color} | {color:green} YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} YARN-2928 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice:
 patch generated 5 new + 2 unchanged - 1 fixed = 7 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 53s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 57s 
{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed 
with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 8s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790480/YARN-3863-YARN-2928.v2.03.patch
 |
| JIRA Issue | YARN-3863 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 8341224cdf9f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| B

[jira] [Commented] (YARN-4715) Add support to read resource types from a config file

2016-02-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171907#comment-15171907
 ] 

Hadoop QA commented on YARN-4715:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 9 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
31s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 7s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
25s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
32s {color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} YARN-3926 passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s 
{color} | {color:green} YARN-3926 passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 3 new + 
228 unchanged - 3 fixed = 231 total (was 231) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 4s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 0s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 13s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflice

[jira] [Updated] (YARN-3863) Support complex filters in TimelineReader

2016-02-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3863:
---
Attachment: YARN-3863-YARN-2928.v2.03.patch

> Support complex filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-YARN-2928.v2.01.patch, 
> YARN-3863-YARN-2928.v2.02.patch, YARN-3863-YARN-2928.v2.03.patch, 
> YARN-3863-YARN-2928.v2.03.patch, YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently, filters in the timeline reader return an entity only if all the 
> filter conditions hold true, i.e., only the AND operation is supported. We can 
> support the OR operation for the filters as well. Additionally, as the primary 
> backend implementation is HBase, we can design our filters so that they 
> closely resemble HBase filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4715) Add support to read resource types from a config file

2016-02-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4715:

Attachment: YARN-4715-YARN-3926.004.patch

After some further thought, I think it's just easier to disallow specifying 
memory and vcores as resource types. Uploaded a new patch with the fix.
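
A hedged sketch of the kind of check described above; the property name, class 
name, and reserved list are assumptions for illustration, not the contents of 
the patch.

{code}
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class ResourceTypesValidator {
  // Hypothetical property naming the additional resource types.
  static final String RESOURCE_TYPES_KEY = "yarn.resource-types";
  // memory and vcores are built in, so they may not be redeclared.
  static final List<String> RESERVED =
      Arrays.asList("memory", "memory-mb", "vcores");

  public static void validate(Configuration conf) {
    for (String type : conf.getTrimmedStrings(RESOURCE_TYPES_KEY)) {
      if (RESERVED.contains(type.toLowerCase())) {
        throw new IllegalArgumentException("'" + type
            + "' cannot be declared as a resource type; it is built in");
      }
    }
  }
}
{code}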

> Add support to read resource types from a config file
> -
>
> Key: YARN-4715
> URL: https://issues.apache.org/jira/browse/YARN-4715
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4715-YARN-3926.001.patch, 
> YARN-4715-YARN-3926.002.patch, YARN-4715-YARN-3926.003.patch, 
> YARN-4715-YARN-3926.004.patch
>
>
> This ticket adds support for the RM to read, from a config file, the resource 
> types to be used for scheduling. I'll file follow-up tickets to add similar 
> support in the NM and to handle the RM-NM handshake protocol issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4566) TestMiniYarnClusterNodeUtilization sometimes fails on trunk

2016-02-29 Thread Takashi Ohnishi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171828#comment-15171828
 ] 

Takashi Ohnishi commented on YARN-4566:
---

Thank you, [~rohithsharma], for reviewing and committing!

> TestMiniYarnClusterNodeUtilization sometimes fails on trunk
> ---
>
> Key: YARN-4566
> URL: https://issues.apache.org/jira/browse/YARN-4566
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Takashi Ohnishi
>Assignee: Takashi Ohnishi
> Fix For: 2.9.0
>
> Attachments: YARN-4566.1.patch
>
>
> TestMiniYarnClusterNodeUtilization often fails with NPE.
> {code}
> testUpdateNodeUtilization(org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization)
>   Time elapsed: 3.752 sec  <<< ERROR!
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.verifySimulatedUtilization(TestMiniYarnClusterNodeUtilization.java:217)
>   at 
> org.apache.hadoop.yarn.server.TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization(TestMiniYarnClusterNodeUtilization.java:116)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4744) Too many signal to container failure in case of LCE

2016-02-29 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4744:
---
Affects Version/s: 2.9.0

> Too many signal to container failure in case of LCE
> ---
>
> Key: YARN-4744
> URL: https://issues.apache.org/jira/browse/YARN-4744
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Bibin A Chundatt
>
> Enable LCE with cgroups, start the server as the dsperf user, and submit an 
> application as the yarn user. Signalling the container then fails repeatedly:
> {noformat}
> 2014-03-01 14:10:32,223 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime:
>  Using container runtime: DefaultLinuxContainerRuntime
> 2014-03-01 14:10:32,228 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 9. Privileged Execution Operation Output:
> main : command provided 2
> main : run as user is yarn
> main : requested yarn user is yarn
> Full command array for failed execution:
> [/opt/bibin/dsperf/HAINSTALL/install/hadoop/nodemanager/bin/container-executor,
>  yarn, yarn, 2, 28575, 15]
> 2014-03-01 14:10:32,228 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime:
>  Signal container failed. Exception:
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=9:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:173)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:132)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:109)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:513)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:520)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: ExitCodeException exitCode=9:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:927)
> at org.apache.hadoop.util.Shell.run(Shell.java:838)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1117)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:150)
> ... 9 more
> {noformat}
> Checked the same scenario on version 2.7.2; the issue is not observed there.
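
The ExitCodeException in the trace is how Hadoop's Shell utility reports a 
non-zero exit from the child process (here, container-executor exiting with 
code 9); the NodeManager then wraps it in a PrivilegedOperationException. A 
hedged, self-contained sketch of that mechanism, using /bin/false as a 
stand-in for the failing container-executor invocation:

{code}
import org.apache.hadoop.util.Shell.ExitCodeException;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class ExitCodeSketch {
  public static void main(String[] args) throws Exception {
    // Stand-in for: container-executor yarn yarn 2 <pid> <signal>
    ShellCommandExecutor exec =
        new ShellCommandExecutor(new String[] {"/bin/false"});
    try {
      exec.execute();  // throws ExitCodeException on non-zero exit
    } catch (ExitCodeException e) {
      // The NM logs this as "Signal container failed" after wrapping it
      // in a PrivilegedOperationException.
      System.err.println("exit code: " + e.getExitCode());
    }
  }
}
{code}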



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

