[jira] [Commented] (YARN-2981) DockerContainerExecutor must support a Cluster-wide default Docker image

2015-02-26 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339744#comment-14339744
 ] 

Abin Shahab commented on YARN-2981:
---

[~raviprak] [~vinodkv] [~vvasudev] [~ywskycn] please review

 DockerContainerExecutor must support a Cluster-wide default Docker image
 

 Key: YARN-2981
 URL: https://issues.apache.org/jira/browse/YARN-2981
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Abin Shahab
Assignee: Abin Shahab
 Attachments: YARN-2981.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3262) Surface application outstanding resource requests table

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339753#comment-14339753
 ] 

Hadoop QA commented on YARN-3262:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701265/YARN-3262.4.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6773//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6773//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6773//console

This message is automatically generated.

 Surface application outstanding resource requests table
 ---

 Key: YARN-3262
 URL: https://issues.apache.org/jira/browse/YARN-3262
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-3262.1.patch, YARN-3262.2.patch, YARN-3262.3.patch, 
 YARN-3262.4.patch, resource requests.png


 It would be useful to surface the outstanding resource requests table on the 
 application web page to facilitate  scheduling analysis and debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3168) Convert site documentation from apt to markdown

2015-02-26 Thread Gururaj Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339780#comment-14339780
 ] 

Gururaj Shetty commented on YARN-3168:
--

Hi [~aw]

All your comments are incorporated. Kindly review the latest patch attached.

 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, 
 YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339796#comment-14339796
 ] 

Varun Saxena commented on YARN-2962:


[~kasha] / [~ka...@cloudera.com], for this shall I assume that the state store 
will be formatted before making the config change?
Backward compatibility for running apps after the config change (on RM restart) 
will be difficult, as we may have to try all the possible appId formats.


 ZKRMStateStore: Limit the number of znodes under a znode
 

 Key: YARN-2962
 URL: https://issues.apache.org/jira/browse/YARN-2962
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Varun Saxena
Priority: Critical

 We ran into this issue where we were hitting the default ZK server message 
 size configs, primarily because the message had too many znodes even though 
 individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3204) Fix new findbug warnings in hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)

2015-02-26 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339811#comment-14339811
 ] 

Chengbing Liu commented on YARN-3204:
-

{code}
-this.reservedAppSchedulable = (FSAppAttempt) application;
+ if(application instanceof FSAppAttempt){
+   this.reservedAppSchedulable = (FSAppAttempt) application;
+}
{code}
Would it be better if we throw an exception if the condition is not met?

{code}
 Set<String> planQueues = new HashSet<String>();
 for (FSQueue fsQueue : queueMgr.getQueues()) {
   String queueName = fsQueue.getName();
-  if (allocConf.isReservable(queueName)) {
+  boolean isReservable = false;
+  synchronized (this) {
+    isReservable = allocConf.isReservable(queueName);
+  }
+  if (isReservable) {
     planQueues.add(queueName);
   }
 }
{code}
I think we should synchronize the whole function, since {{allocConf}} may be 
reloaded during this loop. A dedicated lock seems better to me than 
{{FairScheduler.this}}.
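
For illustration only, a rough sketch of the dedicated-lock idea (the field name 
allocConfLock is assumed, not taken from the patch):
{code}
// Sketch: one dedicated lock guards every read (and, elsewhere, reload) of
// allocConf, so the whole loop observes a consistent AllocationConfiguration.
private final Object allocConfLock = new Object();

public Set<String> getPlanQueues() {
  Set<String> planQueues = new HashSet<String>();
  synchronized (allocConfLock) {
    for (FSQueue fsQueue : queueMgr.getQueues()) {
      String queueName = fsQueue.getName();
      if (allocConf.isReservable(queueName)) {
        planQueues.add(queueName);
      }
    }
  }
  return planQueues;
}
{code}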

 Fix new findbug warnings in 
 hadoop-yarn-server-resourcemanager(resourcemanager.scheduler.fair)
 --

 Key: YARN-3204
 URL: https://issues.apache.org/jira/browse/YARN-3204
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Brahma Reddy Battula
 Attachments: YARN-3204-001.patch, YARN-3204-002.patch


 Please check following findbug report..
 https://builds.apache.org/job/PreCommit-YARN-Build/6644//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339774#comment-14339774
 ] 

Hadoop QA commented on YARN-2820:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701267/YARN-2820.006.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6775//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6775//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6775//console

This message is automatically generated.

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0, 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
 YARN-2820.005.patch, YARN-2820.006.patch


 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
 saw the following IOException cause the RM to shut down.
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 

[jira] [Updated] (YARN-3168) Convert site documentation from apt to markdown

2015-02-26 Thread Gururaj Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gururaj Shetty updated YARN-3168:
-
Attachment: YARN-3168.20150227.3.patch

 Convert site documentation from apt to markdown
 ---

 Key: YARN-3168
 URL: https://issues.apache.org/jira/browse/YARN-3168
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Allen Wittenauer
Assignee: Gururaj Shetty
 Attachments: YARN-3168-00.patch, YARN-3168.20150224.1.patch, 
 YARN-3168.20150225.2.patch, YARN-3168.20150227.3.patch


 YARN analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3122:
---
Attachment: YARN-3122.005.patch

The updated patch looks mostly good to me. I like that we are mimicking top; 
users will find it easier to reason about this.

I had a few nitpicks that I have put into the v5 patch: renaming 
CpuTimeTracker#getCpuUsagePercent and changes to comments. [~adhoot], can you 
please review and verify the changes?

One last concern: we use 0 when we cannot calculate the percentage. 
Shouldn't we use UNAVAILABLE instead? 
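
To illustrate the concern (a minimal sketch with made-up names, not 
CpuTimeTracker itself), a negative sentinel lets callers tell "unknown" apart 
from "0% busy":
{code}
public class CpuUsageSketch {
  public static final float UNAVAILABLE = -1.0f;

  private long lastCpuTimeMs = -1;
  private long lastSampleTimeMs = -1;

  /** Returns CPU usage in percent per core, or UNAVAILABLE if it cannot be computed yet. */
  public float getCpuUsagePercent(long cpuTimeMs, long sampleTimeMs) {
    if (lastSampleTimeMs < 0 || sampleTimeMs <= lastSampleTimeMs) {
      // First sample (or clock went backwards): remember the reading, report "unknown".
      lastCpuTimeMs = cpuTimeMs;
      lastSampleTimeMs = sampleTimeMs;
      return UNAVAILABLE;
    }
    float percent = (cpuTimeMs - lastCpuTimeMs) * 100f / (sampleTimeMs - lastSampleTimeMs);
    lastCpuTimeMs = cpuTimeMs;
    lastSampleTimeMs = sampleTimeMs;
    return percent;
  }
}
{code}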

 Metrics for container's actual CPU usage
 

 Key: YARN-3122
 URL: https://issues.apache.org/jira/browse/YARN-3122
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
 YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
 YARN-3122.prelim.patch, YARN-3122.prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track CPU usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339830#comment-14339830
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701280/YARN-3122.005.patch
  against trunk revision 8ca0d95.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6776//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6776//console

This message is automatically generated.

 Metrics for container's actual CPU usage
 

 Key: YARN-3122
 URL: https://issues.apache.org/jira/browse/YARN-3122
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
 YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.005.patch, 
 YARN-3122.prelim.patch, YARN-3122.prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track CPU usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3222) RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential order

2015-02-26 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339799#comment-14339799
 ] 

Rohith commented on YARN-3222:
--

bq. NODE_USABLE event is sent regardless the reconnected node is healthy or not 
healthy, which is incorrect, right ?
Yes, I think the assumption was that if a new node is reconnecting then the NM is 
healthy. It is better to retain the old state, i.e. UNHEALTHY, and on the next 
heartbeat the NodeStatus can be moved from Unhealthy to Running.

I see another potential issue: if the old node is retained, then the RMNode's 
{{totalCapability}} has to be updated with the new node's resource. But in the 
current flow, {{totalCapability}} is not updated. As a result, the scheduler has 
the updated resource value but the RMNode has a stale one. Any client getting the 
node capability from the RMNode would end up with the wrong resource value.
{code}
if (noRunningApps) {
  // some code
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));

  if (rmNode.getHttpPort() == newNode.getHttpPort()) {
    if (rmNode.getState() != NodeState.UNHEALTHY) {
      // Only add new node if old state is not UNHEALTHY
      // NEW NODE CAPABILITY SHOULD BE UPDATED TO OLD NODE
      rmNode.context.getDispatcher().getEventHandler().handle(
          new NodeAddedSchedulerEvent(newNode));
    }
  } else {
    // Reconnected node differs, so replace old node and start new node
    // No need to update totalCapability since old node is replaced with new node.
    rmNode.context.getDispatcher().getEventHandler().handle(
        new RMNodeStartedEvent(newNode.getNodeID(), null, null));
  }
}
{code}
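
Something along these lines could keep the retained node consistent (a sketch 
only, mirroring the snippet above; not a patch):
{code}
// Sketch: before sending NodeAddedSchedulerEvent for a retained node, copy the
// reconnecting node's capability onto the old RMNode so the scheduler and the
// RMNode report the same resource value.
rmNode.totalCapability = newNode.getTotalCapability();
rmNode.context.getDispatcher().getEventHandler().handle(
    new NodeAddedSchedulerEvent(newNode));
{code}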

 RMNodeImpl#ReconnectNodeTransition should send scheduler events in sequential 
 order
 ---

 Key: YARN-3222
 URL: https://issues.apache.org/jira/browse/YARN-3222
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-3222.patch


 When a node is reconnected, RMNodeImpl#ReconnectNodeTransition notifies the 
 scheduler with the events node_added, node_removed, or node_resource_update. 
 These events should be sent in sequential order, i.e. the node_added event 
 followed by the node_resource_update event.
 But if the node is reconnected with a different http port, the order of 
 scheduler events is node_removed -> node_resource_update -> node_added, which 
 causes the scheduler to not find the node, throw an NPE, and the RM to exit.
 The node_resource_update event should always be triggered via 
 RMNodeEventType.RESOURCE_UPDATE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3122:

Attachment: YARN-3122.004.patch

Modified CPU usage to be percent per core and the corresponding metric also to 
be percent per core. Thus 2 cores fully used should report as 200%.
Added doc comments.
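
A tiny illustrative example of the convention (not code from the patch):
{code}
public class PercentPerCoreExample {
  public static void main(String[] args) {
    long cpuMillisUsed = 2000; // CPU time consumed across all cores in the interval
    long wallMillis = 1000;    // wall-clock length of the interval
    // Percent per core: 100% means one core fully busy, 200% means two cores fully busy.
    float percentPerCore = (cpuMillisUsed * 100f) / wallMillis;
    System.out.println(percentPerCore + "%"); // prints 200.0%
  }
}
{code}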

 Metrics for container's actual CPU usage
 

 Key: YARN-3122
 URL: https://issues.apache.org/jira/browse/YARN-3122
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
 YARN-3122.003.patch, YARN-3122.004.patch, YARN-3122.prelim.patch, 
 YARN-3122.prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track CPU usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3189) Yarn application usage command should not give -appstate and -apptype

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338380#comment-14338380
 ] 

Hadoop QA commented on YARN-3189:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701054/YARN-3189.patch
  against trunk revision 0d4296f.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6756//console

This message is automatically generated.

 Yarn application usage command should not give -appstate and -apptype
 -

 Key: YARN-3189
 URL: https://issues.apache.org/jira/browse/YARN-3189
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Anushri
Assignee: Anushri
Priority: Minor
 Attachments: YARN-3189.patch


 Yarn application usage command should not give -appstate and -apptype since 
 these two are applicable to the --list command.
  *Can somebody please assign this issue to me* 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338686#comment-14338686
 ] 

Junping Du commented on YARN-3087:
--

Thanks [~gtCarrera9] for updating the patch!
Just quickly went through the patch; many changes replace Map with HashMap in 
the declarations, like the following:
{code}
-  private Map<String, Set<String>> isRelatedToEntities = new HashMap<>();
-  private Map<String, Set<String>> relatesToEntities = new HashMap<>();
+  private HashMap<String, Set<String>> isRelatedToEntities = new HashMap<>();
+  private HashMap<String, Set<String>> relatesToEntities = new HashMap<>();
{code}
Any specific reason for doing this? Typically, we declare things (objects, 
interfaces) with the more generic type (per the Liskov Substitution Principle).
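
For reference, the convention being suggested (an illustrative snippet, not the 
patch) is to declare against the interface and only name the concrete type at 
construction:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class TimelineEntitySketch {
  // Declared as Map: callers depend only on the Map contract, so the backing
  // implementation (HashMap, TreeMap, ...) can change without touching the API.
  private Map<String, Set<String>> isRelatedToEntities = new HashMap<>();
  private Map<String, Set<String>> relatesToEntities = new HashMap<>();
}
{code}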

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3131) YarnClientImpl should check FAILED and KILLED state in submitApplication

2015-02-26 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338665#comment-14338665
 ] 

Chang Li commented on YARN-3131:


[~jlowe] [~jianhe] [~leftnoteasy] Could any of you help commit this if it looks 
good for you now? Thanks

 YarnClientImpl should check FAILED and KILLED state in submitApplication
 

 Key: YARN-3131
 URL: https://issues.apache.org/jira/browse/YARN-3131
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chang Li
Assignee: Chang Li
 Attachments: yarn_3131_v1.patch, yarn_3131_v2.patch, 
 yarn_3131_v3.patch, yarn_3131_v4.patch, yarn_3131_v5.patch, 
 yarn_3131_v6.patch, yarn_3131_v7.patch


 Just ran into an issue when submitting a job to a non-existent queue: YarnClient 
 raises no exception. Though the job indeed gets submitted successfully and just 
 fails immediately after, it would be better if YarnClient could handle the 
 immediate-failure situation like YarnRunner does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-02-26 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338687#comment-14338687
 ] 

Naganarasimha G R commented on YARN-3039:
-

Hi [~djp]
bq. Another idea (from Vinod in offline discussion) is to add a blocking call 
in AMRMClient to get aggregator address directly from RM
+1 for this approach. Also, if the NM uses this new blocking call in AMRMClient 
to get the aggregator address, then there might not be any race conditions when 
the NM posts the AM container's life-cycle events immediately after the 
appAggregator is created through the aux service.

bq. In addition, if adding a new API in AMRMClient can be accepted, NM will use 
TimelineClient too so can handle service discovery automatically.
Are we just adding a method to get the aggregator address, or what other APIs 
are planned?

bq. NM will notify RM that this new appAggregator is ready for use in next 
heartbeat to RM (missing in this patch).
bq.  RM verify the out of service for this app aggregator first and kick off 
rebind appAggregator to another NM's perNodeAggregatorService in next heartbeat 
comes.
I believe the idea of using an aux service was to decouple the NM and the 
Timeline service. If the NM notifies the RM about new appAggregator creation 
(based on the aux service), then basically the NM has to be aware that 
PerNodeAggregatorServer is configured as an aux service, and if it supports 
rebinding the appAggregator on failure then it has to be able to communicate 
with this aux service too. Would that be a clean approach?

I also feel we need to support starting a per-app aggregator only if the app 
requests it (Zhijie also mentioned this). If not, we can make use of one default 
aggregator for all such apps launched in the NM, which is just used to post 
container entities from different NMs for these apps.

Did any discussions happen w.r.t. the RM having its own aggregator? I feel it 
would be better for the RM to have one, as it need not depend on any NM to post 
entities.

 [Aggregator wireup] Implement ATS app-appgregator service discovery
 ---

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Junping Du
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf, YARN-3039-no-test.patch


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-02-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338654#comment-14338654
 ] 

Junping Du commented on YARN-3031:
--

bq. The AggregateUpTo enum has the tracks to aggregate along, the 
TimelineEntityType enum has the types of entities that can exist. There may not 
be aggregations along all entity types. 
I see. Thanks [~vrushalic] for the explanation. However, CLUSTER seems to be 
missing here, as it is an aggregation of FLOWs, isn't it?

bq. The reasoning behind having two more apis for writing metrics and events in 
addition to the entity write is that, it would be good (efficient) to have the 
option to write a single metric or a single event. For example, say a job has 
many custom metrics and one particular metric is updated extremely frequently 
but not the others. We may want to write out only that particular metric 
without having to look through/write all other metrics and other information in 
that entity. Similarly for events. Perhaps we could do it differently that what 
is proposed in the patch, but the functionality of writing them individually 
would help in performance I believe.
Agreed that we should have separate interfaces to write a single data entry 
quickly and to aggregate data entries. Also, some aggregators (like the RM) won't 
even call the aggregation interface here (according to YARN-3167). IMO, two 
interfaces sound good enough, so could we merge addEvent() and updateMetrics() 
into a single data-entry writer that accepts a more generic type? That would 
make the interface more concise and hide more details that could change in the 
future. Thoughts?
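
As a strawman only (the names below are invented, not the YARN-3031 interface), 
the merged writer could look roughly like this:
{code}
// Hypothetical sketch: one generic entry writer subsuming addEvent()/updateMetrics().
public interface TimelineEntryWriter {
  /** Writes a single entry (an event, a metric, ...) for the given entity. */
  <T> void writeEntry(String entityId, T entry) throws java.io.IOException;
}
{code}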

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338526#comment-14338526
 ] 

Tsuyoshi Ozawa commented on YARN-3217:
--

+1

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Fix For: 2.7.0

 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, 
 YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3217:
-
 Target Version/s: 2.7.0
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Fix For: 2.7.0

 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, 
 YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338556#comment-14338556
 ] 

Hudson commented on YARN-3256:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2066 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2066/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, causing it to 
 run twice against the same scheduler configured in the default conf.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of newing one up inside the test and hiding 
 the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3255:
-
Hadoop Flags: Reviewed

 RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
 generic options
 ---

 Key: YARN-3255
 URL: https://issues.apache.org/jira/browse/YARN-3255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: YARN-3255-01.patch, YARN-3255-02.patch


 Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
 generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
 ability to pass generic options in order to specify configuration files or 
 the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3255:
-
Summary: RM, NM, JobHistoryServer, and WebAppProxyServer's main() should 
support generic options  (was: RM and NM main() should support generic options)

 RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
 generic options
 ---

 Key: YARN-3255
 URL: https://issues.apache.org/jira/browse/YARN-3255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: YARN-3255-01.patch, YARN-3255-02.patch


 Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
 generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
 ability to pass generic options in order to specify configuration files or 
 the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338517#comment-14338517
 ] 

Hudson commented on YARN-3256:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #116 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/116/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* hadoop-yarn-project/CHANGES.txt


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, causing it to 
 run twice against the same scheduler configured in the default conf.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of newing one up inside the test and hiding 
 the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338565#comment-14338565
 ] 

Hudson commented on YARN-3217:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7208 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7208/])
YARN-3217. Remove httpclient dependency from hadoop-yarn-server-web-proxy. 
Contributed by Brahma Reddy Battula. (ozawa: rev 
773b6515ac51af3484824bd6f57685a9726a1e70)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/pom.xml


 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Fix For: 2.7.0

 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, 
 YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338548#comment-14338548
 ] 

Hudson commented on YARN-3239:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2066 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2066/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java
* hadoop-yarn-project/CHANGES.txt


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3264) [Storage implementation] Create a POC only file based storage implementation for ATS writes

2015-02-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338584#comment-14338584
 ] 

Junping Du commented on YARN-3264:
--

+1. This is also useful for testing.

 [Storage implementation] Create a POC only file based storage implementation 
 for ATS writes
 ---

 Key: YARN-3264
 URL: https://issues.apache.org/jira/browse/YARN-3264
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Vrushali C
Assignee: Vrushali C

 For the PoC, we need to create a backend impl for file-based storage of entities 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3217) Remove httpclient dependency from hadoop-yarn-server-web-proxy

2015-02-26 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338591#comment-14338591
 ] 

Brahma Reddy Battula commented on YARN-3217:


Thanks a lot [~ozawa] !!!

 Remove httpclient dependency from hadoop-yarn-server-web-proxy
 --

 Key: YARN-3217
 URL: https://issues.apache.org/jira/browse/YARN-3217
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Fix For: 2.7.0

 Attachments: YARN-3217-002.patch, YARN-3217-003.patch, 
 YARN-3217-003.patch, YARN-3217-004.patch, YARN-3217.patch


 Sub-task of HADOOP-10105. Remove httpclient dependency from 
 WebAppProxyServlet.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338589#comment-14338589
 ] 

Tsuyoshi Ozawa commented on YARN-2820:
--

[~zxu] thanks for the update! The implementation of FSAction looks good to 
me. I found the following points to be fixed:

1. In startInternal, fs.mkdirs can be replaced with mkdirsWithRetries:
{code}
fs.mkdirs(rmDTSecretManagerRoot);
fs.mkdirs(rmAppRoot);
fs.mkdirs(amrmTokenSecretManagerRoot);
{code}

2. All readFile() should be replaced with readFileWithRetries like 
writeFileWithRetries. 
3. fs.listStatus() should be replaced with listStatusWithRetries.

4. We can use try-with-resources in storeRMDTMasterKeyState to close fsOut. I 
know it's not related to this patch, but it's better to fix it here.
{code}
DataOutputStream fsOut = new DataOutputStream(os);
{code}
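
For point 4, a minimal try-with-resources sketch could look like this (the 
surrounding variable names are assumed, not copied from FileSystemRMStateStore):
{code}
try (DataOutputStream fsOut = new DataOutputStream(os)) {
  // Write the master key state here; fsOut is closed automatically,
  // even if an IOException is thrown.
  masterKey.write(fsOut);
}
{code}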

Do you mind updating a patch again?

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0, 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
 YARN-2820.005.patch


 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we 
 saw the following IOException cause the RM to shut down.
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
  
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  
 at 
 

[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338509#comment-14338509
 ] 

Hudson commented on YARN-3239:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #116 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/116/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3189) Yarn application usage command should not give -appstate and -apptype

2015-02-26 Thread Anushri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anushri updated YARN-3189:
--
Attachment: YARN-3189.patch

 Yarn application usage command should not give -appstate and -apptype
 -

 Key: YARN-3189
 URL: https://issues.apache.org/jira/browse/YARN-3189
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Anushri
Assignee: Anushri
Priority: Minor
 Attachments: YARN-3189.patch


 Yarn application usage command should not give -appstate and -apptype since 
 these two are applicable to the --list command.
  *Can somebody please assign this issue to me* 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338445#comment-14338445
 ] 

Hudson commented on YARN-3239:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #107 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/107/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338436#comment-14338436
 ] 

Hudson commented on YARN-3256:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2048 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2048/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* hadoop-yarn-project/CHANGES.txt


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, causing it to 
 run twice against the same scheduler configured in the default conf.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of newing one up inside the test and hiding 
 the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338428#comment-14338428
 ] 

Hudson commented on YARN-3239:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2048 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2048/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338453#comment-14338453
 ] 

Hudson commented on YARN-3256:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #107 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/107/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, causing it to 
 run twice on the same Scheduler configured by default.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of newing one up inside the test and hiding 
 the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3260) NPE if AM attempts to register before RM processes launch event

2015-02-26 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338813#comment-14338813
 ] 

Naganarasimha G R commented on YARN-3260:
-

Hi [~jlowe],
Had a look at the code, and the approaches I can think of are:
* Call ApplicationMasterService.registerAppAttempt(ApplicationAttemptId) in 
RMAppAttemptImpl.AMLaunchedTransition instead of 
RMAppAttemptImpl.AttemptStartedTransition, and ensure that creating the 
ClientToAMToken and registering with the ApplicationMasterService happen in the 
same block. This way we can throw InvalidApplicationMasterRequestException if 
the AM tries to register with the AMS before RMAppAttemptImpl has processed the 
RMAppAttempt LAUNCHED event.
* Have a MultiThreadedDispatcher for processing App and AppAttempt events, 
similar to SystemMetricsPublisher.MultiThreadedDispatcher, with the additional 
modification that instead of {{(event.hashCode() & Integer.MAX_VALUE) % 
dispatchers.size()}} we shard based on the applicationId. This can speed up the 
processing of App events (a rough sketch follows below).
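A rough sketch of that second option, with hypothetical names (this is not the 
SystemMetricsPublisher code; the ApplicationId is reduced to its numeric part 
for brevity):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: route app events to a fixed pool of single-threaded
 *  dispatchers, sharded by applicationId instead of event.hashCode(). */
public class AppShardedDispatcher {
  private final List<ExecutorService> dispatchers = new ArrayList<ExecutorService>();

  public AppShardedDispatcher(int poolSize) {
    for (int i = 0; i < poolSize; i++) {
      // one thread per shard preserves per-application event ordering
      dispatchers.add(Executors.newSingleThreadExecutor());
    }
  }

  /** appId is the numeric part of the ApplicationId; event is the work to run. */
  public void dispatch(long appId, Runnable event) {
    int shard = (int) ((appId & Long.MAX_VALUE) % dispatchers.size());
    dispatchers.get(shard).execute(event);
  }

  public void stop() {
    for (ExecutorService es : dispatchers) {
      es.shutdown();
    }
  }
}
{code}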

 I was not able to see any other cleaner direct fix for this issue, so I was 
wondering whether we also need to look at why the clusters were running behind 
on processing AsyncDispatcher events. Were these events getting delayed for any 
particular reason? 

 NPE if AM attempts to register before RM processes launch event
 ---

 Key: YARN-3260
 URL: https://issues.apache.org/jira/browse/YARN-3260
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Naganarasimha G R

 The RM on one of our clusters was running behind on processing 
 AsyncDispatcher events, and this caused AMs to fail to register due to an 
 NPE.  The AM was launched and attempting to register before the 
 RMAppAttemptImpl had processed the LAUNCHED event, and the client to AM token 
 had not been generated yet.  The NPE occurred because the 
 ApplicationMasterService tried to encode the missing token.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338848#comment-14338848
 ] 

Craig Welch commented on YARN-3251:
---

Sorry if that wasn't clear; to reduce risk I removed the minor changes in 
CSQueueUtils.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3255) RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support generic options

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338760#comment-14338760
 ] 

Hadoop QA commented on YARN-3255:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700931/YARN-3255-02.patch
  against trunk revision 773b651.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6757//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6757//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6757//console

This message is automatically generated.

 RM, NM, JobHistoryServer, and WebAppProxyServer's main() should support 
 generic options
 ---

 Key: YARN-3255
 URL: https://issues.apache.org/jira/browse/YARN-3255
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: YARN-3255-01.patch, YARN-3255-02.patch


 Currently {{ResourceManager.main()}} and {{NodeManager.main()}} ignore 
 generic options, like {{-conf}} and {{-fs}}. It would be good to have the 
 ability to pass generic options in order to specify configuration files or 
 the NameNode location, when the services start through {{main()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-02-26 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created YARN-3267:
--

 Summary: Timelineserver applies the ACL rules after applying the 
limit on the number of records
 Key: YARN-3267
 URL: https://issues.apache.org/jira/browse/YARN-3267
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Prakash Ramachandran


While fetching entities from the timelineserver, the limit is applied to the 
entities fetched from leveldb, and the ACL filters are applied after this 
(TimelineDataManager.java::getEntities). 
This could mean that even if there are entities available which match the query 
criteria, we could end up not getting any results.
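To illustrate the ordering problem with hypothetical names (Entity and 
checkAccess() are stand-ins, not the actual TimelineDataManager code):
{code}
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: why limiting before the ACL check loses results. */
class AclThenLimitExample {
  // Problematic order: truncate to 'limit' first, then drop entities the
  // caller cannot see -- the response may be empty despite matching data.
  static List<Entity> limitThenFilter(List<Entity> fromStore, int limit, String user) {
    List<Entity> limited = fromStore.subList(0, Math.min(limit, fromStore.size()));
    List<Entity> visible = new ArrayList<Entity>();
    for (Entity e : limited) {
      if (checkAccess(user, e)) {
        visible.add(e);
      }
    }
    return visible;
  }

  // Safer order: keep scanning until 'limit' visible entities are collected.
  static List<Entity> filterThenLimit(List<Entity> fromStore, int limit, String user) {
    List<Entity> visible = new ArrayList<Entity>();
    for (Entity e : fromStore) {
      if (checkAccess(user, e)) {
        visible.add(e);
        if (visible.size() >= limit) {
          break;
        }
      }
    }
    return visible;
  }

  static class Entity { String owner; }

  static boolean checkAccess(String user, Entity e) {
    return user.equals(e.owner); // placeholder ACL check
  }
}
{code}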



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338736#comment-14338736
 ] 

Ted Yu commented on YARN-3025:
--

Ping [~zjshen]

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-02-26 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2693:
--
Attachment: 0006-YARN-2693.patch

Attaching a minimal version of the Application Priority Manager where only 
configuration support is present. YARN-3250 will handle the admin CLI and REST 
support in the longer run.

 Priority Label Manager in RM to manage application priority based on 
 configuration
 --

 Key: YARN-2693
 URL: https://issues.apache.org/jira/browse/YARN-2693
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 
 0006-YARN-2693.patch


 Focus of this JIRA is to have a centralized service to handle priority labels.
 Support operations such as
 * Add/Delete priority label to a specified queue
 * Manage integer mapping associated with each priority label
 * Support managing default priority label of a given queue
 * Expose interface to RM to validate priority label
 To have a simplified interface, the Priority Manager will support only a 
 configuration file, in contrast with the admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338830#comment-14338830
 ] 

Karthik Kambatla commented on YARN-3231:


Thanks for reporting and working on this, [~l201514]. The approach looks 
generally good. A few comments (some nits):
# Rename {{updateRunnabilityonRefreshQueues}} to {{updateRunnabilityOnReload}}? 
And, add a javadoc for when it should be called and what it does (a rough 
sketch of such a hook follows after this list).
# javadoc for the newly added private method and the significance of the new 
integer param.
# Call the above method from AllocationReloadListner#onReload after all the 
other queue configs are updated.
# The comment here no longer applies. Remove it? 
{code}
// No more than one app per list will be able to be made runnable, so
// we can stop looking after we've found that many
if (noLongerPendingApps.size() >= maxRunnableApps) {
  break;
}
{code}
# Indentation:
{code}
updateAppsRunnability(appsNowMaybeRunnable,
appsNowMaybeRunnable.size());
{code}
# Newly added tests:
## If it is not too much trouble, can we move them to a new test class 
(TestAppRunnability?) mostly because TestFairScheduler has so many tests 
already. 
## Is it possible to reuse the code between these tests? 
## Should we add tests for when the maxRunnableApps for a user or queue is 
decreased? If you think this might need additional work in the logic as well, I 
am open to filing a follow up JIRA and addressing it there. 
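To make the reload hook concrete, here is a rough sketch under assumed names 
(App and makeRunnable() are stand-ins; this is not the FairScheduler code):
{code}
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: re-evaluate pending apps after queueMaxRunningApps is
 *  raised on a config reload, so previously blocked apps become runnable. */
class RunnabilityReloadSketch {
  interface App { boolean isRunnable(); void makeRunnable(); }

  // Called from the allocation-file reload listener after queue limits change.
  static void updateRunnabilityOnReload(List<App> pendingApps, int maxNowMaybeRunnable) {
    List<App> promoted = new ArrayList<App>();
    for (App app : pendingApps) {
      if (promoted.size() >= maxNowMaybeRunnable) {
        break; // no more slots were freed by the new limit
      }
      if (!app.isRunnable()) {
        app.makeRunnable();
        promoted.add(app);
      }
    }
    pendingApps.removeAll(promoted);
  }
}
{code}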


 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit, we want to increase this property on the fly to make 
 some of the pending jobs active. However, once we increase the limit, the 
 pending jobs are not assigned any resources and are stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338746#comment-14338746
 ] 

Ted Yu commented on YARN-2777:
--

@Varun:
{code}
713   out.println("End of LogType:");
714   out.println(fileType);
{code}
Can you put the above two onto the same line ?

Thanks

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338782#comment-14338782
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700977/YARN-3251.2-6-0.2.patch
  against trunk revision dce8b9c.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6759//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.

2015-02-26 Thread Prakash Ramachandran (JIRA)
Prakash Ramachandran created YARN-3268:
--

 Summary: timelineserver rest api returns html page for 404 when a 
bad endpoint is used.
 Key: YARN-3268
 URL: https://issues.apache.org/jira/browse/YARN-3268
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prakash Ramachandran


The timelineserver returns an HTML 404 page instead of giving a REST response. 
This interferes with end-user pages which try to retrieve data using the REST 
API. This could be due to the lack of a 404 handler, e.g.:
http://timelineserver:8188/badnamespace/v1/timeline/someentity
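One possible direction, sketched with plain JAX-RS types (illustrative only, 
not the timeline server's actual code), is an ExceptionMapper that turns 
unmatched paths into a JSON error body:
{code}
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.ExceptionMapper;
import javax.ws.rs.ext.Provider;

/** Illustrative only: map unhandled REST errors (including 404s) to a JSON
 *  body instead of the default HTML error page. */
@Provider
public class JsonErrorMapper implements ExceptionMapper<WebApplicationException> {
  @Override
  public Response toResponse(WebApplicationException e) {
    int status = e.getResponse() != null ? e.getResponse().getStatus() : 500;
    String body = "{\"exception\":\"WebApplicationException\",\"status\":" + status + "}";
    return Response.status(status)
        .entity(body)
        .type(MediaType.APPLICATION_JSON)
        .build();
  }
}
{code}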



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3251:
--
Attachment: YARN-3251.2-6-0.3.patch

Removing the CSQueueUtils changes.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338860#comment-14338860
 ] 

Wangda Tan commented on YARN-3251:
--

if opposite opinions -> if no opposite opinions

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338858#comment-14338858
 ] 

Wangda Tan commented on YARN-3251:
--

LGTM +1, I will commit the patch to branch-2.6 this afternoon if opposite 
opinions. Thanks!

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3251:
--
Attachment: YARN-3251.2-6-0.4.patch

Minor: switch to "Internal", which seems to be more common in the codebase.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338945#comment-14338945
 ] 

Vinod Kumar Vavilapalli commented on YARN-3087:
---

bq. The current solution is a workaround for JAXB resolver, which cannot return 
an interface (Map) type. This work around is consistent with the v1 version of 
our ATS object model (in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/,
 such as TimelineEvent.java, TimelineEntity.java), fixed in YARN-2804. If it's 
needed, maybe we'd like to keep the declarations to be Map, and do the cast in 
the jaxb getter?
Casting it every time will be expensive. Let's keep it as the patch currently 
does - we are not exposing the fact that it is a HashMap to the external world, 
only to Jersey.

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338968#comment-14338968
 ] 

Varun Saxena commented on YARN-2777:


[~tedyu], made the change. Kindly review

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch, YARN-2777.002.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-3269:
---

 Summary: Yarn.nodemanager.remote-app-log-dir could not be 
configured to fully qualified path
 Key: YARN-3269
 URL: https://issues.apache.org/jira/browse/YARN-3269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong


Log aggregation currently is always relative to the default file system, not an 
arbitrary file system identified by URI. So we can't put an arbitrary 
fully-qualified URI into yarn.nodemanager.remote-app-log-dir.
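A small sketch of the distinction (hypothetical helper, not the actual 
LogAggregationService code): resolving the configured directory through the 
default FileSystem ignores any scheme/authority in the value, while resolving 
it via the Path's own URI honors it.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative only: honoring a fully-qualified remote-app-log-dir. */
class RemoteLogDirSketch {
  static FileSystem remoteLogFs(Configuration conf, String remoteLogDir)
      throws IOException {
    Path remoteRoot = new Path(remoteLogDir);
    // FileSystem.get(conf) always yields the default FS and would ignore an
    // hdfs://other-cluster/... prefix in the configured value.
    // Resolving through the Path keeps the scheme/authority if one is given.
    return remoteRoot.getFileSystem(conf);
  }
}
{code}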



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339015#comment-14339015
 ] 

Junping Du commented on YARN-3087:
--

Agree with Vinod that if this is required by the JAXB API then we don't have to 
cast it. Thanks [~gtCarrera9] for the explanation!
Patch looks good to me overall. One comment: we have a lot of similar logic to 
cast a Map to a HashMap, like below:
{code}
-this.relatedEntities = relatedEntities;
+if (relatedEntities != null && !(relatedEntities instanceof HashMap)) {
+  this.relatedEntities = new HashMap<String, Set<String>>(relatedEntities);
+} else {
+  this.relatedEntities = (HashMap<String, Set<String>>) relatedEntities;
+}
{code}
Maybe we can use Generics to consolidate them (a possible helper is sketched 
below).
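A possible shape for that helper - illustrative only, not part of the attached 
patch:
{code}
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: one generic helper replacing the repeated
 *  "copy into a HashMap unless it already is one" blocks. */
final class JaxbMaps {
  private JaxbMaps() {
  }

  static <K, V> HashMap<K, V> toHashMap(Map<K, V> map) {
    if (map == null || map instanceof HashMap) {
      return (HashMap<K, V>) map;
    }
    return new HashMap<K, V>(map);
  }
}
{code}
Each setter could then collapse to {{this.relatedEntities = 
JaxbMaps.toHashMap(relatedEntities);}}.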

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3025) Provide API for retrieving blacklisted nodes

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339012#comment-14339012
 ] 

Vinod Kumar Vavilapalli commented on YARN-3025:
---

Coming in very late, apologies.

Some comments:
 - Echoing Bikas's first comment: today the AMs are expected to maintain their 
own scheduling state. With this you are changing that - part of the scheduling 
state will be remembered but the rest isn't. We should clearly draw a line 
somewhere; what is it?
 - [~zjshen] did a very good job of dividing the persistence concerns, but what 
is the guarantee given to the app writers? "I'll return the list of blacklisted 
nodes whenever I can, but shoot, I died, so I can't help you much" is not going 
to cut it. If we want reliable notifications, we should build a protocol 
between the AM and the RM about the persistence of the blacklisted node list - 
too much complexity if you ask me. Why not leave it to the apps?
 - The blacklist information is per application-attempt, and the scheduler will 
forget previous application-attempts today. So as I understand it, the patch 
doesn't work.
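For context, usage of the API proposed in the description below on AM restart 
would look roughly like this (a hypothetical sketch; SchedulerClient is a 
stand-in, not an existing AMRMClient interface):
{code}
import java.util.Collections;
import java.util.List;

/** Illustrative only: how a new AM attempt might re-sync its blacklist view
 *  using the API proposed in this JIRA. */
class BlacklistRecoverySketch {
  interface SchedulerClient {
    List<String> getBlacklistedNodes();                        // proposed API
    void updateBlacklist(List<String> additions, List<String> removals);
  }

  static void recover(SchedulerClient client, List<String> locallyKnownBad) {
    // Seed the new attempt's state from whatever the RM remembered ...
    locallyKnownBad.addAll(client.getBlacklistedNodes());
    // ... then push any nodes this attempt already believes are bad.
    client.updateBlacklist(locallyKnownBad, Collections.<String>emptyList());
  }
}
{code}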

 Provide API for retrieving blacklisted nodes
 

 Key: YARN-3025
 URL: https://issues.apache.org/jira/browse/YARN-3025
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: yarn-3025-v1.txt, yarn-3025-v2.txt, yarn-3025-v3.txt


 We have the following method which updates blacklist:
 {code}
   public synchronized void updateBlacklist(List<String> blacklistAdditions,
   List<String> blacklistRemovals) {
 {code}
 Upon AM failover, there should be an API which returns the blacklisted nodes 
 so that the new AM can make consistent decisions.
 The new API can be:
 {code}
   public synchronized List<String> getBlacklistedNodes()
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339060#comment-14339060
 ] 

Zhijie Shen commented on YARN-3087:
---

Thanks for the patch, Li! Some detailed comments about the patch:

1. HierarchicalTimelineEntity is abstract, so this constructor may not be necessary:
{code}
// required by JAXB
HierarchicalTimelineEntity() {
  super();
}
{code}

2. Can we mark JAXB methods \@Private?

3. I think rootUnwrapping should be true to be consistent with 
YarnJacksonJaxbJsonProvider. It seems JAXBContextResolver is never used (I 
think the reason is that we are using YarnJacksonJaxbJsonProvider), maybe we 
want to remove the class.
{code}
this.context =
new JSONJAXBContext(JSONConfiguration.natural().rootUnwrapping(false)
.build(), cTypes)
{code}

4. Does it mean that if we want to add a filter, we need to hard-code it here? 
So "hadoop.http.filter.initializers" no longer works? If so, is it possible to 
provide a similar mechanism to replace what "hadoop.http.filter.initializers" 
does? (For comparison, a filter-initializer sketch follows the snippet below.)
{code}
121   // TODO: replace this by an authentification filter in future.
122   HashMap<String, String> options = new HashMap<String, String>();
123   String username = conf.get(HADOOP_HTTP_STATIC_USER,
124   DEFAULT_HADOOP_HTTP_STATIC_USER);
125   options.put(HADOOP_HTTP_STATIC_USER, username);
126   HttpServer2.defineFilter(timelineRestServer.getWebAppContext(),
127   "static_user_filter_timeline",
128   StaticUserWebFilter.StaticUserFilter.class.getName(),
129   options, new String[] {"/*"});
{code}
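For comparison, the hadoop.http.filter.initializers route would normally look 
something like the sketch below (illustrative only; whether the standalone 
timeline web server can load such initializers is exactly the open question):
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.FilterContainer;
import org.apache.hadoop.http.FilterInitializer;
import org.apache.hadoop.http.lib.StaticUserWebFilter;

/** Illustrative only: wiring the same static-user filter through a
 *  FilterInitializer (picked up via hadoop.http.filter.initializers)
 *  instead of a hard-coded HttpServer2.defineFilter() call. */
public class StaticUserFilterInitializerSketch extends FilterInitializer {
  @Override
  public void initFilter(FilterContainer container, Configuration conf) {
    Map<String, String> params = new HashMap<String, String>();
    // "hadoop.http.staticuser.user" / "dr.who" are the usual key and default.
    params.put("hadoop.http.staticuser.user",
        conf.get("hadoop.http.staticuser.user", "dr.who"));
    container.addFilter("static_user_filter_timeline",
        StaticUserWebFilter.StaticUserFilter.class.getName(), params);
  }
}
{code}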

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned YARN-3268:
--

Assignee: Chang Li

 timelineserver rest api returns html page for 404 when a bad endpoint is used.
 --

 Key: YARN-3268
 URL: https://issues.apache.org/jira/browse/YARN-3268
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prakash Ramachandran
Assignee: Chang Li

 The timelineserver returns an HTML 404 page instead of giving a REST response. 
 This interferes with end-user pages which try to retrieve data using the REST 
 API. This could be due to the lack of a 404 handler, e.g.: 
 http://timelineserver:8188/badnamespace/v1/timeline/someentity



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li reassigned YARN-3267:
--

Assignee: Chang Li

 Timelineserver applies the ACL rules after applying the limit on the number 
 of records
 --

 Key: YARN-3267
 URL: https://issues.apache.org/jira/browse/YARN-3267
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Prakash Ramachandran
Assignee: Chang Li

 While fetching entities from the timelineserver, the limit is applied to the 
 entities fetched from leveldb, and the ACL filters are applied after this 
 (TimelineDataManager.java::getEntities). 
 This could mean that even if there are entities available which match the 
 query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-3231:
--
Attachment: YARN-3231.v3.patch

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
 YARN-3231.v3.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit, we want to increase this property on the fly to make 
 some of the pending jobs active. However, once we increase the limit, the 
 pending jobs are not assigned any resources and are stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3268) timelineserver rest api returns html page for 404 when a bad endpoint is used.

2015-02-26 Thread Chang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chang Li updated YARN-3268:
---
Assignee: (was: Chang Li)

 timelineserver rest api returns html page for 404 when a bad endpoint is used.
 --

 Key: YARN-3268
 URL: https://issues.apache.org/jira/browse/YARN-3268
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prakash Ramachandran

 The timelineserver returns an HTML 404 page instead of giving a REST response. 
 This interferes with end-user pages which try to retrieve data using the REST 
 API. This could be due to the lack of a 404 handler, e.g.: 
 http://timelineserver:8188/badnamespace/v1/timeline/someentity



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339084#comment-14339084
 ] 

Hadoop QA commented on YARN-3269:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701154/YARN-3269.1.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6762//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6762//console

This message is automatically generated.

 Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
 qualified path
 ---

 Key: YARN-3269
 URL: https://issues.apache.org/jira/browse/YARN-3269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3269.1.patch


 Log aggregation currently is always relative to the default file system, not 
 an arbitrary file system identified by URI. So we can't put an arbitrary 
 fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3248) Display count of nodes blacklisted by apps in the web UI

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338950#comment-14338950
 ] 

Vinod Kumar Vavilapalli commented on YARN-3248:
---

YARN-3025 is related to my first comment above.

 Display count of nodes blacklisted by apps in the web UI
 

 Key: YARN-3248
 URL: https://issues.apache.org/jira/browse/YARN-3248
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: Screenshot.jpg, apache-yarn-3248.0.patch


 It would be really useful when debugging app performance and failure issues 
 to get a count of the nodes blacklisted by individual apps displayed in the 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2777:
---
Attachment: YARN-2777.002.patch

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch, YARN-2777.002.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338975#comment-14338975
 ] 

Ted Yu commented on YARN-2777:
--

lgtm

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch, YARN-2777.002.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3251:
--
Attachment: YARN-3251.2.patch

Attaching an analogue of the most recent patch against trunk.  I do not believe 
that we will be committing this at this point as [~leftnoteasy] is working on a 
more significant change which will remove the need for it, but I wanted to make 
it available just in case.  For clarity, patch against trunk is 
YARN-3251.2.patch and the patch to commit against 2.6 is 
YARN-3251.2-6-0.4.patch.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2777) Mark the end of individual log in aggregated log

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338967#comment-14338967
 ] 

Varun Saxena commented on YARN-2777:


[~tedyu], made the change. Kindly review

 Mark the end of individual log in aggregated log
 

 Key: YARN-2777
 URL: https://issues.apache.org/jira/browse/YARN-2777
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Varun Saxena
  Labels: log-aggregation
 Attachments: YARN-2777.001.patch, YARN-2777.002.patch


 Below is snippet of aggregated log showing hbase master log:
 {code}
 LogType: hbase-hbase-master-ip-172-31-34-167.log
 LogUploadTime: 29-Oct-2014 22:31:55
 LogLength: 24103045
 Log Contents:
 Wed Oct 29 15:43:57 UTC 2014 Starting master on ip-172-31-34-167
 ...
   at 
 org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
   at java.lang.Thread.run(Thread.java:745)
 LogType: hbase-hbase-master-ip-172-31-34-167.out
 {code}
 Since logs from various daemons are aggregated in one log file, it would be 
 desirable to mark the end of one log before starting with the next.
 e.g. with such a line:
 {code}
 End of LogType: hbase-hbase-master-ip-172-31-34-167.log
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338867#comment-14338867
 ] 

Li Lu commented on YARN-3087:
-

Hi [~djp], thanks for the feedback! I totally understand your concern here. The 
current solution is a workaround for the JAXB resolver, which cannot return an 
interface (Map) type. This workaround is consistent with the v1 version of our 
ATS object model (in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/,
 such as TimelineEvent.java and TimelineEntity.java), fixed in YARN-2804. If 
needed, maybe we'd like to keep the declarations as Map and do the cast in the 
JAXB getter? 

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2693) Priority Label Manager in RM to manage application priority based on configuration

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338914#comment-14338914
 ] 

Hadoop QA commented on YARN-2693:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701122/0006-YARN-2693.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1152 javac 
compiler warnings (more than the trunk's current 1151 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 7 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6758//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6758//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6758//console

This message is automatically generated.

 Priority Label Manager in RM to manage application priority based on 
 configuration
 --

 Key: YARN-2693
 URL: https://issues.apache.org/jira/browse/YARN-2693
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-2693.patch, 0002-YARN-2693.patch, 
 0003-YARN-2693.patch, 0004-YARN-2693.patch, 0005-YARN-2693.patch, 
 0006-YARN-2693.patch


 Focus of this JIRA is to have a centralized service to handle priority labels.
 Support operations such as
 * Add/Delete priority label to a specified queue
 * Manage integer mapping associated with each priority label
 * Support managing default priority label of a given queue
 * Expose interface to RM to validate priority label
 To have a simplified interface, the Priority Manager will support only a 
 configuration file, in contrast with the admin CLI and REST. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-3269:

Attachment: YARN-3269.1.patch

 Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
 qualified path
 ---

 Key: YARN-3269
 URL: https://issues.apache.org/jira/browse/YARN-3269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3269.1.patch


 Log aggregation currently is always relative to the default file system, not 
 an arbitrary file system identified by URI. So we can't put an arbitrary 
 fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-02-26 Thread Rohit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Agarwal updated YARN-3270:

Attachment: YARN-3270.patch

Attached the patch.

 node label expression not getting set in ApplicationSubmissionContext
 -

 Key: YARN-3270
 URL: https://issues.apache.org/jira/browse/YARN-3270
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Priority: Minor
 Attachments: YARN-3270.patch


 One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
 setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3251) CapacityScheduler deadlock when computing absolute max avail capacity (short term fix for 2.6.1)

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339140#comment-14339140
 ] 

Hadoop QA commented on YARN-3251:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701150/YARN-3251.2.patch
  against trunk revision dce8b9c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6760//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6760//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6760//console

This message is automatically generated.

 CapacityScheduler deadlock when computing absolute max avail capacity (short 
 term fix for 2.6.1)
 

 Key: YARN-3251
 URL: https://issues.apache.org/jira/browse/YARN-3251
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Craig Welch
Priority: Blocker
 Attachments: YARN-3251.1.patch, YARN-3251.2-6-0.2.patch, 
 YARN-3251.2-6-0.3.patch, YARN-3251.2-6-0.4.patch, YARN-3251.2.patch


 The ResourceManager can deadlock in the CapacityScheduler when computing the 
 absolute max available capacity for user limits and headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-02-26 Thread Rohit Agarwal (JIRA)
Rohit Agarwal created YARN-3270:
---

 Summary: node label expression not getting set in 
ApplicationSubmissionContext
 Key: YARN-3270
 URL: https://issues.apache.org/jira/browse/YARN-3270
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Priority: Minor


One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3122:

Attachment: YARN-3122.003.patch

Addressed feedback

 Metrics for container's actual CPU usage
 

 Key: YARN-3122
 URL: https://issues.apache.org/jira/browse/YARN-3122
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
 YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track CPU usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-02-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339170#comment-14339170
 ] 

Vinod Kumar Vavilapalli commented on YARN-3269:
---

Can you modify one of the tests to use a fully qualified path, in order to 
'prove' that this patch works?

 Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
 qualified path
 ---

 Key: YARN-3269
 URL: https://issues.apache.org/jira/browse/YARN-3269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-3269.1.patch


 Log aggregation currently is always relative to the default file system, not 
 an arbitrary file system identified by URI. So we can't put an arbitrary 
 fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3270) node label expression not getting set in ApplicationSubmissionContext

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339196#comment-14339196
 ] 

Hadoop QA commented on YARN-3270:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701163/YARN-3270.patch
  against trunk revision 2214dab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6764//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6764//console

This message is automatically generated.

 node label expression not getting set in ApplicationSubmissionContext
 -

 Key: YARN-3270
 URL: https://issues.apache.org/jira/browse/YARN-3270
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rohit Agarwal
Priority: Minor
 Attachments: YARN-3270.patch


 One of the {{newInstance}} methods in {{ApplicationSubmissionContext}} is not 
 setting the {{appLabelExpression}} passed to it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339201#comment-14339201
 ] 

Hadoop QA commented on YARN-3231:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701160/YARN-3231.v3.patch
  against trunk revision f0c980a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6763//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6763//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6763//console

This message is automatically generated.

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
 YARN-3231.v3.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit, we want to increase this property on the fly to make 
 some of the pending jobs active. However, once we increase the limit, none of 
 the pending jobs are assigned any resources and they stay stuck forever.
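A self-contained toy model of the missing step, deliberately written in plain Java rather than FairScheduler code: when the limit is raised, apps already sitting in the pending list have to be re-examined and promoted, otherwise they stay queued under the old limit.

{code}
// Toy model (not FairScheduler code): raising maxRunningApps must trigger a
// re-check of the pending list, or apps queued under the old limit stay stuck.
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

public class MaxRunningAppsToyModel {
  static void promotePending(Queue<String> pending, Queue<String> running, int maxRunningApps) {
    while (running.size() < maxRunningApps && !pending.isEmpty()) {
      running.add(pending.poll()); // without a pass like this, already-queued apps never start
    }
  }

  public static void main(String[] args) {
    Queue<String> running = new ArrayDeque<>(Arrays.asList("app1", "app2"));
    Queue<String> pending = new ArrayDeque<>(Arrays.asList("app3", "app4"));
    promotePending(pending, running, 4); // limit raised from 2 to 4 on the fly
    System.out.println(running);         // [app1, app2, app3, app4]
  }
}
{code}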



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3231) FairScheduler changing queueMaxRunningApps on the fly will cause all pending job stuck

2015-02-26 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339214#comment-14339214
 ] 

Siqi Li commented on YARN-3231:
---

Hi [~ka...@cloudera.com], thanks for your feedback.

I have uploaded a new patch which addresses all of your comments except 6.1 and 6.3.

For 6.1, it seems that other test cases might also qualify for moving to 
TestAppRunnability; it would be good to do a larger refactoring of 
TestFairScheduler into TestAppRunnability.

For 6.3, I don't think there is a problem when maxRunnableApps for a user or 
queue is decreased. 

 FairScheduler changing queueMaxRunningApps on the fly will cause all pending 
 job stuck
 --

 Key: YARN-3231
 URL: https://issues.apache.org/jira/browse/YARN-3231
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-3231.v1.patch, YARN-3231.v2.patch, 
 YARN-3231.v3.patch


 When a queue is piling up with a lot of pending jobs due to the 
 maxRunningApps limit, we want to increase this property on the fly to make 
 some of the pending jobs active. However, once we increase the limit, none of 
 the pending jobs are assigned any resources and they stay stuck forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3087:

Attachment: YARN-3087-022615.patch

Updated my patch according to [~zjshen]'s comments. Addressed points 1-3. Point 
4 is caused by a current limitation of HttpServer2. We need to decide whether to 
work around it on our side or add support for this use case on the HttpServer2 
side. For now, I think we can temporarily keep the current approach to make the 
prototype work. 

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
 YARN-3087-022615.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339233#comment-14339233
 ] 

Li Lu commented on YARN-3087:
-

Hi [~djp], thanks for the comments! I agree that we may want to use generic 
types to solve the problem. Similar code also appears in the v1 timeline object 
model, so maybe we'd like to fix both together? If that's the case, we may open 
a separate JIRA to track this. 

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
 YARN-3087-022615.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3122) Metrics for container's actual CPU usage

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339237#comment-14339237
 ] 

Hadoop QA commented on YARN-3122:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701167/YARN-3122.003.patch
  against trunk revision 2214dab.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6765//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6765//console

This message is automatically generated.

 Metrics for container's actual CPU usage
 

 Key: YARN-3122
 URL: https://issues.apache.org/jira/browse/YARN-3122
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3122.001.patch, YARN-3122.002.patch, 
 YARN-3122.003.patch, YARN-3122.prelim.patch, YARN-3122.prelim.patch


 It would be nice to capture resource usage per container, for a variety of 
 reasons. This JIRA is to track CPU usage. 
 YARN-2965 tracks the resource usage on the node, and the two implementations 
 should reuse code as much as possible. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3087) [Aggregator implementation] the REST server (web server) for per-node aggregator does not work if it runs inside node manager

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339248#comment-14339248
 ] 

Hadoop QA commented on YARN-3087:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12701178/YARN-3087-022615.patch
  against trunk revision c6d5b37.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6766//console

This message is automatically generated.

 [Aggregator implementation] the REST server (web server) for per-node 
 aggregator does not work if it runs inside node manager
 -

 Key: YARN-3087
 URL: https://issues.apache.org/jira/browse/YARN-3087
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Li Lu
 Fix For: YARN-2928

 Attachments: YARN-3087-022315.patch, YARN-3087-022515.patch, 
 YARN-3087-022615.patch


 This is related to YARN-3030. YARN-3030 sets up a per-node timeline 
 aggregator and the associated REST server. It runs fine as a standalone 
 process, but does not work if it runs inside the node manager due to possible 
 collisions of servlet mapping.
 Exception:
 {noformat}
 org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for 
 v2 not found
   at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)
   at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-26 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14339246#comment-14339246
 ] 

Zhijie Shen commented on YARN-3125:
---

Thanks for the patch, Junping! It looks good to me. Per offline discussion, we 
should add an integration test in TestDistributedShell.

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Junping Du
 Attachments: YARN-3125.patch, YARN-3125v2.patch, YARN-3125v3.patch


 We can start by changing distributed shell to use the new timeline service once 
 the framework is completed, so that we can quickly verify the next-gen service 
 is working end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-3266:

Attachment: YARN-3266.01.patch

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Rohith
 Attachments: YARN-3266.01.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?
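A sketch of the keying change the description argues for; the class and method names are taken from the description plus assumptions about the surrounding ResourceManager code, not from the attached patch.

{code}
// Sketch only: keying inactiveNodes by NodeId (host plus port) instead of host,
// so two NMs that ran on the same host under ephemeral ports stay distinct.
// Package paths are assumptions based on the YARN source layout.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;

public class InactiveNodesSketch {
  // Host-keyed map: the second lost NM on a host silently overwrites the first entry.
  private final ConcurrentMap<String, RMNode> inactiveNodesByHost =
      new ConcurrentHashMap<String, RMNode>();

  // NodeId-keyed map: both lost NMs are kept, so the map size matches the lost-node count.
  private final ConcurrentMap<NodeId, RMNode> inactiveNodesById =
      new ConcurrentHashMap<NodeId, RMNode>();

  void recordLostNode(RMNode rmNode) {
    inactiveNodesByHost.put(rmNode.getNodeID().getHost(), rmNode); // current behaviour
    inactiveNodesById.put(rmNode.getNodeID(), rmNode);             // proposed behaviour
  }
}
{code}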



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338153#comment-14338153
 ] 

Varun Saxena commented on YARN-3197:


I guess you mean no need for printing both.

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch, 
 YARN-3197.003.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338161#comment-14338161
 ] 

Chengbing Liu commented on YARN-3266:
-

uploaded a patch, taking over...

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3266.01.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338170#comment-14338170
 ] 

Hadoop QA commented on YARN-2820:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700999/YARN-2820.005.patch
  against trunk revision 71385f9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6753//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6753//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6753//console

This message is automatically generated.

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0, 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
 YARN-2820.005.patch


 Do retry in FileSystemRMStateStore for better error recovery when an 
 update/store fails due to an IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw 
 the following IOException cause the RM to shut down.
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 

[jira] [Commented] (YARN-3197) Confusing log generated by CapacityScheduler

2015-02-26 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338154#comment-14338154
 ] 

Varun Saxena commented on YARN-3197:


Will change it back then. AppId was added to aid in quicker debugging 

 Confusing log generated by CapacityScheduler
 

 Key: YARN-3197
 URL: https://issues.apache.org/jira/browse/YARN-3197
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Assignee: Varun Saxena
Priority: Minor
 Attachments: YARN-3197.001.patch, YARN-3197.002.patch, 
 YARN-3197.003.patch


 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:39,968 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...
 2015-02-12 20:35:40,960 INFO  capacity.CapacityScheduler 
 (CapacityScheduler.java:completedContainer(1190)) - Null container 
 completed...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu reassigned YARN-3266:
---

Assignee: Chengbing Liu  (was: Rohith)

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3266.01.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMToken#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338174#comment-14338174
 ] 

Devaraj K commented on YARN-3256:
-

+1, lgtm, will commit it shortly.

 TestClientToAMToken#testClientTokenRace is not running against all Schedulers 
 even when using ParameterizedSchedulerTestBase
 

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, so both 
 parameterized runs exercised the same scheduler, the one picked by the default 
 configuration.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of creating a new one inside the test and 
 hiding the member. 
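A self-contained toy (not the YARN test code) of why hiding the base-class member defeats the parameterization: the subclass field shadows the value the base class prepared.

{code}
// Toy illustration of field hiding; Properties stands in for the scheduler conf.
import java.util.Properties;

public class ParameterizedConfToy {
  static class Base {
    final Properties conf = new Properties();
    Base(String scheduler) { conf.setProperty("scheduler", scheduler); }
  }

  static class BadTest extends Base {
    final Properties conf = new Properties(); // hides Base.conf; "scheduler" is never set here
    BadTest(String scheduler) { super(scheduler); }
  }

  static class GoodTest extends Base {
    GoodTest(String scheduler) { super(scheduler); } // uses the inherited, parameterized conf
  }

  public static void main(String[] args) {
    System.out.println(new BadTest("fair").conf.getProperty("scheduler", "default"));  // default
    System.out.println(new GoodTest("fair").conf.getProperty("scheduler", "default")); // fair
  }
}
{code}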



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated YARN-3256:

Summary: TestClientToAMTokens#testClientTokenRace is not running against 
all Schedulers even when using ParameterizedSchedulerTestBase  (was: 
TestClientToAMToken#testClientTokenRace is not running against all Schedulers 
even when using ParameterizedSchedulerTestBase)

 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, so both 
 parameterized runs exercised the same scheduler, the one picked by the default 
 configuration.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of creating a new one inside the test and 
 hiding the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338195#comment-14338195
 ] 

Hudson commented on YARN-3256:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7207 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7207/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* hadoop-yarn-project/CHANGES.txt


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, so both 
 parameterized runs exercised the same scheduler, the one picked by the default 
 configuration.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of creating a new one inside the test and 
 hiding the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338232#comment-14338232
 ] 

Hadoop QA commented on YARN-3266:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701009/YARN-3266.01.patch
  against trunk revision 166eecf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6754//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6754//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6754//console

This message is automatically generated.

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3266.01.patch, YARN-3266.02.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3249) Add the kill application to the Resource Manager Web UI

2015-02-26 Thread Ryu Kobayashi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338243#comment-14338243
 ] 

Ryu Kobayashi commented on YARN-3249:
-

[~vinodkv] I see. Okay, I'll try to fix the code.

 Add the kill application to the Resource Manager Web UI
 ---

 Key: YARN-3249
 URL: https://issues.apache.org/jira/browse/YARN-3249
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0, 2.7.0
Reporter: Ryu Kobayashi
Assignee: Ryu Kobayashi
Priority: Minor
 Attachments: YARN-3249.2.patch, YARN-3249.2.patch, YARN-3249.3.patch, 
 YARN-3249.patch, killapp-failed.log, killapp-failed2.log, screenshot.png


 We want to be able to kill an application from the web UI, similarly to the JobTracker web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-3266:

Attachment: YARN-3266.02.patch

Added a test in {{TestRMNodeTransitions}} to prevent regression.

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3266.01.patch, YARN-3266.02.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338256#comment-14338256
 ] 

Hudson commented on YARN-3239:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #116 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/116/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* hadoop-yarn-project/CHANGES.txt


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an http get on http://rm:8088/proxy/appId/ returns the wrong value. 
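A small, self-contained illustration (not the WebAppProxyServlet code) of which URL components have to be carried through when the proxy rebuilds the tracking URL; keeping only the path drops exactly the pieces shown in the failures above.

{code}
// java.net.URI splits the tracking URL into the components the proxy must preserve;
// rebuilding from scheme + authority + path alone loses the query and fragment.
import java.net.URI;

public class TrackingUrlComponents {
  public static void main(String[] args) {
    URI tracking = URI.create(
        "http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez"
            + "?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001");

    System.out.println("path:     " + tracking.getPath());         // /views/TEZ/0.5.2.2.2.2.0-947/tez
    System.out.println("query:    " + tracking.getRawQuery());     // viewPath=%2F%23%2F... (dropped in the bug)
    System.out.println("fragment: " + tracking.getRawFragment());  // null here; a '#...' part would also be dropped

    // Rebuilding from scheme + authority + path only reproduces the observed wrong redirect:
    String lossy = tracking.getScheme() + "://" + tracking.getAuthority() + tracking.getPath() + "/";
    System.out.println("lossy:    " + lossy); // http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/
  }
}
{code}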



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338264#comment-14338264
 ] 

Hudson commented on YARN-3256:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #116 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/116/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, so both 
 parameterized runs exercised the same scheduler, the one picked by the default 
 configuration.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of creating a new one inside the test and 
 hiding the member. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-3266:


Assignee: Rohith

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Rohith

 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338106#comment-14338106
 ] 

Rohith commented on YARN-3266:
--

bq. the key string should include the NM's port as well
This makes sense to me instead of changing the API. Taking over now; feel free to 
assign yourself if you have already started working on this.

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu

 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count is greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
 this will break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2820) Do retry in FileSystemRMStateStore for better error recovery when update/store failure due to IOException.

2015-02-26 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2820:

Attachment: YARN-2820.005.patch

 Do retry in FileSystemRMStateStore for better error recovery when 
 update/store failure due to IOException.
 --

 Key: YARN-2820
 URL: https://issues.apache.org/jira/browse/YARN-2820
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0, 2.6.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2820.000.patch, YARN-2820.001.patch, 
 YARN-2820.002.patch, YARN-2820.003.patch, YARN-2820.004.patch, 
 YARN-2820.005.patch


 Do retry in FileSystemRMStateStore for better error recovery when an 
 update/store fails due to an IOException.
 When we use FileSystemRMStateStore as yarn.resourcemanager.store.class, we saw 
 the following IOException cause the RM to shut down.
 {code}
 2014-10-29 23:49:12,202 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Updating info for attempt: appattempt_1409135750325_109118_01 at: 
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01
 2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not 
 complete
 /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/
 appattempt_1409135750325_109118_01.new.tmp retrying...
 2014-10-29 23:49:46,283 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore:
 Error updating info for attempt: appattempt_1409135750325_109118_01
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas.
 2014-10-29 23:49:46,284 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore:
 Error storing/updating appAttempt: appattempt_1409135750325_109118_01
 2014-10-29 23:49:46,916 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
 Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause: 
 java.io.IOException: Unable to close file because the last block does not 
 have enough number of replicas. 
 at 
 org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:2132)
  
 at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:2100) 
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
  
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103) 
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.writeFile(FileSystemRMStateStore.java:522)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateFile(FileSystemRMStateStore.java:534)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationAttemptStateInternal(FileSystemRMStateStore.java:389)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
  
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
  
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
  
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
 at java.lang.Thread.run(Thread.java:744) 
 {code}
 As discussed at YARN-1778, TestFSRMStateStore failure is also due to  
 IOException in storeApplicationStateInternal.
 Stack trace from TestFSRMStateStore failure:
 {code}
  2015-02-03 00:09:19,092 INFO  [Thread-110] recovery.TestFSRMStateStore 
 (TestFSRMStateStore.java:run(285)) - testFSRMStateStoreClientRetry: Exception
  org.apache.hadoop.ipc.RemoteException(java.io.IOException): NameNode still 
 not started
at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.checkNNStartup(NameNodeRpcServer.java:1876)
at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:971)
at 
 

[jira] [Created] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Chengbing Liu (JIRA)
Chengbing Liu created YARN-3266:
---

 Summary: RMContext inactiveNodes should have NodeId as map key
 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu


Under the default NM port configuration, which is 0, we have observed in the 
current version that the lost nodes count is greater than the length of the lost 
node list. This will happen when we consecutively restart the same NM twice:
* NM started at port 10001
* NM restarted at port 10002
* NM restarted at port 10003
* NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
{{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
{{inactiveNodes}} has 1 element
* NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
{{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
{{inactiveNodes}} still has 1 element

Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
{{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this 
will break the current API, then the key string should include the NM's port as 
well.

Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3239) WebAppProxy does not support a final tracking url which has query fragments and params

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338282#comment-14338282
 ] 

Hudson commented on YARN-3239:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #850 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/850/])
YARN-3239. WebAppProxy does not support a final tracking url which has query 
fragments and params. Contributed by Jian He (jlowe: rev 
1a68fc43464d3948418f453bb2f80df7ce773097)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestWebAppProxyServlet.java


 WebAppProxy does not support a final tracking url which has query fragments 
 and params 
 ---

 Key: YARN-3239
 URL: https://issues.apache.org/jira/browse/YARN-3239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jian He
 Fix For: 2.7.0

 Attachments: YARN-3239.1.patch


 Examples of failures:
 Expected: 
 {{http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005}}
 Actual: {{http://uihost:8080}}
 Tried with a minor change to remove the #. Saw a different issue:
 Expected: 
 {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez?viewPath=%2F%23%2Ftez-app%2Fapplication_1424388018547_0001}}
 Actual: {{http://uihost:8080/views/TEZ/0.5.2.2.2.2.0-947/tez/}}
 yarn application -status appId returns the expected value correctly. However, 
 invoking an HTTP GET on http://rm:8088/proxy/appId/ returns the wrong value. 
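 
 This is not the actual WebAppProxyServlet change, just a self-contained sketch 
 of the failure mode using the first example URL from this report: if the proxy 
 rebuilds the redirect target from scheme/host/port only, the query and fragment 
 are lost, whereas carrying the raw path/query/fragment through preserves the 
 full tracking URL.
 {code}
 import java.net.URI;
 import java.net.URISyntaxException;

 // Sketch only: losing vs. preserving the query/fragment of a tracking URI.
 public class TrackingUrlSketch {
   public static void main(String[] args) throws URISyntaxException {
     URI tracking = new URI("http://uihost:8080/#/main/views/TEZ/0.5.2.2.2.2.0-947/tez"
         + "?viewPath=%2F%23%2Ftez-app%2Fapplication_1424384418229_0005");

     // Rebuilding from scheme + authority only reproduces the reported behaviour.
     System.out.println(tracking.getScheme() + "://" + tracking.getRawAuthority());

     // Carrying the raw path, query and fragment through keeps the URL intact.
     StringBuilder full = new StringBuilder();
     full.append(tracking.getScheme()).append("://").append(tracking.getRawAuthority());
     if (tracking.getRawPath() != null) {
       full.append(tracking.getRawPath());
     }
     if (tracking.getRawQuery() != null) {
       full.append('?').append(tracking.getRawQuery());
     }
     if (tracking.getRawFragment() != null) {
       full.append('#').append(tracking.getRawFragment());
     }
     System.out.println(full); // identical to the original tracking URL
   }
 }
 {code}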



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3256) TestClientToAMTokens#testClientTokenRace is not running against all Schedulers even when using ParameterizedSchedulerTestBase

2015-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338290#comment-14338290
 ] 

Hudson commented on YARN-3256:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #850 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/850/])
YARN-3256. TestClientToAMTokens#testClientTokenRace is not running against 
(devaraj: rev 0d4296f0e0f545267f2e39a868d4ffefc9844db8)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java


 TestClientToAMTokens#testClientTokenRace is not running against all 
 Schedulers even when using ParameterizedSchedulerTestBase
 -

 Key: YARN-3256
 URL: https://issues.apache.org/jira/browse/YARN-3256
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.7.0

 Attachments: YARN-3256.001.patch


 The test testClientTokenRace was not using the base class conf, causing it to 
 run twice against the same Scheduler configured in the defaults.
 All tests deriving from ParameterizedSchedulerTestBase should use the conf 
 created in the base class instead of newing one up inside the test and hiding 
 the member. 
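 
 For illustration only, a tiny sketch of the field-shadowing pitfall described 
 above; the classes are simplified stand-ins, not ParameterizedSchedulerTestBase 
 or TestClientToAMTokens themselves.
 {code}
 // Sketch only: the parameterized base class wires the scheduler under test
 // into its conf field; re-declaring conf in the subclass shadows that field,
 // so every parameterized run exercises the default scheduler instead.
 abstract class SchedulerTestBaseSketch {
   protected String conf;                 // stands in for the YarnConfiguration

   SchedulerTestBaseSketch(String schedulerType) {
     this.conf = schedulerType;           // e.g. "capacity" or "fair"
   }
 }

 class ClientTokenRaceTestSketch extends SchedulerTestBaseSketch {
   // The bug being described: uncommenting this hides the inherited conf,
   // and the test silently runs against the default scheduler both times.
   // private String conf = "default";

   ClientTokenRaceTestSketch(String schedulerType) {
     super(schedulerType);
   }

   void testClientTokenRace() {
     // Correct: uses the inherited conf set up by the parameterized base class.
     System.out.println("running against scheduler: " + conf);
   }

   public static void main(String[] args) {
     new ClientTokenRaceTestSketch("capacity").testClientTokenRace();
     new ClientTokenRaceTestSketch("fair").testClientTokenRace();
   }
 }
 {code}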



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-02-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338318#comment-14338318
 ] 

Hadoop QA commented on YARN-3266:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701022/YARN-3266.02.patch
  against trunk revision 0d4296f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6755//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6755//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6755//console

This message is automatically generated.

 RMContext inactiveNodes should have NodeId as map key
 -

 Key: YARN-3266
 URL: https://issues.apache.org/jira/browse/YARN-3266
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-3266.01.patch, YARN-3266.02.patch


 Under the default NM port configuration, which is 0, we have observed in the 
 current version that the lost nodes count can be greater than the length of the 
 lost node list. This will happen when we consecutively restart the same NM twice:
 * NM started at port 10001
 * NM restarted at port 10002
 * NM restarted at port 10003
 * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} has 1 element
 * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
 {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
 {{inactiveNodes}} still has 1 element
 Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
 {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this 
 would break the current API, then the key string should include the NM's 
 port as well.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

