[jira] [Updated] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3091: - Issue Type: Task (was: Improvement) [Umbrella] Improve and fix locks of RM scheduler Key: YARN-3091 URL: https://issues.apache.org/jira/browse/YARN-3091 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler, fairscheduler, resourcemanager, scheduler Reporter: Wangda Tan In existing YARN RM scheduler, there're some issues of using locks. For example: - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps - Some fields not properly locked (Like clusterResource) We can address them together in this ticket. (More details see comments below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3091: - Summary: [Umbrella] Improve and fix locks of RM scheduler (was: [Umbrella] Improve locks of RM scheduler) [Umbrella] Improve and fix locks of RM scheduler Key: YARN-3091 URL: https://issues.apache.org/jira/browse/YARN-3091 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler, fairscheduler, resourcemanager, scheduler Reporter: Wangda Tan In existing YARN RM scheduler, there're some issues of using locks. For example: - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps - Some fields not properly locked (Like clusterResource) We can address them together in this ticket. (More details see comments below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
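To make the read/write-lock direction above concrete, here is a minimal sketch (not taken from any patch on this JIRA; the class and method names are invented) of how a frequently read field such as clusterResource could be guarded with a ReentrantReadWriteLock so that read-mostly callers no longer serialize behind each other the way coarse synchronized methods do:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only: a queue-like class whose read-mostly getters no longer
// contend with writers, which is the pattern proposed for scheduler/queue/app objects.
public class RwLockedQueueExample {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Resource clusterResource = Resources.createResource(0);

  // Frequent, cheap reads (e.g. from the web UI or client RPCs) take the read lock only.
  public Resource getClusterResource() {
    lock.readLock().lock();
    try {
      return clusterResource;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Infrequent updates (e.g. node added/removed) take the exclusive write lock.
  public void setClusterResource(Resource newResource) {
    lock.writeLock().lock();
    try {
      this.clusterResource = newResource;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}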
[jira] [Created] (YARN-3095) Enable DockerContainerExecutor to update Docker image
Chen He created YARN-3095: - Summary: Enable DockerContainerExecutor to update Docker image Key: YARN-3095 URL: https://issues.apache.org/jira/browse/YARN-3095 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.6.0 Reporter: Chen He Assignee: Chen He This JIRA allows DCE to check and update the Docker image before running a container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289567#comment-14289567 ] Zhijie Shen commented on YARN-3087: --- It's known issue for long time. I found the ticket that reports this problem before: YARN-1142 the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1142) MiniYARNCluster web ui does not work properly
[ https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289569#comment-14289569 ] Zhijie Shen commented on YARN-1142: --- Hi [~tucu00], did you have a chance to find where the exact singleton is? MiniYARNCluster web ui does not work properly - Key: YARN-1142 URL: https://issues.apache.org/jira/browse/YARN-1142 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Fix For: 2.7.0 When going to the RM http port, the NM web ui is displayed. It seems there is a singleton somewhere that breaks things when the RM and NMs run in the same process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289591#comment-14289591 ] Ray Chiang commented on YARN-2868: -- Okay, my bad. I'll put it back the way it was. Add metric for initial container launch time Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289619#comment-14289619 ] Sunil G commented on YARN-1963: --- As per the discussion in YARN-2896 with [~eepayne] and [~leftnoteasy], there is a proposal to use an integer alone as the priority, on both the client and the server. As per the design doc, a priority label was used as a wrapper for the user, and the server internally used the corresponding integer. We can continue the discussion here in the parent JIRA. Looping in [~vinodkv]. Current idea: {noformat} yarn.priority-labels = low:2, medium:4, high:6 {noformat} Proposed: {noformat} yarn.application.priority = 2, 3, 4 {noformat} Thank you for sharing your thoughts. I will now upload the scheduler changes, which can be reviewed in the meantime. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289546#comment-14289546 ] Robert Metzger commented on YARN-3086: -- It seems that even on trunk tests are failing in the hadoop-yarn-server-resourcemanager package. Looks like its pretty hard to verify if my change is breaking anything. I'm uploading an updated patch in a few hours... {code} Failed tests: TestAMRestart.testRMAppAttemptFailuresValidityInterval:630 AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry:405 AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort:316-checkShortCircuitRenewCancel:363 expected:getProxy but was:null TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort:327-checkShortCircuitRenewCancel:363 expected:getProxy but was:null TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort:305-checkShortCircuitRenewCancel:363 expected:getProxy but was:null TestRMRestart.testQueueMetricsOnRMRestart:1812-assertQueueMetrics:1837 expected:2 but was:1 TestRMRestart.testRMRestartGetApplicationList:965 Wanted but not invoked: rMAppManager.logApplicationSummary( isA(org.apache.hadoop.yarn.api.records.ApplicationId) ); - at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:965) However, there were other interactions with this mock: - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) - at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1188) TestContainerResourceUsage.testUsageAfterAMRestartWithMultipleContainers:252-amRestartTests:393 Unexcpected MemorySeconds value expected:-1456158548889 but was:3265 Tests in error: TestClientRMTokens.testShortCircuitRenewCancel:285-checkShortCircuitRenewCancel:353 » NullPointer TestClientRMTokens.testShortCircuitRenewCancelWildcardAddress:294-checkShortCircuitRenewCancel:353 » NullPointer TestAMAuthorization.testUnauthorizedAccess:273 » UnknownHost Invalid host name... TestAMAuthorization.testUnauthorizedAccess:273 » UnknownHost Invalid host name... {code} Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
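For context, here is a hedged sketch of what a test setup could look like once NodeManager memory is configurable; it assumes the eventual patch makes MiniYARNCluster honor yarn.nodemanager.resource.memory-mb (YarnConfiguration.NM_PMEM_MB) rather than the currently hardcoded 4 GB, which is exactly the change this JIRA proposes, not behavior that works today:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

public class MiniClusterMemoryExample {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Assumption: with the proposed patch, MiniYARNCluster would honor this
    // property instead of hardcoding 4 GB per NodeManager.
    conf.setInt(YarnConfiguration.NM_PMEM_MB, 768);

    MiniYARNCluster cluster = new MiniYARNCluster("flink-client-test", 2, 1, 1);
    cluster.init(conf);
    cluster.start();
    // ... run the test that over-requests containers, then:
    cluster.stop();
  }
}
{code}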
[jira] [Commented] (YARN-3091) [Umbrella] Improve and fix locks of RM scheduler
[ https://issues.apache.org/jira/browse/YARN-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289552#comment-14289552 ] Wangda Tan commented on YARN-3091: -- Thanks for jumping in and providing your thoughts, [~gtCarrera], [~sunilg], [~ozawa], [~rohithsharma], [~varun_saxena]. I've just updated the title of this JIRA a little according to suggestions from [~gtCarrera]. I think it's better to put the improvements and fixes together in this ticket, since they share a lot of background work. And +1 to fixing bugs prior to improvements, but it is possible we can address both in some places. I agree to run JCarder first to pinpoint problems; with that, we can get some valid inputs. But I'm not sure what the plan for HADOOP-9213 is; if it will take more time, we can do some work on our side in parallel. [Umbrella] Improve and fix locks of RM scheduler Key: YARN-3091 URL: https://issues.apache.org/jira/browse/YARN-3091 Project: Hadoop YARN Issue Type: Task Components: capacityscheduler, fairscheduler, resourcemanager, scheduler Reporter: Wangda Tan In existing YARN RM scheduler, there're some issues of using locks. For example: - Many unnecessary synchronized locks, we have seen several cases recently that too frequent access of scheduler makes scheduler hang. Which could be addressed by using read/write lock. Components include scheduler, CS queues, apps - Some fields not properly locked (Like clusterResource) We can address them together in this ticket. (More details see comments below) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289603#comment-14289603 ] Wangda Tan commented on YARN-2800: -- Thanks for the review from [~vinodkv] and the commit by [~ozawa]. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly for users to try this feature without configuring where to store node labels on the file system. It seems convenient for users to try this, but it actually causes a bad user experience: a user may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (we store them in memory), and the RM cannot start if some queue uses labels that don't exist in the cluster. As discussed, we should have an explicit way to let users specify whether they want this feature or not. If node labels are disabled, any operation trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3087) the REST server (web server) for per-node aggregator does not work if it runs inside node manager
[ https://issues.apache.org/jira/browse/YARN-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289779#comment-14289779 ] Sangjin Lee commented on YARN-3087: --- Thanks for that Zhijie. the REST server (web server) for per-node aggregator does not work if it runs inside node manager - Key: YARN-3087 URL: https://issues.apache.org/jira/browse/YARN-3087 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee This is related to YARN-3030. YARN-3030 sets up a per-node timeline aggregator and the associated REST server. It runs fine as a standalone process, but does not work if it runs inside the node manager due to possible collisions of servlet mapping. Exception: {noformat} org.apache.hadoop.yarn.webapp.WebAppException: /v2/timeline: controller for v2 not found at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232) at org.apache.hadoop.yarn.webapp.Router.resolve(Router.java:140) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:134) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Summary: Add metric for initial container launch time to FairScheduler (was: Add metric for initial container launch time) Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289835#comment-14289835 ] Ray Chiang commented on YARN-2868: -- [~rohithsharma], it looks like CapacityScheduler/AbstractYarnScheduler is missing a couple of things needed to record container launch time. The Clock stuff is easy to add, but the queue related stuff looks like it could get complicated. I think I'd rather wait for YARN-2986 before this JIRA is implemented in CapacityScheduler. I can open a new JIRA for that once this one is done. Does that sound reasonable? Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3081) Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache()
[ https://issues.apache.org/jira/browse/YARN-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289773#comment-14289773 ] Tsuyoshi OZAWA commented on YARN-3081: -- Thank you, Ted! Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache() --- Key: YARN-3081 URL: https://issues.apache.org/jira/browse/YARN-3081 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: yarn-3081-001.patch {code} if (!removedProxy) { // all of the proxies are currently in use and already scheduled // for removal, so we need to wait until at least one of them closes try { this.wait(); {code} The above code can wait for a condition that has already been satisfied, leading to indefinite wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
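The quoted snippet calls wait() once without re-checking its condition, so a notification that arrives first (or a spurious wakeup) can leave the thread blocked forever. Below is a minimal, self-contained sketch of the guarded-wait idiom that fixes this class of bug; the names are invented, and this is not the actual ContainerManagementProtocolProxy code:
{code}
// Illustrative only: the guarded-wait idiom that avoids waiting on a condition
// that is already satisfied. Names below are invented for the example.
public class GuardedWaitExample {
  private int availableProxies = 0;

  public synchronized void waitForFreeProxy() throws InterruptedException {
    // Always re-check the predicate in a loop; wait() can return spuriously,
    // and the condition may already hold before we ever call wait().
    while (availableProxies == 0) {
      wait();
    }
    availableProxies--;
  }

  public synchronized void releaseProxy() {
    availableProxies++;
    notifyAll();   // wake up any waiter so it can re-test the predicate
  }
}
{code}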
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289832#comment-14289832 ] Wangda Tan commented on YARN-1963: -- Thanks for the summary, [~sunilg]. I think priority should be a range instead of a set of numbers; maybe we can refer to how Linux does it: the range \[-N, +N], with 0 as the default priority. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3028) Better syntax for replace label CLI
[ https://issues.apache.org/jira/browse/YARN-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289846#comment-14289846 ] Wangda Tan commented on YARN-3028: -- Patch LGTM, will commit this week. Thanks, Better syntax for replace label CLI --- Key: YARN-3028 URL: https://issues.apache.org/jira/browse/YARN-3028 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Jian He Assignee: Rohith Attachments: 0001-YARN-3028.patch, 0002-YARN-3028.patch The command to replace label now is such: {code} yarn rmadmin -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2] {code} Instead of {code} node1:port,label1,label2 {code} I think it's better to say {code} node1:port=label1,label2 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289769#comment-14289769 ] Hadoop QA commented on YARN-3021: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694177/YARN-3021.001.patch against trunk revision 24aa462. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.TestLargeSort org.apache.hadoop.conf.TestJobConf org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestRM Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6398//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6398//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6398//console This message is automatically generated. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails cause B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously and once the renewal attempt failed we simply ceased to schedule any further attempts of renewals, rather than fail the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip the scheduling alone, rather than bubble back an error to the client, failing the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
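A minimal sketch of the behavior proposed in the description: validate the token once at submission, but treat a renewal failure as "skip automatic renewal" rather than failing the application. This is an illustration only; it is not the real DelegationTokenRenewer change, and the class and method names are invented:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.token.Token;

// Illustrative only: the "go easy on renewal failure" behavior described above.
public class LenientTokenRenewalExample {

  /** Returns true if the token should be scheduled for automatic renewal. */
  public static boolean tryInitialRenew(Token<?> token, Configuration conf) {
    try {
      token.renew(conf);              // validate the token once, as the RM does today
      return true;                    // renewable: keep scheduling future renewals
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (IOException e) {
      // Cross-realm trust (realm A's RM renewing realm B's token) can fail here.
      // Instead of failing app submission, log and simply skip auto-renewal.
      System.err.println("Skipping auto-renewal for " + token.getKind() + ": " + e);
      return false;
    }
  }
}
{code}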
[jira] [Updated] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error
[ https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-3088: - Attachment: YARN-3088.v1.txt [~jlowe], would you please take a look at this patch? LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error - Key: YARN-3088 URL: https://issues.apache.org/jira/browse/YARN-3088 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Eric Payne Attachments: YARN-3088.v1.txt If the native executor returns an error trying to delete a path as a particular user when dir==null, then the code can NPE trying to build a log message for the error. It blindly dereferences dir in the log message despite the code just above explicitly handling the cases when dir could be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
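A minimal sketch of the null-safe message construction the description calls for (illustrative only, not the attached patch; names are invented):
{code}
import org.apache.hadoop.fs.Path;

// Illustrative only: build the error message without blindly dereferencing dir,
// mirroring the null handling that already exists just above in deleteAsUser().
public class NullSafeDeleteLogExample {
  public static String buildDeleteErrorMessage(String user, Path dir, int exitCode) {
    String target = (dir == null)
        ? "all user directories"          // dir == null means "delete everything for the user"
        : dir.toString();
    return "DeleteAsUser for " + target + " as user " + user
        + " returned with exit code: " + exitCode;
  }

  public static void main(String[] args) {
    System.out.println(buildDeleteErrorMessage("alice", null, 255));
    System.out.println(buildDeleteErrorMessage("alice", new Path("/tmp/nm-local-dir"), 255));
  }
}
{code}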
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-650: -- Fix Version/s: (was: 2.7.0) User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask back resources. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2868: - Attachment: YARN-2868.007.patch - Move metric to QueueMetrics parent class (to be compatible with CapacityScheduler later) - Remove initialized boolean variable - Restore AtomicLong to SchedulerApplicationAttempt Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289974#comment-14289974 ] Zhijie Shen commented on YARN-3030: --- I'm not sure if we will have a quick solution for YARN-3087. I'm okay if we want to work around it by putting the web service module in the existing webapp. I think we can make the per-node aggregator a singleton because an NM will just have one. In this way, we can easily refer to it in different places in the NM. Thoughts? set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: YARN-3030.001.patch, YARN-3030.002.patch, YARN-3030.003.patch Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
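A hedged sketch of what "make the per-node aggregator a singleton" could look like inside the NM process; the class name and methods are invented for illustration and are not the actual YARN-3030 code:
{code}
// Illustrative only: one aggregator instance per NodeManager process, reachable
// from any NM component. Class and method names are invented for this sketch.
public final class PerNodeTimelineAggregatorHolder {
  private static volatile PerNodeTimelineAggregatorHolder instance;

  private PerNodeTimelineAggregatorHolder() { }

  public static PerNodeTimelineAggregatorHolder getInstance() {
    if (instance == null) {
      synchronized (PerNodeTimelineAggregatorHolder.class) {
        if (instance == null) {
          instance = new PerNodeTimelineAggregatorHolder();
        }
      }
    }
    return instance;
  }

  // e.g. the NM web app and the container manager would both call methods here
  public void putEntity(String entityJson) {
    // forward to the real writer/storage in the actual implementation
  }
}
{code}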
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290042#comment-14290042 ] Anubhav Dhoot commented on YARN-3079: - Should we add a couple more combinations to the test to ensure coverage? Update node2 so that it increases resources but the increase is less than the current max, and verify there is no change. Update node2 so that it decreases resources but the original and new values are both less than the current max, and verify there is no change. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise, even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also, an RMNodeReconnectEvent from ResourceTrackerService#registerNodeManager will trigger AbstractYarnScheduler#updateNodeResource to be called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3092: - Attachment: YARN-3092.1.patch Updated ver.1 patch. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
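A minimal sketch of the kind of common per-label usage class the description proposes: a map from label to a usage record, guarded by its own ReadWriteLock so callers do not need to lock the whole queue. This is not the attached patch; field and method names are invented:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustrative only: per-label usage tracking with fine-grained locking.
public class ResourceUsageSketch {
  private static class UsageByLabel {
    Resource used = Resources.createResource(0);
    Resource pending = Resources.createResource(0);
    Resource reserved = Resources.createResource(0);
    Resource amUsed = Resources.createResource(0);
  }

  private final Map<String, UsageByLabel> usages = new HashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public Resource getUsed(String label) {
    lock.readLock().lock();
    try {
      UsageByLabel u = usages.get(label);
      return u == null ? Resources.none() : u.used;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void incUsed(String label, Resource delta) {
    lock.writeLock().lock();
    try {
      UsageByLabel u = usages.get(label);
      if (u == null) {
        u = new UsageByLabel();
        usages.put(label, u);
      }
      Resources.addTo(u.used, delta);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}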
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290163#comment-14290163 ] Jian He commented on YARN-3011: --- lgtm overall, IIUC, if {{yarn.dispatcher.exit-on-error}} is set to false, NM will not crash in this case? one nit on the patch: {{next.getResource().getFile()}} , I feel using {{ConverterUtils#getPathFromYarnURL}} to print the full URL will be more debuggable. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
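A minimal sketch of the defensive check and the more debuggable logging suggested above (illustrative only; not the attached patch):
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Illustrative only: validate the resource URL before building a Path, so one bad
// request fails that localization instead of killing the AsyncDispatcher thread,
// and include the full resource URL in the error for easier debugging.
public class LocalizationPathCheckExample {
  public static Path toVerifiedPath(LocalResource rsrc) throws Exception {
    String file = rsrc.getResource().getFile();
    if (file == null || file.isEmpty()) {
      throw new IllegalArgumentException(
          "Invalid (empty) file in local resource: " + rsrc.getResource());
    }
    // ConverterUtils#getPathFromYarnURL yields the full hdfs://... path,
    // which is the more debuggable form suggested in the review above.
    return ConverterUtils.getPathFromYarnURL(rsrc.getResource());
  }
}
{code}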
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290208#comment-14290208 ] Hadoop QA commented on YARN-3092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694258/YARN-3092.1.patch against trunk revision 5c93ca2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6399//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6399//console This message is automatically generated. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289200#comment-14289200 ] Hadoop QA commented on YARN-3094: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694133/YARN-3094.patch against trunk revision 3aab354. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6397//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6397//console This message is automatically generated. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290363#comment-14290363 ] Chun Chen commented on YARN-3077: - Thanks for reviewing the patch, [~jianhe]. Uploaded a new patch addressing your comments. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290384#comment-14290384 ] Hadoop QA commented on YARN-3086: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694155/YARN-3086.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6404//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6404//console This message is automatically generated. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a build-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290232#comment-14290232 ] Jian He commented on YARN-3077: --- [~chenchun], thanks for working on this. The newly added test passes without the patch change. Mind taking a deeper look? For the test case: I suggest changing ZK_RM_STATE_STORE_PARENT_PATH to use /foo/bar for the existing test cases, instead of adding a new unit test to test the util method. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If a user specifies a custom value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, YARN should create the parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
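For illustration, a hedged sketch of what a recursive parent-path creation helper could look like against the plain ZooKeeper API; this is not the actual ZKRMStateStore patch, and the method name only mirrors the helper discussed in this review thread:
{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: create every component of a parent path such as
// /rmstore/cluster1, ignoring components that already exist.
public class ZkRecursiveCreateExample {
  public static void createRootDirRecursively(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    StringBuilder current = new StringBuilder();
    for (String segment : path.split("/")) {
      if (segment.isEmpty()) {
        continue;                       // skip the leading empty component
      }
      current.append("/").append(segment);
      try {
        zk.create(current.toString(), new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      } catch (KeeperException.NodeExistsException e) {
        // fine: another cluster (or an earlier run) already created this component
      }
    }
  }
}
{code}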
[jira] [Updated] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3092: - Attachment: YARN-3092.2.patch Thanks review from [~jianhe], updated patch addressed all comments. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290300#comment-14290300 ] Eric Payne commented on YARN-1963: -- +1 on using numbers and not labels. It seems that the use of labels adds more complexity in mapping, sending via PB, and converting back to numbers, and does not seem to add much clarity. Support priorities across applications within the same queue - Key: YARN-1963 URL: https://issues.apache.org/jira/browse/YARN-1963 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Reporter: Arun C Murthy Assignee: Sunil G Attachments: YARN Application Priorities Design.pdf, YARN Application Priorities Design_01.pdf It will be very useful to support priorities among applications within the same queue, particularly in production scenarios. It allows for finer-grained controls without having to force admins to create a multitude of queues, plus allows existing applications to continue using existing queues which are usually part of institutional memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290406#comment-14290406 ] Hadoop QA commented on YARN-3077: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694322/YARN-3077.2.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6405//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6405//console This message is automatically generated. RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290420#comment-14290420 ] Anubhav Dhoot commented on YARN-2868: - Adding it to ClusterMetrics will only give you a single value for the entire cluster which is pretty much useless if you want to investigate queue related issues. Adding it to a per queue metrics will give you more granular data. If you only care about the cluster wide metrics you still get that by looking at the root queue metrics. Hence we need to keep it per queue. All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290235#comment-14290235 ] Jian He commented on YARN-3077: --- doing above, the createRootDirRecursively visibility can be changed to private RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290242#comment-14290242 ] Jian He commented on YARN-3092: --- looks good overall. - minor optimization, return the reference directly if not existing {code} if (!usages.containsKey(label)) { usages.put(label, new UsageByLabel(label)); } return usages.get(label); {code} - demand - pending? - test case, just throw exception for better readability. {code} NoSuchMethodException, SecurityException, IllegalAccessException, IllegalArgumentException, InvocationTargetException {code} - the new class can be inside scheduler package Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
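A tiny sketch of the reviewer's "return the reference directly" suggestion, which avoids the containsKey/put/get triple lookup; the types here are stand-ins, not the actual patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: UsageByLabel stands in for the real per-label record type.
class UsageByLabel { }

class UsageLookupExample {
  private final Map<String, UsageByLabel> usages = new HashMap<>();

  UsageByLabel getOrCreate(String label) {
    UsageByLabel u = usages.get(label);
    if (u == null) {
      u = new UsageByLabel();
      usages.put(label, u);
    }
    return u;       // single lookup on the hit path instead of containsKey + put + get
  }
}
{code}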
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290272#comment-14290272 ] Wangda Tan commented on YARN-2868: -- Hmm, I think it may not be good enough to put this in QueueMetrics (I just noticed this). Every new app will overwrite this value, which is confusing to me and also to end users. When you look at the metrics fields in QueueMetrics, all of them are generic metrics of a queue, but this field seems not so generic to me. Is there any must-have reason or use case to add it to QueueMetrics, or alternatively could you add an application-metrics class and add it there? Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290286#comment-14290286 ] Ray Chiang commented on YARN-2868: -- I had it previously in FSQueueMetrics, then moved it to QueueMetrics based on Rohith's feedback, and then determined that updating CapacityScheduler with all the matching queue stuff right now would potentially conflict with YARN-2986. I could push the metric back to FSQueueMetrics. Since this is a MutableRate, the metric shouldn't get clobbered with each app, but will get averaged in (unless I'm misunderstanding something). I'm going to wait on further feedback before I do more editing. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
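To illustrate the MutableRate point: a rate metric accumulates a sample count and running average rather than being overwritten per application. The sketch below is illustrative only, with invented names; it is not the actual QueueMetrics/FSQueueMetrics change:
{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Illustrative only: each app's sample is folded into the queue's running average.
public class FirstContainerLatencyMetricExample {
  private final MetricsRegistry registry = new MetricsRegistry("QueueMetricsSketch");
  private final MutableRate firstContainerAllocationDelay =
      registry.newRate("firstContainerAllocationDelay",
          "Latency from allocation start to first allocated container (ms)");

  public void recordFirstContainerLatency(long startMillis, long allocatedMillis) {
    // MutableRate.add() updates count and average; it does not overwrite the value.
    firstContainerAllocationDelay.add(allocatedMillis - startMillis);
  }
}
{code}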
[jira] [Updated] (YARN-3077) RM should create yarn.resourcemanager.zk-state-store.parent-path recursively
[ https://issues.apache.org/jira/browse/YARN-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chun Chen updated YARN-3077: Attachment: YARN-3077.2.patch RM should create yarn.resourcemanager.zk-state-store.parent-path recursively Key: YARN-3077 URL: https://issues.apache.org/jira/browse/YARN-3077 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Chun Chen Attachments: YARN-3077.2.patch, YARN-3077.patch If multiple clusters share a zookeeper cluster, users might use /rmstore/${yarn.resourcemanager.cluster-id} as the state store path. If user specified a customer value which is not a top-level path for ${yarn.resourcemanager.zk-state-store.parent-path}, yarn should create parent path first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290250#comment-14290250 ] Hadoop QA commented on YARN-2868: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694276/YARN-2868.007.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6400//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6400//console This message is automatically generated. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3088) LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error
[ https://issues.apache.org/jira/browse/YARN-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290246#comment-14290246 ] Hadoop QA commented on YARN-3088: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694287/YARN-3088.v1.txt against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6401//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6401//console This message is automatically generated. LinuxContainerExecutor.deleteAsUser can throw NPE if native executor returns an error - Key: YARN-3088 URL: https://issues.apache.org/jira/browse/YARN-3088 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Jason Lowe Assignee: Eric Payne Attachments: YARN-3088.v1.txt If the native executor returns an error trying to delete a path as a particular user when dir==null then the code can NPE trying to build a log message for the error. It blindly deferences dir in the log message despite the code just above explicitly handling the cases when dir could be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
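To illustrate the NPE scenario described above, the fix amounts to not dereferencing dir when it can legitimately be null. A hedged sketch (class and method names here are hypothetical; only the null-guard pattern is the point):
{code}
import java.io.File;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class DeleteAsUserLogging {
  private static final Log LOG = LogFactory.getLog(DeleteAsUserLogging.class);

  // dir may be null when the caller asked the native executor to clean up
  // all local directories for the user, so guard before dereferencing it.
  void logDeleteFailure(String user, File dir, int exitCode) {
    String target = (dir == null) ? "all local directories" : dir.getAbsolutePath();
    LOG.error("deleteAsUser for user " + user + " on " + target
        + " failed with exit code " + exitCode);
  }
}
{code}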
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290247#comment-14290247 ] Wangda Tan commented on YARN-3075: -- {code} I had put it inside the node.labels != null condition earlier, but this leads to test case failures. If you look at the code in getNodeLabels, you will find that we get host.labels if nodeId doesn't have specific labels associated with it. I, on the other hand, am storing whatever is required right from the beginning, so there is no need to make this decision at the time of the call to getLabelsToNodes. So it's just a difference in approach and doesn't lead to any functional issues. Let me know your opinion on this. {code} The problem is not only a functional issue. I think the two parts of the code need to be consistent, mostly to avoid misunderstanding and to make debugging easier. - In NodeLabelsManager, when trying to get the labels on a node, if node.label does not exist (null), return host.label. And if the node and host have the same label, we should set node.label = null to keep the structure as simple as possible. - In NodeLabel, I think we should have similar logic: if a node's label is the same as its host's, we should only store the host in NodeLabel. Sounds good? Thanks, NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
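A minimal sketch of the fallback described in the first bullet above (illustrative names only, not the actual CommonNodeLabelsManager code): a node without its own labels inherits its host's labels.
{code}
import java.util.Collections;
import java.util.Map;
import java.util.Set;

class NodeLabelLookup {
  private final Map<String, Set<String>> hostLabels;  // host -> labels
  private final Map<String, Set<String>> nodeLabels;  // host:port -> labels

  NodeLabelLookup(Map<String, Set<String>> hostLabels,
      Map<String, Set<String>> nodeLabels) {
    this.hostLabels = hostLabels;
    this.nodeLabels = nodeLabels;
  }

  // If the node has no labels of its own (null), fall back to the host's labels.
  Set<String> getLabelsOnNode(String host, int port) {
    Set<String> labels = nodeLabels.get(host + ":" + port);
    if (labels != null) {
      return labels;
    }
    Set<String> inherited = hostLabels.get(host);
    return inherited != null ? inherited : Collections.<String>emptySet();
  }
}
{code}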
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290289#comment-14290289 ] Hadoop QA commented on YARN-3092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694258/YARN-3092.1.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6402//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6402//console This message is automatically generated. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
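To make the fine-grained locking benefit mentioned in this issue concrete, here is a minimal sketch of a per-label usage tracker guarded by a read/write lock. It is illustrative only; the committed ResourceUsage class may look quite different.
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class LabeledResourceUsage {
  private final Map<String, Resource> usedByLabel = new HashMap<String, Resource>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Readers of used resource only take the read lock, so they do not need
  // to lock the whole queue object.
  public Resource getUsed(String label) {
    lock.readLock().lock();
    try {
      Resource r = usedByLabel.get(label);
      return r == null ? Resources.createResource(0, 0) : r;
    } finally {
      lock.readLock().unlock();
    }
  }

  public void incUsed(String label, Resource delta) {
    lock.writeLock().lock();
    try {
      Resource r = usedByLabel.get(label);
      if (r == null) {
        r = Resources.createResource(0, 0);
        usedByLabel.put(label, r);
      }
      Resources.addTo(r, delta);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}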
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290291#comment-14290291 ] Wangda Tan commented on YARN-2868: -- Just checked the code; would it be good to put it in ClusterMetrics? aMRegisterDelay/amLaunchDelay seem more related to such initial container allocation time, and the name of the metric could be "App first container allocation delay". Sounds good, [~rohithsharma]? Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3022) Expose Container resource information from NodeManager for monitoring
[ https://issues.apache.org/jira/browse/YARN-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290316#comment-14290316 ] Robert Kanter commented on YARN-3022: - LGTM, just one minor thing: - In {{ContainerMetrics}}, can you create a {{public static final String}} for {{pMemUsage}} like you did for the others? Expose Container resource information from NodeManager for monitoring - Key: YARN-3022 URL: https://issues.apache.org/jira/browse/YARN-3022 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3022.001.patch, YARN-3022.002.patch Along with exposing the resource consumption of each container (such as in YARN-2141), it's worth exposing the actual resource limit associated with them to get better insight into YARN allocation and consumption -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3093) Support load command from admin [Helps to load big set of labels]
[ https://issues.apache.org/jira/browse/YARN-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290324#comment-14290324 ] Wangda Tan commented on YARN-3093: -- That will be helpful, +1 for the proposal too. I think we can make it compatible with the syntax in YARN-3028. In addition, do you think it is possible to automatically add the labels that appear in the conf file to clusterNodeLabels? That would make the config file simpler. Support load command from admin [Helps to load big set of labels] - Key: YARN-3093 URL: https://issues.apache.org/jira/browse/YARN-3093 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Proposing: yarn rmadmin -load -nodelabels filename. nodelabels can be one such option here, and this can be generalized by adding other options later. The advantage of this command is easier configuration. Assume an admin needs to load labels onto more than 20 nodes; the current command is a little difficult. If this configuration can be preloaded in a file and then uploaded to the RM, the same can be achieved with the existing parsing and update logic. I am showing a simpler proposed config file. {noformat} rm1 $ cat node_label.conf add [ label1,label2,label3,label4,label11,label12,label13,label14,abel21,label22,label23,label24 ] replace[ node1:port=label1,label2,label23,label24 node2:port=label4,abel11,label12,label13,label14,label21 node3:port=label2,label3,label4,label11,label12,label13,label14 node4:port=label14,label21,label22,label23,label24 node5:port=label14,label21,label22,label23,label24 node6:port=label4,label11,label12,label13,label14,label21,label22,label23,label24 ] {noformat} A restriction on file size can be kept to avoid uploading very large files. Please share your opinion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290377#comment-14290377 ] Jun Gong commented on YARN-3094: [~rohithsharma] Thanks for your review. I will add a test case if needed. {quote} How many RUNNING applications are running in cluster? {quote} Just several hundred apps running. The slow recovery might be caused by a lot of exceptions when storing RMApps' data using RMApplicationHistoryWriter. We will investigate further. {quote} What is the AM liveliness timeout configured in cluster? {quote} 3 mins. That way we can find it earlier if an AM crashes. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290411#comment-14290411 ] Rohith commented on YARN-2868: -- I had thought the metric could be common to all schedulers; if there are any complexities now, it can be added later. I also had a specific doubt that this metric is at the application level rather than the scheduler level, which I mentioned in the 2nd point of my previous comment. I was in a dilemma about where exactly to place it. Now I see ClusterMetrics already has some metrics related to AMs. +1 for keeping it in ClusterMetrics and for the metric name. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3092) Create common resource usage class to track labeled resource/capacity in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290371#comment-14290371 ] Hadoop QA commented on YARN-3092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694303/YARN-3092.2.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6403//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6403//console This message is automatically generated. Create common resource usage class to track labeled resource/capacity in Capacity Scheduler --- Key: YARN-3092 URL: https://issues.apache.org/jira/browse/YARN-3092 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3092.1.patch, YARN-3092.2.patch Since we have labels on nodes, so we need to track resource usage *by labels*, includes - AM resource (to enforce max-am-resource-by-label after YARN-2637) - Used resource (includes AM resource usage) - Reserved resource - Pending resource - Headroom Benefits to have such a common class are: - Reuse lots of code in different places (Queue/App/User), better maintainability and readability. - Can make fine-grained locking (e.g. accessing used resource in a queue doesn't need lock a queue) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290395#comment-14290395 ] Chun Chen commented on YARN-3094: - Since the RM can't receive pings from AMs until ApplicationMasterService starts, I think it is more accurate to reset the timers in the AMLivelinessMonitor service after ApplicationMasterService starts. I suggest initializing the AMLivelinessMonitor service after ApplicationMasterService in RMActiveServices#serviceInit. reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
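A rough sketch of the idea being discussed here (not the attached patch; names are illustrative): after recovery finishes and the AM protocol service is up, reset each recovered attempt's liveness timestamp so time spent recovering does not count against the expiry interval.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AmLivenessTimers {
  private final ConcurrentMap<String, Long> lastHeardFrom =
      new ConcurrentHashMap<String, Long>();

  void register(String attemptId) {
    lastHeardFrom.put(attemptId, System.currentTimeMillis());
  }

  void receivedPing(String attemptId) {
    lastHeardFrom.put(attemptId, System.currentTimeMillis());
  }

  // Called once recovery is complete and ApplicationMasterService is started,
  // so every recovered attempt starts its timeout from "now".
  void resetAllTimers() {
    long now = System.currentTimeMillis();
    for (String attemptId : lastHeardFrom.keySet()) {
      lastHeardFrom.put(attemptId, now);
    }
  }

  boolean isExpired(String attemptId, long expireIntervalMs) {
    Long last = lastHeardFrom.get(attemptId);
    return last != null && System.currentTimeMillis() - last > expireIntervalMs;
  }
}
{code}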
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290415#comment-14290415 ] Varun Saxena commented on YARN-3075: [~leftnoteasy], regarding your comment below. bq. In NodeLabel, I think we should have similar logic: if a node's label is the same as its host's, we should only store the host in NodeLabel. Right now, when we call {{getLabelsToNodes}} we simply query {{labelCollections}}. If we change it as above, we will have to query {{nodeCollections}} as well to find out which nodes are associated with the stored host. Are you fine with doing that? NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288979#comment-14288979 ] Hadoop QA commented on YARN-2800: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12693958/YARN-2800-20150122-1.patch against trunk revision 3aab354. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6395//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6395//console This message is automatically generated. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289117#comment-14289117 ] Tsuyoshi OZAWA commented on YARN-2800: -- Committing this to trunk and branch-2. Thanks [~leftnoteasy] for your contribution and thanks [~vinodkv] for the review! Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289120#comment-14289120 ] Hudson commented on YARN-2800: -- FAILURE: Integrated in Hadoop-trunk-Commit #6917 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6917/]) YARN-2800. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature. Contributed by Wangda Tan. (ozawa: rev 24aa462673d392fed859f8088acf9679ae62a129) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/MemoryRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: 
YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we have a MemoryNodeLabelStore, mostly for user to try this feature without configuring where to store node labels on file system. It seems convenient for user to try this, but actually it causes some bad use experience. User may add/remove labels, and edit capacity-scheduler.xml. After RM restart, labels will gone, (we store it in mem). And RM cannot get started if we have some queue uses labels, and the labels don't exist in cluster. As what we discussed, we should have an explicitly way to let user specify if he/she wants this feature or not. If node label is disabled, any operations trying to modify/use node labels will throw exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289141#comment-14289141 ] Rohith commented on YARN-3094: -- Thanks [~hex108] for reporting the issue and for your contributions. The patch looks good to me. Can you add tests for this? And could you give some general information, like: # How many RUNNING applications are running in cluster? # What is the AM liveliness timeout configured in cluster? reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When RM restarts, it will recover RMAppAttempts and registry them to AMLivenessMonitor if they are not in final state. AM will time out in RM if the recover process takes long time due to some reasons(e.g. too many apps). In our system, we found the recover process took about 3 mins, and all AM time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289109#comment-14289109 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Yarn-trunk #816 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/816/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/CHANGES.txt Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3085) Application summary should include the application type
[ https://issues.apache.org/jira/browse/YARN-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289119#comment-14289119 ] Hadoop QA commented on YARN-3085: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694131/0001-YARN-3085.patch against trunk revision 3aab354. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6396//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6396//console This message is automatically generated. Application summary should include the application type --- Key: YARN-3085 URL: https://issues.apache.org/jira/browse/YARN-3085 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3085.patch Adding the application type to the RM application summary log makes it easier to audit the number of applications from various app frameworks that are running on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289123#comment-14289123 ] Varun Saxena commented on YARN-3075: bq. 4) In add(remove/replace)NodeToLabels, such null check is not necessary: if (label != null). It will be checked in check... methods in CommonsNodeLabelsManager. That's true. Will remove the additional null check. bq. 1) When op (add/remove/replace) is on a host nodeId.getPort() == WILDCARD_PORT, (of course you need update label-host), you only need update label-Nodes when check node.labels != null is true. I had put it inside the node.labels != null condition earlier, but this leads to test case failures. If you look at the code in {{getNodeLabels}}, you will find that we get host.labels if nodeId doesn't have specific labels associated with it. I, on the other hand, am storing whatever is required right from the beginning, so there is no need to make this decision at the time of the call to {{getLabelsToNodes}}. So it's just a difference in approach and doesn't lead to any functional issues. Let me know your opinion on this. bq. 3.3 When a label contains (nodeId.port = WILDCARD_PORT), you should add Nodes in the host if (node.labels == null). It is possible a. admin specify host1.label = x; b. nm1 on host1 activated. You should get nm1 when you inquire nodes of label=x. You may need to add a test to TestRMNodeLabelsManager. You can take a look at testNodeActiveDeactiveUpdate Thanks for the input. Yes, activating and deactivating a node needs to delete the node from labelCollections as well. Will do so. I will modify {{testNodeActiveDeactiveUpdate}} accordingly. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289130#comment-14289130 ] Varun Saxena commented on YARN-3075: bq. 2) When op is on a node, as mentioned by Sunil, replace opertions not correct, it should be remove and then add. bq. That is what I am doing. Removing and add. Sunil G meant that we can refactor replaceNodeForLabels and not reduplicate add already present in removeNodeFromLabels and addNodeForLabels function. Did you mean something else ? Typing mistake. I meant That is what I am doing. Removing and add. Sunil G meant that we can refactor replaceNodeForLabels and not reduplicate code already present in removeNodeFromLabels and addNodeForLabels function. Did you mean something else ? NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: YARN-3011.002.patch NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290434#comment-14290434 ] Varun Saxena commented on YARN-3011: bq. IIUC, if yarn.dispatcher.exit-on-error is set to false, NM will not crash in this case? Yes, you are correct. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
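For illustration of the failure above: new Path("") throws IllegalArgumentException, and when that escapes into the AsyncDispatcher thread the NM exits. A defensive sketch (not the attached patch; the helper name is hypothetical) validates the string first so the caller can fail only the offending resource.
{code}
import org.apache.hadoop.fs.Path;

final class LocalizationPathGuard {
  // Returns null instead of letting new Path("") throw, so the caller can
  // mark just this resource as failed rather than killing the dispatcher.
  static Path toPathOrNull(String rawPath) {
    if (rawPath == null || rawPath.trim().isEmpty()) {
      return null;
    }
    return new Path(rawPath);
  }

  private LocalizationPathGuard() {
  }
}
{code}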
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: YARN-3011.002.patch NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3011: --- Attachment: (was: YARN-3011.002.patch) NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290465#comment-14290465 ] Hadoop QA commented on YARN-3079: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694331/YARN-3079.002.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6406//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6406//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6406//console This message is automatically generated. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290466#comment-14290466 ] Hadoop QA commented on YARN-3011: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694335/YARN-3011.002.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6407//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6407//console This message is automatically generated. NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch, YARN-3011.002.patch NM dies because of IllegalArgumentException when localize resource. 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null } 2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null } 2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.IllegalArgumentException: Can not create a Path from an empty string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127) at org.apache.hadoop.fs.Path.init(Path.java:135) at org.apache.hadoop.fs.Path.init(Path.java:94) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user 
hadoop 2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2868) Add metric for initial container launch time to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290431#comment-14290431 ] Rohith commented on YARN-2868: -- bq. All clustermetrics that are related to a queue should be moved to per queue metrics. I will open other jiras for moving those IIUC, be informed that this would break compatibility. ClusterMetrics are exposed to users. It would be better to keep the current metrics as they are and only work on adding new metrics. Add metric for initial container launch time to FairScheduler - Key: YARN-2868 URL: https://issues.apache.org/jira/browse/YARN-2868 Project: Hadoop YARN Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Labels: metrics, supportability Attachments: YARN-2868-01.patch, YARN-2868.002.patch, YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch, YARN-2868.006.patch, YARN-2868.007.patch Add a metric to measure the latency between starting container allocation and first container actually allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290444#comment-14290444 ] zhihai xu commented on YARN-3079: - Thanks to [~leftnoteasy] and [~adhoot] for the review; I addressed the comment from [~adhoot] in the new patch YARN-3079.002.patch. About the comments from [~leftnoteasy]: bq. 1) Suggest to change signature of updateMaximumAllocation(SchedulerNode, bool) to updateMaximumAllocation(Resource nodeResource, bool), since we only uses nodeResource here. This is debatable. I prefer to keep the current signature because it is more flexible and more meaningful for the other parameter (added node or removed node). Two nodes can have the same nodeResource, and you can access more information from SchedulerNode. bq. 2) Change resource for a NM is equivalent to {{updateMaximumAllocation(oldNodeResource, false)}} and {{updateMaximumAllocation(newNoderesource, true)}}. We can avoid some duplicated logic. I think it is not completely equivalent, because when you call {{updateMaximumAllocation(oldNodeResource, false)}}, it is assumed that the node has already been removed from the HashMap nodes, based on both the implementation of updateMaximumAllocation and its callers. But in the context of updateNodeResource, the node whose resource is to be changed is still in the HashMap nodes. bq. 3) Suggest rename updateMaximumAllocation(void) to refreshMaximumAllocation() or other name reflects the behavior: scan all cluster nodes and get maximum allocation. Good suggestion; refreshMaximumAllocation is a very good name. Addressed this comment in the new patch YARN-3079.002.patch. Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. Otherwise even the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. Also RMNodeReconnectEvent called from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource being called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
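A rough sketch of the refresh behaviour discussed above (illustrative only, not the attached patch): scan every node's resource and take the component-wise maximum.
{code}
import java.util.Collection;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class MaximumAllocationTracker {
  private Resource maximumAllocation = Resources.createResource(0, 0);

  // "Refresh": recompute from scratch by scanning all node resources. The
  // incremental updateMaximumAllocation(node, added/removed) path is separate.
  synchronized void refreshMaximumAllocation(Collection<Resource> nodeResources) {
    Resource max = Resources.createResource(0, 0);
    for (Resource r : nodeResources) {
      if (r.getMemory() > max.getMemory()) {
        max.setMemory(r.getMemory());
      }
      if (r.getVirtualCores() > max.getVirtualCores()) {
        max.setVirtualCores(r.getVirtualCores());
      }
    }
    maximumAllocation = max;
  }

  synchronized Resource getMaximumAllocation() {
    return maximumAllocation;
  }
}
{code}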
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: YARN-3079.002.patch Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch The scheduler should also update maximumAllocation when updateNodeResource is called. Otherwise, even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. An RMNodeReconnectEvent raised from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3079) Scheduler should also update maximumAllocation when updateNodeResource.
[ https://issues.apache.org/jira/browse/YARN-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3079: Attachment: (was: YARN-3079.002.patch) Scheduler should also update maximumAllocation when updateNodeResource. --- Key: YARN-3079 URL: https://issues.apache.org/jira/browse/YARN-3079 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-3079.000.patch, YARN-3079.001.patch, YARN-3079.002.patch The scheduler should also update maximumAllocation when updateNodeResource is called. Otherwise, even if the node resource is changed by AdminService#updateNodeResource, maximumAllocation won't be changed. An RMNodeReconnectEvent raised from ResourceTrackerService#registerNodeManager will also trigger AbstractYarnScheduler#updateNodeResource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290490#comment-14290490 ] Tsuyoshi OZAWA commented on YARN-3086: -- [~rmetzger] Please check whether your tests fail without your patch. If they fail without your patch, it may be an environment-dependent problem. In that case, you can submit the patch regardless of the failures. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290493#comment-14290493 ] Yongjun Zhang commented on YARN-3021: - I reran the failed tests locally. The following tests
{quote}
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
org.apache.hadoop.mapred.TestJobConf
{quote}
were successful. The following test failed:
{quote}
org.apache.hadoop.conf.TestJobConf.testNegativeValueForTaskVmem
TestJobConf.testNegativeValueForTaskVmem:111 expected:<1024> but was:<-1>
{quote}
and it was already reported as MAPREDUCE-6223. TestLargeSort failed in https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/ YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip only the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1456) IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager
[ https://issues.apache.org/jira/browse/YARN-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289052#comment-14289052 ] Steve Loughran commented on YARN-1456: -- marking as a duplicate of YARN-888; I've not seen it for a while IntelliJ IDEA gets dependencies wrong for hadoop-yarn-server-resourcemanager - Key: YARN-1456 URL: https://issues.apache.org/jira/browse/YARN-1456 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.2.0 Environment: IntelliJ IDEA 12.x 13.x beta Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Attachments: YARN-1456-001.patch When IntelliJ IDEA imports the hadoop POMs into the IDE, somehow it fails to pick up all the transitive dependencies of the yarn-client, and so can't resolve commons logging, com.google.* classes and the like. While this is probably an IDEA bug, it does stop you building Hadoop from inside the IDE, making debugging significantly harder -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3093) Support load command from admin [Helps to load big set of labels]
[ https://issues.apache.org/jira/browse/YARN-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289053#comment-14289053 ] Rohith commented on YARN-3093: -- +1 for the proposal. This is very useful for very large clusters. Support load command from admin [Helps to load big set of labels] - Key: YARN-3093 URL: https://issues.apache.org/jira/browse/YARN-3093 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.6.0 Reporter: Sunil G Assignee: Sunil G Proposing: yarn rmadmin -load -nodelabels filename. Here nodelabels is one such option, and this can be generalized by adding other options later. The advantage of this command is easier configuration. Assume an admin needs to load labels for more than 20 nodes; the current command is a little difficult. If the configuration can be prepared in a file, it can then be uploaded to the RM, and the same result can be achieved with the existing parsing and update logic. I am showing a simpler proposed config file.
{noformat}
rm1 $ cat node_label.conf
add [ label1,label2,label3,label4,label11,label12,label13,label14,label21,label22,label23,label24 ]
replace [
 node1:port=label1,label2,label23,label24
 node2:port=label4,label11,label12,label13,label14,label21
 node3:port=label2,label3,label4,label11,label12,label13,label14
 node4:port=label14,label21,label22,label23,label24
 node5:port=label14,label21,label22,label23,label24
 node6:port=label4,label11,label12,label13,label14,label21,label22,label23,label24
]
{noformat}
A restriction on file size can be kept to avoid uploading very huge files. Please share your opinion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289062#comment-14289062 ] Varun Saxena commented on YARN-3075: [~leftnoteasy], thanks for the review. Kindly find my replies below: bq. 2) When op is on a node, as mentioned by Sunil, replace operations not correct, it should be remove and then add. That is what I am doing: removing and then adding. [~sunilg] meant that we can refactor replaceNodeForLabels and not duplicate the logic already present in the removeNodeFromLabels and addNodeForLabels functions. Did you mean something else? bq. 3.1 Two loops seem duplicated, you can set labels = labelCollections.entrySet when (labels == null or empty). That's a good suggestion. Will make the change. bq. 3.2 When labels == null or empty, it will return nodes for all labels. You need to add javadocs to describe this behavior and you need to remove the empty label from labelCollections like we did in getClusterNodeLabels. The empty label exists because we need to track non-labeled nodes on the scheduler side, but it shouldn't be seen by users. Didn't know about that. Will remove the empty label. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
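Reading between the lines of the review, the retrieval method under discussion treats a null or empty label set as "all labels" while hiding the internal empty label used for unlabeled nodes. A rough sketch of that shape follows; labelCollections, NO_LABEL, and the method name are assumptions about the manager's internals, not the actual YARN-3075 code.
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;

// Rough sketch of the "null/empty means all labels" behaviour discussed above.
class LabelsToNodesSketch {
  static final String NO_LABEL = "";                         // assumed internal marker for unlabeled nodes
  final Map<String, Set<NodeId>> labelCollections = new HashMap<>();   // assumed label -> nodes map

  Map<String, Set<NodeId>> getLabelsToNodes(Set<String> labels) {
    // Null/empty input means "all labels", but the internal empty label stays hidden.
    if (labels == null || labels.isEmpty()) {
      labels = new HashSet<>(labelCollections.keySet());
      labels.remove(NO_LABEL);
    }
    Map<String, Set<NodeId>> labelsToNodes = new HashMap<>();
    for (String label : labels) {
      Set<NodeId> nodes = labelCollections.get(label);
      if (nodes != null && !nodes.isEmpty()) {
        labelsToNodes.put(label, new HashSet<>(nodes));
      }
    }
    return labelsToNodes;
  }
}
{code}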
[jira] [Updated] (YARN-3094) reset timer for liveness monitors after RM recovery
[ https://issues.apache.org/jira/browse/YARN-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jun Gong updated YARN-3094: --- Attachment: YARN-3094.patch reset timer for liveness monitors after RM recovery --- Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong Attachments: YARN-3094.patch When the RM restarts, it will recover RMAppAttempts and register them to the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289097#comment-14289097 ] Tsuyoshi OZAWA commented on YARN-2800: -- I also confirmed that the test failure of TestRMWebServicesAppsModification is not related to the patch. It passes locally. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
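As background for the enable/disable behaviour described in this issue, the guard typically boils down to reading a boolean configuration flag once and throwing on any label operation when it is off. A small sketch follows; the configuration key name here is an assumption for illustration, and the committed patch defines the real key in YarnConfiguration.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Sketch of the enable/disable guard described above; the key name is an assumed placeholder.
class NodeLabelsGuardSketch {
  private static final String NODE_LABELS_ENABLED = "yarn.node-labels.enabled";
  private final boolean nodeLabelsEnabled;

  NodeLabelsGuardSketch(Configuration conf) {
    this.nodeLabelsEnabled = conf.getBoolean(NODE_LABELS_ENABLED, false);
  }

  /** Called at the start of every add/remove/replace label operation. */
  void checkNodeLabelsEnabled() throws IOException {
    if (!nodeLabelsEnabled) {
      throw new IOException("Node-label feature is not enabled; set "
          + NODE_LABELS_ENABLED + " to true and restart the RM to use it.");
    }
  }
}
{code}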
[jira] [Updated] (YARN-3085) Application summary should include the application type
[ https://issues.apache.org/jira/browse/YARN-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-3085: - Attachment: 0001-YARN-3085.patch Application summary should include the application type --- Key: YARN-3085 URL: https://issues.apache.org/jira/browse/YARN-3085 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Jason Lowe Assignee: Rohith Attachments: 0001-YARN-3085.patch Adding the application type to the RM application summary log makes it easier to audit the number of applications from various app frameworks that are running on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289093#comment-14289093 ] Tsuyoshi OZAWA commented on YARN-2800: -- Thanks for the update. I rechecked the code. RMNodeLabelsManager, FileSystemNodeLabelsStore, and RMAdminCLI can access the variable nodeLabelsEnabled, but I agree with you that we don't need to make nodeLabelsEnabled volatile since there is no problem in the code path. I'll update the following comments to follow the javadoc format. After that I'll commit it.
{code}
+  /*
+   * Following are options for node labels
{code}
{code}
+  /*
+   * Error messages
+   */
{code}
Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3081) Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache()
[ https://issues.apache.org/jira/browse/YARN-3081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289103#comment-14289103 ] Tsuyoshi OZAWA commented on YARN-3081: -- [~ted_yu], thanks for reporting this. I checked the code path, and the current code looks correct. If tryCloseProxy() succeeds, the sleeping threads will be woken up and try to register the proxy instance as a cache entry. If wait() had a timeout value, the sleeping threads would be woken up before cmProxy.size() is updated, which just wastes CPU since nothing changes between before and after the sleep. What do you think? Please let me know if I'm missing something. Potential indefinite wait in ContainerManagementProtocolProxy#addProxyToCache() --- Key: YARN-3081 URL: https://issues.apache.org/jira/browse/YARN-3081 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: yarn-3081-001.patch
{code}
      if (!removedProxy) {
        // all of the proxies are currently in use and already scheduled
        // for removal, so we need to wait until at least one of them closes
        try {
          this.wait();
{code}
The above code can wait for a condition that has already been satisfied, leading to an indefinite wait. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
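The discussion above is the classic guarded-wait problem: wait() should sit inside a loop that re-checks the condition, and every state change that could satisfy the condition should notify the waiters. The generic sketch below shows the pattern; it is deliberately not the ContainerManagementProtocolProxy code itself.
{code}
import java.util.HashMap;
import java.util.Map;

// Generic guarded-wait sketch mirroring the addProxyToCache() discussion above.
class BoundedCacheSketch<K, V> {
  private final Map<K, V> cache = new HashMap<>();
  private final int maxEntries;

  BoundedCacheSketch(int maxEntries) {
    this.maxEntries = maxEntries;
  }

  synchronized void put(K key, V value) throws InterruptedException {
    // Re-check the condition after every wake-up; a plain if-check can miss updates.
    while (cache.size() >= maxEntries) {
      wait();
    }
    cache.put(key, value);
  }

  synchronized V remove(K key) {
    V removed = cache.remove(key);
    if (removed != null) {
      notifyAll();   // wake any threads waiting for space
    }
    return removed;
  }
}
{code}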
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289073#comment-14289073 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #82 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/82/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3094) reset timer for liveness monitors after RM recovery
Jun Gong created YARN-3094: -- Summary: reset timer for liveness monitors after RM recovery Key: YARN-3094 URL: https://issues.apache.org/jira/browse/YARN-3094 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jun Gong Assignee: Jun Gong When the RM restarts, it will recover RMAppAttempts and register them to the AMLivenessMonitor if they are not in a final state. AMs will time out in the RM if the recovery process takes a long time for some reason (e.g. too many apps). In our system, we found the recovery process took about 3 minutes, and all AMs timed out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
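The fix direction implied by the description is to restart each monitored attempt's expiry clock once recovery finishes, so that time spent recovering does not count against the AM. A heavily simplified monitor sketch with such a reset hook is below; the names are illustrative and do not claim to match the YARN-3094 patch.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified liveness-monitor sketch showing a resetTimer()-style hook after RM recovery.
class LivenessMonitorSketch<O> {
  private final Map<O, Long> running = new ConcurrentHashMap<>();
  private final long expireIntervalMs;

  LivenessMonitorSketch(long expireIntervalMs) {
    this.expireIntervalMs = expireIntervalMs;
  }

  void register(O ob) {
    running.put(ob, System.currentTimeMillis());
  }

  void receivedPing(O ob) {
    running.replace(ob, System.currentTimeMillis());
  }

  /** Called once recovery is done so recovery time does not count towards expiry. */
  void resetTimer() {
    long now = System.currentTimeMillis();
    for (O ob : running.keySet()) {
      running.put(ob, now);
    }
  }

  boolean isExpired(O ob) {
    Long last = running.get(ob);
    return last != null && System.currentTimeMillis() - last > expireIntervalMs;
  }
}
{code}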
[jira] [Commented] (YARN-914) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289286#comment-14289286 ] Jason Lowe commented on YARN-914: - bq. The first step I was thinking to keep NM running in a low resource mode after graceful decommissioned I think it could be useful to leave the NM process up after the graceful decommission completes. That allows automated decommissioning tools to know the process completed by querying the NM directly. If the NM exits then the tool may have difficulty distinguishing between the NM crashing just before decommissioning completed vs. successful completion. The RM will be tracking this state as well, so it may not be critical to do it one way or the other if the tool is querying the RM rather than the NM directly. bq. However, I am not sure if they can handle state migration to new node ahead of predictable node lost here, or be stateless more or less make more sense here? I agree with Ming that it would be nice if the graceful decommission process could give the AMs a heads up about what's going on. The simplest way to accomplish that is to leverage the already existing preemption framework to tell the AM that YARN is about to take the resources away. The StrictPreemptionContract portion of the PreemptionMessage can be used to list exact resources that YARN will be reclaiming and give the AM a chance to react to that before the containers are reclaimed. It's then up to the AM if it wants to do anything special or just let the containers get killed after a timeout. bq. These notification may still be necessary, so AM won't add these nodes into blacklist if container get killed afterwards. Thoughts? I thought we could leverage the updated nodes list of the AllocateResponse to let AMs know when nodes are entering the decommissioning state or at least when the decommission state completes (and containers are killed). Although if the AM adds the node to the blacklist, that's not such a bad thing either since the RM should never allocate new containers on a decommissioning node anyway. Support graceful decommission of nodemanager Key: YARN-914 URL: https://issues.apache.org/jira/browse/YARN-914 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Luke Lu Assignee: Junping Du When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications. Currently if a NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output is not fetched by the reducers of the job, these map tasks will need to be rerun as well. We propose to introduce a mechanism to optionally gracefully decommission a node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
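To make the preemption-framework suggestion concrete: an AM already receives a PreemptionMessage in its AllocateResponse, and the strict contract lists the containers YARN intends to reclaim. The sketch below shows how an AM might react during a graceful decommission; it uses the public YARN API as best I recall it, so treat the exact calls as a best-effort illustration rather than verified code.
{code}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;

// Sketch: how an AM might act on the strict preemption contract before containers are reclaimed.
class PreemptionAwareAmSketch {

  void onAllocateResponse(AllocateResponse response) {
    PreemptionMessage msg = response.getPreemptionMessage();
    if (msg == null || msg.getStrictContract() == null) {
      return;
    }
    for (PreemptionContainer c : msg.getStrictContract().getContainers()) {
      ContainerId id = c.getId();
      // Checkpoint or re-schedule the work running in this container before YARN takes it away.
      checkpointAndReschedule(id);
    }
  }

  void checkpointAndReschedule(ContainerId id) {
    // Application-specific reaction; intentionally left as a stub.
  }
}
{code}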
[jira] [Updated] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated YARN-3086: - Attachment: YARN-3086.patch Wow .. this is a special moment: the first patch I'm submitting to Hadoop ;) Sadly, I was not able to run any of the tests because the tests in trunk don't seem to pass. Let's hope the CI tools here are able to verify my patch. But out of curiosity: is it common that your trunk is not building? Am I supposed to develop against a version-specific branch? Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289281#comment-14289281 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #79 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/79/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289349#comment-14289349 ] Tsuyoshi OZAWA commented on YARN-3086: --
{code}
I was not able to run any of the tests because the tests in trunk don't seem to pass
{code}
Intermittent test failures can currently be observed for a few reasons - port conflicts, shortage of resources, and timing issues. We should fix them, but the problems are still there. I recommend running the tests under the related directory - in this case, under hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager instead of the root directory.
{code}
Am I supposed to develop against a version-specific branch?
{code}
The development branch is trunk, so I recommend developing on trunk unless the problem is branch-specific. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated YARN-3021: Attachment: YARN-3021.001.patch YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip only the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289356#comment-14289356 ] Tsuyoshi OZAWA commented on YARN-3086: --
{quote}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager Oops, maybe hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager is better.
{quote}
These lines were a typo. Please ignore them. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289355#comment-14289355 ] Tsuyoshi OZAWA commented on YARN-3086: --
{code}
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
{code}
Oops, maybe hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager is better.
{code}
cd hadoop                        # change directory to the source tree
mvn clean install -DskipTests    # compile and install related jars into the local repository
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
mvn test                         # launch tests for hadoop-yarn-server-resourcemanager
{code}
It can take an hour or more. If the wait is too long, you can skip running the tests, since I'll help you submit your patch to Jenkins CI. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289360#comment-14289360 ] Robert Metzger commented on YARN-3086: -- Thank you for all the help! I've updated the code... and I'm now trying to execute the tests with your instructions. If that works out, I'll upload an updated version of the patch. Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3086) Make NodeManager memory configurable in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289338#comment-14289338 ] Tsuyoshi OZAWA commented on YARN-3086: -- [~rmetzger] Great! First, I'd like to comment on your patch: how about making the default value 4 * 1024? That way we can remove the if statement and simplify the code path.
{code}
DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB = -1;
{code}
Could you update that? Make NodeManager memory configurable in MiniYARNCluster --- Key: YARN-3086 URL: https://issues.apache.org/jira/browse/YARN-3086 Project: Hadoop YARN Issue Type: Improvement Components: test Reporter: Robert Metzger Priority: Minor Attachments: YARN-3086.patch Apache Flink has a built-in YARN client to deploy it to YARN clusters. Recently, we added more tests for the client, using the MiniYARNCluster. One of the tests is requesting more containers than available. This test works well on machines with enough memory, but on travis-ci (our test environment), the available main memory is limited to 3 GB. Therefore, I want to set a custom amount of memory for each NodeManager. Right now, the NodeManager memory is hardcoded to 4GB. As discussed on the yarn-dev list, I'm going to create a patch for this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
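The suggestion above amounts to defaulting the new setting to the previously hard-coded 4 GB so that no special "-1" branch is needed. A sketch of that shape follows; the configuration key and constant names are hypothetical stand-ins for whatever the final YARN-3086 patch uses.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the suggested default handling; key/constant names are hypothetical.
class MiniClusterNmMemorySketch {
  static final String YARN_MINICLUSTER_NM_PMEM_MB =
      "yarn.minicluster.yarn.nodemanager.resource.memory-mb";
  static final int DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB = 4 * 1024;   // old hard-coded value

  static void configureNodeManagerMemory(Configuration conf) {
    // With a real default there is no need for an "if (value == -1)" branch.
    int nmMemoryMb =
        conf.getInt(YARN_MINICLUSTER_NM_PMEM_MB, DEFAULT_YARN_MINICLUSTER_NM_PMEM_MB);
    conf.setInt(YarnConfiguration.NM_PMEM_MB, nmMemoryMb);
  }
}
{code}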
[jira] [Commented] (YARN-3011) NM dies because of the failure of resource localization
[ https://issues.apache.org/jira/browse/YARN-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289363#comment-14289363 ] Varun Saxena commented on YARN-3011: Could someone kindly review this one? NM dies because of the failure of resource localization --- Key: YARN-3011 URL: https://issues.apache.org/jira/browse/YARN-3011 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.5.1 Reporter: Wang Hao Assignee: Varun Saxena Attachments: YARN-3011.001.patch NM dies because of an IllegalArgumentException when localizing a resource.
2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hadoop/share/lib/oozie/json-simple-1.1.jar, 1416997035456, FILE, null }
2014-12-29 13:43:58,699 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://hadoop002.dx.momo.com:8020/user/hive/src/final_test_ooize/test_ooize_job1.sql/, 1419831474153, FILE, null }
2014-12-29 13:43:58,701 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
    at org.apache.hadoop.fs.Path.<init>(Path.java:135)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:758)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:672)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:614)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
2014-12-29 13:43:58,701 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user hadoop
2014-12-29 13:43:58,702 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2014-12-29 13:43:58,704 INFO org.apache.hadoop.mapred.ShuffleHandler: Setting connection close header...
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
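The stack trace above shows the dispatcher thread dying because a Path is built from an empty string during localization. The defensive sketch below illustrates the obvious guard: validate the requested location and fail only that resource rather than the whole NodeManager. The class and method names are invented, and this is not the YARN-3011 patch.
{code}
import org.apache.hadoop.fs.Path;

// Defensive sketch: a malformed resource request should fail that one resource,
// not escape into the AsyncDispatcher and bring down the whole NodeManager.
class LocalizationGuardSketch {

  /** Returns the local path for the resource, or null if the request is malformed. */
  Path tryGetPathForLocalization(String localDir, String remoteLocation) {
    try {
      String name = new Path(remoteLocation).getName();
      if (name.isEmpty()) {
        return null;   // e.g. a bare root URL yields no file name to localize to
      }
      return new Path(localDir, name);
    } catch (IllegalArgumentException e) {
      // "Can not create a Path from an empty string" and similar errors end up here.
      return null;
    }
  }
}
{code}
A caller seeing null would then mark that resource's localization as failed and keep the dispatcher thread alive.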
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289381#comment-14289381 ] Hudson commented on YARN-2800: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/]) YARN-2800. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature. Contributed by Wangda Tan. (ozawa: rev 24aa462673d392fed859f8088acf9679ae62a129) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/MemoryRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 Attachments: 
YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289379#comment-14289379 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2033 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2033/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289432#comment-14289432 ] Sunil G commented on YARN-3075: --- Hi Varun
{code}
+    removeNodeFromLabels(nodeId, labels);
     host.labels.removeAll(labels);
+    for (Entry<NodeId, Node> nmEntry : host.nms.entrySet()) {
+      Node node = nmEntry.getValue();
       if (node.labels != null) {
         node.labels.removeAll(labels);
       }
+      removeNodeFromLabels(nmEntry.getKey(), labels);
     }
{code}
I think the first call to removeNodeFromLabels can be removed. The loop alone should be enough. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3021) YARN's delegation-token handling disallows certain trust setups to operate properly
[ https://issues.apache.org/jira/browse/YARN-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289439#comment-14289439 ] Yongjun Zhang commented on YARN-3021: - Hi [~qwertymaniac], [~vinodkv], [~adhoot], thanks for the earlier discussion and input. I uploaded patch rev 001, which introduces a new job configuration property, mapreduce.job.skip.rm.token.renewal; passing -Dmapreduce.job.skip.rm.token.renewal=true to DistCp (to instruct the ResourceManager to skip token renewal) solves the problem. I tested in the environment Harsh helped to set up - thanks, Harsh. Would you please help take a look at the patch? Thanks. YARN's delegation-token handling disallows certain trust setups to operate properly --- Key: YARN-3021 URL: https://issues.apache.org/jira/browse/YARN-3021 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.3.0 Reporter: Harsh J Attachments: YARN-3021.001.patch, YARN-3021.patch Consider this scenario of 3 realms: A, B and COMMON, where A trusts COMMON, and B trusts COMMON (one way trusts both), and both A and B run HDFS + YARN clusters. Now if one logs in with a COMMON credential, and runs a job on A's YARN that needs to access B's HDFS (such as a DistCp), the operation fails in the RM, as it attempts a renewDelegationToken(…) synchronously during application submission (to validate the managed token before it adds it to a scheduler for automatic renewal). The call obviously fails because the B realm will not trust A's credentials (here, the RM's principal is the renewer). In the 1.x JobTracker the same call is present, but it is done asynchronously, and once the renewal attempt failed we simply ceased to schedule any further renewal attempts, rather than failing the job immediately. We should change the logic such that we attempt the renewal but go easy on the failure and skip only the scheduling, rather than bubble an error back to the client and fail the app submission. This way the old behaviour is retained. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
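For readers following along, the new property would make the RM skip the synchronous renewDelegationToken call that fails across one-way trust boundaries. Below is a control-flow sketch only; the property name comes from the comment above, while the class and helper names are invented for illustration and do not claim to match the YARN-3021 patch.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

// Control-flow sketch; helper names are hypothetical, the property name is from the JIRA comment.
class TokenRenewalSketch {
  static final String SKIP_RM_TOKEN_RENEWAL = "mapreduce.job.skip.rm.token.renewal";

  void handleAppSubmission(Configuration appConf, DelegationTokenToRenew dttr)
      throws IOException {
    if (appConf.getBoolean(SKIP_RM_TOKEN_RENEWAL, false)) {
      // The trust setup prevents the RM from renewing this token; keep it but skip renewal.
      return;
    }
    renewToken(dttr);        // may fail across one-way trust boundaries
    scheduleRenewal(dttr);
  }

  // Stubs standing in for the real DelegationTokenRenewer internals.
  static class DelegationTokenToRenew { }
  void renewToken(DelegationTokenToRenew t) throws IOException { }
  void scheduleRenewal(DelegationTokenToRenew t) { }
}
{code}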
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289435#comment-14289435 ] Varun Saxena commented on YARN-3075: bq. I think the first call to removeNodeFromLabels can be removed. The loop alone should be enough. host.nms won't have an entry for host:0, so the loop alone won't be enough. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3075) NodeLabelsManager implementation to retrieve label to node mapping
[ https://issues.apache.org/jira/browse/YARN-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289465#comment-14289465 ] Sunil G commented on YARN-3075: --- Thank you [~varun_saxena] for clarifying. As we discussed, you are saving hosts including port 0, hence my confusion. If possible, please try to keep the same storage structure; it will be easier to manage later. NodeLabelsManager implementation to retrieve label to node mapping -- Key: YARN-3075 URL: https://issues.apache.org/jira/browse/YARN-3075 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Attachments: YARN-3075.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289344#comment-14289344 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #83 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/83/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289346#comment-14289346 ] Hudson commented on YARN-2800: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #83 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/83/]) YARN-2800. Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature. Contributed by Wangda Tan. (ozawa: rev 24aa462673d392fed859f8088acf9679ae62a129) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/MemoryRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/TestRMNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacitySchedulerNodeLabelUpdate.java * hadoop-yarn-project/CHANGES.txt Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.7.0 
Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20141205-1.patch, YARN-2800-20150122-1.patch In the past, we had a MemoryNodeLabelsStore, mostly so users could try this feature without configuring where to store node labels on the file system. It seems convenient for users to try, but it actually leads to a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, but after an RM restart the labels are gone (they were stored in memory), and the RM cannot start if some queue uses labels that no longer exist in the cluster. As discussed, we should have an explicit way for users to specify whether they want this feature. If node labels are disabled, any operation that tries to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3082) Non thread safe access to systemCredentials in NodeHeartbeatResponse processing
[ https://issues.apache.org/jira/browse/YARN-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289187#comment-14289187 ] Hudson commented on YARN-3082: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2014 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2014/]) YARN-3082. Non thread safe access to systemCredentials in NodeHeartbeatResponse processing. Contributed by Anubhav Dhoot. (ozawa: rev 3aab354e664a3ce09e0d638bf0c1e7d273d40579) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/CHANGES.txt Non thread safe access to systemCredentials in NodeHeartbeatResponse processing --- Key: YARN-3082 URL: https://issues.apache.org/jira/browse/YARN-3082 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-3082.001.patch, YARN-3082.002.patch When you use system credentials via feature added in YARN-2704, the proto conversion code throws exception in converting ByteBuffer -- This message was sent by Atlassian JIRA (v6.3.4#6332)