date:20140814


 [ 
https://issues.apache.org/jira/browse/YARN-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2173.
---

Resolution: Implemented

 Enabling HTTPS for the reader REST APIs of TimelineServer
 -

 Key: YARN-2173
 URL: https://issues.apache.org/jira/browse/YARN-2173
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2173) Enabling HTTPS for the reader REST APIs of TimelineServer


[ 
https://issues.apache.org/jira/browse/YARN-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096685#comment-14096685
 ] 

Zhijie Shen commented on YARN-2173:
---

I've setup HTTPS locally for the timeline server, and verify it with security 
on and off. In both scenarios, the three timeline GET APIs and the generic 
history web services and UI were working fine. Therefore, HTTPS of the timeline 
server should just work by using WebApp. Close this ticket as implemented.

 Enabling HTTPS for the reader REST APIs of TimelineServer
 -

 Key: YARN-2173
 URL: https://issues.apache.org/jira/browse/YARN-2173
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2418) Resource Manager JMX root queue active users 0

2014-08-14 Thread Hari Sekhon (JIRA)

Hari Sekhon created YARN-2418:
-

 Summary: Resource Manager JMX root queue active users 0
 Key: YARN-2418
 URL: https://issues.apache.org/jira/browse/YARN-2418
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
 Environment: HDP2.1
Reporter: Hari Sekhon
Priority: Minor


I've observed the Yarn Resource Manager's JMX shows the active users in the 
root queue as 0 when the other metrics such as submitted jobs are showing the 
correct stats from the leaf queues.

I think the active users for the root queue should be the total of the active 
users for all the leaf queues for correctness since this is the cluster-wide 
stats?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2380) The normalizeRequests method in SchedulerUtils always resets the vCore to 1

2014-08-14 Thread Kenji Kikushima (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-2380:
--

Attachment: YARN-2380.patch

Hi, how about to keep vcores in DefaultResourceCalculator#normalize?
I think DefaultResourceCalculator should care only about memory.

 The normalizeRequests method in SchedulerUtils always resets the vCore to 1
 ---

 Key: YARN-2380
 URL: https://issues.apache.org/jira/browse/YARN-2380
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang
Priority: Critical
 Attachments: YARN-2380.patch


 I added some log info to the method normalizeRequest() as follows.
   public static void normalizeRequest(
   ResourceRequest ask, 
   ResourceCalculator resourceCalculator, 
   Resource clusterResource,
   Resource minimumResource,
   Resource maximumResource,
   Resource incrementResource) {
 LOG.info(Before request normalization, the ask capacity:  + 
 ask.getCapability());
 Resource normalized = 
 Resources.normalize(
 resourceCalculator, ask.getCapability(), minimumResource,
 maximumResource, incrementResource);
 LOG.info(After request normalization, the ask capacity:  + normalized);
 ask.setCapability(normalized);
   }
 The resulted log showed that the vcore in ask was changed from 2 to 1.
 2014-08-01 20:54:15,537 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC 
 Server handler 4 on 9024): Before request normalization, the ask capacity: 
 memory:1536, vCores:2
 2014-08-01 20:54:15,537 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC 
 Server handler 4 on 9024): After request normalization, the ask capacity: 
 memory:1536, vCores:1
 The root cause is the DefaultResourceCalculator calls 
 Resources.createResource(normalizedMemory) to regenerate a new resource with 
 vcore = 1.
 This bug is critical and it leads to the mismatch of the request resource and 
 the container resource and many other potential issues if the user requests 
 containers with vcore  1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken

2014-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096838#comment-14096838
 ] 

Hadoop QA commented on YARN-2383:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12661662/YARN-2383.preview.3.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA
  org.apache.hadoop.yarn.client.TestRMFailover
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4622//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4622//console

This message is automatically generated.

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, 
 YARN-2383.preview.3.1.patch, YARN-2383.preview.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server


[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096867#comment-14096867
 ] 

Hudson commented on YARN-2070:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #646 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/646/])
YARN-2070. Made DistributedShell publish the short user name to the timeline 
server. Contributed by Robert Kanter. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


 DistributedShell publishes unfriendly user information to the timeline server
 -

 Key: YARN-2070
 URL: https://issues.apache.org/jira/browse/YARN-2070
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2070.patch


 Bellow is the code of using the string of current user object as the user 
 value.
 {code}
 entity.addPrimaryFilter(user, UserGroupInformation.getCurrentUser()
 .toString());
 {code}
 When we use kerberos authentication, it's going to output the full name, such 
 as zjshen/localhost@LOCALHOST (auth.KERBEROS). It is not user friendly for 
 searching by the primary filters. It's better to use shortUserName instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API


[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096859#comment-14096859
 ] 

Hudson commented on YARN-2277:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #646 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/646/])
YARN-2277. Added cross-origin support for the timeline server web services. 
Contributed by Jonathan Eagles. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617832)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java


 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Fix For: 2.6.0

 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, 
 YARN-2277-v7.patch, YARN-2277-v8.patch


 As the Application Timeline Server is not provided with built-in UI, it may 
 make sense to enable JSONP or CORS Rest API capabilities to allow for remote 
 UI to access the data directly via javascript without cross side server 
 browser blocks coming into play.
 Example client may be like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server


[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097097#comment-14097097
 ] 

Hudson commented on YARN-2070:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1837 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1837/])
YARN-2070. Made DistributedShell publish the short user name to the timeline 
server. Contributed by Robert Kanter. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


 DistributedShell publishes unfriendly user information to the timeline server
 -

 Key: YARN-2070
 URL: https://issues.apache.org/jira/browse/YARN-2070
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2070.patch


 Bellow is the code of using the string of current user object as the user 
 value.
 {code}
 entity.addPrimaryFilter(user, UserGroupInformation.getCurrentUser()
 .toString());
 {code}
 When we use kerberos authentication, it's going to output the full name, such 
 as zjshen/localhost@LOCALHOST (auth.KERBEROS). It is not user friendly for 
 searching by the primary filters. It's better to use shortUserName instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2419) RM applications page doesn't sort application id properly

2014-08-14 Thread Thomas Graves (JIRA)

Thomas Graves created YARN-2419:
---

 Summary: RM applications page doesn't sort application id properly
 Key: YARN-2419
 URL: https://issues.apache.org/jira/browse/YARN-2419
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Thomas Graves


The ResourceManager apps page doesn't sort the application ids properly when 
the app id rolls over from  to 1.

When it rolls over the 1+ application ids end up being many pages down by 
the 0XXX numbers.

I assume we just sort alphabetically so we would need a special sorter that 
knows about application ids.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API


[ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097121#comment-14097121
 ] 

Hudson commented on YARN-2277:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1863 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1863/])
YARN-2277. Added cross-origin support for the timeline server web services. 
Contributed by Jonathan Eagles. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617832)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/webapp/CrossOriginFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilter.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestCrossOriginFilterInitializer.java


 Add Cross-Origin support to the ATS REST API
 

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Fix For: 2.6.0

 Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, 
 YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, 
 YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch, 
 YARN-2277-v7.patch, YARN-2277-v8.patch


 As the Application Timeline Server is not provided with built-in UI, it may 
 make sense to enable JSONP or CORS Rest API capabilities to allow for remote 
 UI to access the data directly via javascript without cross side server 
 browser blocks coming into play.
 Example client may be like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2070) DistributedShell publishes unfriendly user information to the timeline server


[ 
https://issues.apache.org/jira/browse/YARN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097129#comment-14097129
 ] 

Hudson commented on YARN-2070:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1863 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1863/])
YARN-2070. Made DistributedShell publish the short user name to the timeline 
server. Contributed by Robert Kanter. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617837)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java


 DistributedShell publishes unfriendly user information to the timeline server
 -

 Key: YARN-2070
 URL: https://issues.apache.org/jira/browse/YARN-2070
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter
Priority: Minor
  Labels: newbie
 Fix For: 2.6.0

 Attachments: YARN-2070.patch


 Bellow is the code of using the string of current user object as the user 
 value.
 {code}
 entity.addPrimaryFilter(user, UserGroupInformation.getCurrentUser()
 .toString());
 {code}
 When we use kerberos authentication, it's going to output the full name, such 
 as zjshen/localhost@LOCALHOST (auth.KERBEROS). It is not user friendly for 
 searching by the primary filters. It's better to use shortUserName instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2383) Add ability to renew ClientToAMToken

2014-08-14 Thread Xuan Gong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097167#comment-14097167
 ] 

Xuan Gong commented on YARN-2383:
-

testcase failures are un-related. All of them are port binding problems.

 Add ability to renew ClientToAMToken
 

 Key: YARN-2383
 URL: https://issues.apache.org/jira/browse/YARN-2383
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2383.preview.1.patch, YARN-2383.preview.2.patch, 
 YARN-2383.preview.3.1.patch, YARN-2383.preview.3.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2056) Disable preemption at Queue level


[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097194#comment-14097194
 ] 

Eric Payne commented on YARN-2056:
--

{quote}
Could this be accomplished by changing 
{{yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity}} 
to be a per-queue value. Then for queues that we don't ever want to be 
preempted, we set {{max_ignored_over_capacity == (max_capacity/capacity)-1.0}}?

Vinod, the motivation from my perspective is that we need a way to gradually 
phase in preemption and so being able to configure the queues in a way that 
prevents and/or gradually allows preemption seems desirable.
{quote}

[~nroberts], [~mayank_bansal], and [~vinodkv],

Would the {{max_ignored_over_capacity}} property become something like 
{{yarn.resourcemanager.monitor.capacity.preemption.queue-path.max_ignored_over_capacity}}?

For example, if the capacity scheduler were configured with 2 leaf queues, 
{{excalibur}} and {{brisingr}}, I would imagine that the 
{{max_ignored_over_capacity}} property name would look like this:

{{yarn.resourcemanager.monitor.capacity.preemption.root.excalibur.max_ignored_over_capacity}}
{{yarn.resourcemanager.monitor.capacity.preemption.root.brisingr.max_ignored_over_capacity}}

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne

 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2409) Active to StandBy transition does not stop rmDispatcher that causes 1 AsyncDispatcher thread leak.


[ 
https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097221#comment-14097221
 ] 

Eric Payne commented on YARN-2409:
--

[~rohithsharma], thanks for the analysis and detailed description.

+1 (non-binding)

 Active to StandBy transition does not stop rmDispatcher that causes 1 
 AsyncDispatcher thread leak. 
 ---

 Key: YARN-2409
 URL: https://issues.apache.org/jira/browse/YARN-2409
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2409.patch


 {code}
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 CONTAINER_ALLOCATED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:697)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:779)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:760)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:662)
 2014-08-12 07:03:00,839 ERROR org.apache.hadoop.ya
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2380) The normalizeRequests method in SchedulerUtils always resets the vCore to 1

2014-08-14 Thread Jian Fang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097225#comment-14097225
 ] 

Jian Fang commented on YARN-2380:
-

Any reason why you think DefaultResourceCalculator should care only about 
memory? 

Up to now, the resource is defined as memory, vcore.  If 
DefaultResourceCalculator does not do anything about memory, it should pass 
through the vcore value instead of setting it to 1, which would lead to a lot 
of potential issues such as the case in Tez. 

 The normalizeRequests method in SchedulerUtils always resets the vCore to 1
 ---

 Key: YARN-2380
 URL: https://issues.apache.org/jira/browse/YARN-2380
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jian Fang
Priority: Critical
 Attachments: YARN-2380.patch


 I added some log info to the method normalizeRequest() as follows.
   public static void normalizeRequest(
   ResourceRequest ask, 
   ResourceCalculator resourceCalculator, 
   Resource clusterResource,
   Resource minimumResource,
   Resource maximumResource,
   Resource incrementResource) {
 LOG.info(Before request normalization, the ask capacity:  + 
 ask.getCapability());
 Resource normalized = 
 Resources.normalize(
 resourceCalculator, ask.getCapability(), minimumResource,
 maximumResource, incrementResource);
 LOG.info(After request normalization, the ask capacity:  + normalized);
 ask.setCapability(normalized);
   }
 The resulted log showed that the vcore in ask was changed from 2 to 1.
 2014-08-01 20:54:15,537 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC 
 Server handler 4 on 9024): Before request normalization, the ask capacity: 
 memory:1536, vCores:2
 2014-08-01 20:54:15,537 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils (IPC 
 Server handler 4 on 9024): After request normalization, the ask capacity: 
 memory:1536, vCores:1
 The root cause is the DefaultResourceCalculator calls 
 Resources.createResource(normalizedMemory) to regenerate a new resource with 
 vcore = 1.
 This bug is critical and it leads to the mismatch of the request resource and 
 the container resource and many other potential issues if the user requests 
 containers with vcore  1.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2393) Fair Scheduler : Implement static fair share


 [ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2393:
--

Attachment: YARN-2393-3.patch

Rebase a new patch.
[~kasha], for the reloading, if we want only update queues whose weights have 
been changed, it seems we need to change bundle of code as we need to compare 
the previous weight and current weight. I don't know whether this is a good 
problem. So in this patch, still keep the old way that does 
rootQueue.recomputeSteadyShares() once allocation file reloaded.

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch


 Static fair share is a fair share allocation considering all(active/inactive) 
 queues.It would be shown on the UI for better predictability of finish time 
 of applications.
 We would compute static fair share only when needed, like on queue creation, 
 node added/removed. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode

2014-08-14 Thread Varun Vasudev (JIRA)

[
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097244#comment-14097244
]

Varun Vasudev commented on YARN-2397:
-

Thanks for the feedback [~zjshen]. My thinking is that in secure mode, we
should replace the AuthenticationFilterInitializer with the
RMAuthenticationInitializer to add support for authentication using delegation
tokens. In non-secure mode, the RMAuthenticationFilterInitializer and the
AuthenticationFilterInitializer are the the same so there's no need for any
replacement.

However, in non-secure mode, we should have a default filter in case none is
specified(so that users can use the rm web services), hence the code block for
non-secure mode.

RM web interface sometimes returns request is a replay error in secure mode
---

Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

The RM web interface sometimes returns a request is a replay error if the
default kerberos http filter is enabled. This is because it uses the new
RMAuthenticationFilter in addition to the AuthenticationFilter. There is a
workaround to set
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false.
This bug is to fix the code to use only the RMAuthenticationFilter and not
both.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097266#comment-14097266
 ] 

Jian He commented on YARN-2136:
---

bq. Hence dispatcher queue draining shouldn't matter as ZKClient is already 
closed.
After checking the code, I think we should flip the order closeInternal() and 
dispatcher.stop();  right?
{code}
  protected void serviceStop() throws Exception {
closeInternal();
dispatcher.stop();
  }
{code}

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-14 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097275#comment-14097275
 ] 

Craig Welch commented on YARN-1198:
---

So, it's possible to avoid iterating the applications in the queue and even the 
queue users if the antecedents of the headroom calculation are shared and 
updated at the queue level on change (qmaxcap...) and the final calculation is 
done during the heartbeat request / call to scheduler application attempt.  It 
would just be a calculation over these resources  some user specific values, 
should be reasonably performant, but it would move the final activity away from 
where it is today.

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2390) Investigating whehther generic history service needs to support queue-acls

2014-08-14 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097280#comment-14097280
 ] 

Sunil G commented on YARN-2390:
---

Yes. I understood your idea, but completed apps can be there in RM for some 
more time (1 is default number of completed apps in RM). and ACL's will be 
applicable for these completed apps still. 
In History Server, behavior now is different for same completed app once its 
moved from RM. This was the only point i was thinking we may need to look to. 
What  do you feel about this?


 Investigating whehther generic history service needs to support queue-acls
 --

 Key: YARN-2390
 URL: https://issues.apache.org/jira/browse/YARN-2390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 According YARN-1250,  it's arguable whether queue-acls should be applied to 
 the generic history service as well, because the queue admin may not need the 
 access to the completed application that is removed from the queue. Create 
 this ticket to tackle the discussion around.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-08-14 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097298#comment-14097298
 ] 

Sunil G commented on YARN-2056:
---

+1 This make sense. I have a doubt though.

*max_ignored_over_capacity* will help to avoid the jitter when container sizes 
varies and some times we do a little more/less preemption from leaf queue than 
its defined capacity. So more or less its boil down to the resource size of 
containers. per-queue configuration for max_ignored_over_capacity will 
definitely give more control than now, but still if heterogeneous applications 
(in terms of container resource) keeps running same queue, it may be still hard 
to get a correct value.

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne

 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-14 Thread Sunil G (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097341#comment-14097341
 ] 

Sunil G commented on YARN-2136:
---

Yes. I also feel we need to flip the order for dispatcher.stop.





 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share

2014-08-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097359#comment-14097359
 ] 

Hadoop QA commented on YARN-2393:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12661745/YARN-2393-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4623//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4623//console

This message is automatically generated.

 Fair Scheduler : Implement static fair share
 

 Key: YARN-2393
 URL: https://issues.apache.org/jira/browse/YARN-2393
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Ashwin Shankar
Assignee: Wei Yan
 Attachments: YARN-2393-1.patch, YARN-2393-2.patch, YARN-2393-3.patch


 Static fair share is a fair share allocation considering all(active/inactive) 
 queues.It would be shown on the UI for better predictability of finish time 
 of applications.
 We would compute static fair share only when needed, like on queue creation, 
 node added/removed. Please see YARN-2026 for discussions on this. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-08-14 Thread Craig Welch (JIRA)

[
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Craig Welch updated YARN-1857:
--

Attachment: YARN-1857.1.patch

Just updating to a patch which applies against current trunk, otherwise
unchanged

CapacityScheduler headroom doesn't account for other AM's running
-

Key: YARN-1857
URL: https://issues.apache.org/jira/browse/YARN-1857
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch,
YARN-1857.patch

Its possible to get an application to hang forever (or a long time) in a
cluster with multiple users. The reason why is that the headroom sent to the
application is based on the user limit but it doesn't account for other
Application masters using space in that queue. So the headroom (user limit -
user consumed) can be 0 even though the cluster is 100% full because the
other space is being used by application masters from other users.
For instance if you have a cluster with 1 queue, user limit is 100%, you have
multiple users submitting applications. One very large application by user 1
starts up, runs most of its maps and starts running reducers. other users try
to start applications and get their application masters started but not
tasks. The very large application then gets to the point where it has
consumed the rest of the cluster resources with all reduces. But at this
point it needs to still finish a few maps. The headroom being sent to this
application is only based on the user limit (which is 100% of the cluster
capacity) its using lets say 95% of the cluster for reduces and then other 5%
is being used by other users running application masters. The MRAppMaster
thinks it still has 5% so it doesn't know that it should kill a reduce in
order to run a map.
This can happen in other scenarios also. Generally in a large cluster with
multiple queues this shouldn't cause a hang forever but it could cause the
application to take much longer.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Assigned] (YARN-281) Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits

2014-08-14 Thread Harsh J (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned YARN-281:


Assignee: Wangda Tan  (was: Harsh J)

Sorry on delay, reassigned.

 Add a test for YARN Schedulers' MAXIMUM_ALLOCATION limits
 -

 Key: YARN-281
 URL: https://issues.apache.org/jira/browse/YARN-281
 Project: Hadoop YARN
  Issue Type: Test
  Components: scheduler
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Assignee: Wangda Tan
  Labels: test

 We currently have tests that test MINIMUM_ALLOCATION limits for FifoScheduler 
 and the likes, but no test for MAXIMUM_ALLOCATION yet. We should add a test 
 to prevent regressions of any kind on such limits.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2390) Investigating whehther generic history service needs to support queue-acls

2014-08-14 Thread Subramaniam Venkatraman Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097385#comment-14097385
 ] 

Zhijie Shen commented on YARN-2390:
---

bq. but completed apps can be there in RM for some more time (1 is default 
number of completed apps in RM). and ACL's will be applicable for these 
completed apps still. 

[~sunilg], that's a good point. I agree it would be nice if RM and GHS have 
consistent access control for finished application. However, if it's reasonable 
that the queue admin shouldn't have the access to the complete app which is 
removed from the queue, is the right fix to be correcting the ACLs on RM side?

One related issue is that while CLI will check the user's ACLs properly, 
neither GET APIs nor web UI honor the ACLs completely at RM side (therefore, I 
filed YARN-2310 and YARN-2311 before).

 Investigating whehther generic history service needs to support queue-acls
 --

 Key: YARN-2390
 URL: https://issues.apache.org/jira/browse/YARN-2390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 According YARN-1250,  it's arguable whether queue-acls should be applied to 
 the generic history service as well, because the queue admin may not need the 
 access to the completed application that is removed from the queue. Create 
 this ticket to tackle the discussion around.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-14 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097402#comment-14097402
 ] 

Varun Saxena commented on YARN-2136:


Yes, completely agree with you [~jianhe]. dispatcher.stop() will cause events 
in dispatcher queue(if any) to be processed first. These events would be lost 
if we call closeInternal() before dispatcher.stop()

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler


 [ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Venkatraman Krishnan updated YARN-2378:
---

Attachment: YARN-2378.patch

Good suggestion [~jianhe]. Uploading an updated pitch that has the fix.

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-14 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097513#comment-14097513
 ] 

Varun Saxena commented on YARN-2136:


However, if we do flip the order of these statements, I think we can then have 
a FENCED state because when we stop the dispatcher queue, it will be drained 
first and hence the pending events will be processed first. In this case, 
store/update will be sent to ZK.  What's your opinion, [~jianhe] and [~sunilg] ?

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097548#comment-14097548
 ] 

Jian He commented on YARN-2136:
---

bq. it will be drained first and hence the pending events will be processed 
first.
we are supposed to handle these pending events.  right?

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced

2014-08-14 Thread Varun Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097582#comment-14097582
 ] 

Varun Saxena commented on YARN-2136:


Ideally these events should be processed. But if Store is already fenced, I 
guess NoAuthException will again be reported by ZK, so processing this event 
wont lead to any useful operation.

 RMStateStore can explicitly handle store/update events when fenced
 --

 Key: YARN-2136
 URL: https://issues.apache.org/jira/browse/YARN-2136
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He

 RMStateStore can choose to handle/ignore store/update events upfront instead 
 of invoking more ZK operations if state store is at fenced state. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue

[
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097631#comment-14097631
]

Zhijie Shen commented on YARN-2385:
---

bq. May be two separate apis (getRunningAppsInQueue, getPendingAppsInQueue)
with common behavior across CS/Fair could be a better approach.

+1 for getRunningAppsInQueue + getPendingAppsInQueue, which sounds more
flexible to get each individual metric than a sum.

Previously, getAppsInQueue is used for getQueueInfo and getApplications. In the
former use case, we can replace it with getRunningAppsInQueue +
getPendingAppsInQueue, while in the latter one, it's not accurate enough to
only include the apps inside the queue, but it's a separate issue.

Adding support for listing all applications in a queue
--

Key: YARN-2385
URL: https://issues.apache.org/jira/browse/YARN-2385
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Karthik Kambatla
Labels: abstractyarnscheduler

This JIRA proposes adding a method in AbstractYarnScheduler to get all the
pending/active applications. Fair scheduler already supports moving a single
application from one queue to another. Support for the same is being added to
Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition
of this method, we can transparently add support for moving all applications
from source queue to target queue and draining a queue, i.e. killing all
applications in a queue as proposed by YARN-2389

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode


[ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097652#comment-14097652
 ] 

Zhijie Shen commented on YARN-2397:
---

Make sense to me. I'll commit the patch.

 RM web interface sometimes returns request is a replay error in secure mode
 ---

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch


 The RM web interface sometimes returns a request is a replay error if the 
 default kerberos http filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround to set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2397) RM and TS web interfaces sometimes return request is a replay error in secure mode


 [ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2397:
--

Summary: RM and TS web interfaces sometimes return request is a replay 
error in secure mode  (was: RM web interface sometimes returns request is a 
replay error in secure mode)

 RM and TS web interfaces sometimes return request is a replay error in secure 
 mode
 --

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
 Fix For: 2.6.0

 Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch


 The RM web interface sometimes returns a request is a replay error if the 
 default kerberos http filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround to set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2397) RM and TS web interfaces sometimes return request is a replay error in secure mode

[
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zhijie Shen updated YARN-2397:
--

Description:
The RM web interface sometimes returns a request is a replay error if the
default kerberos http filter is enabled. This is because it uses the new
RMAuthenticationFilter in addition to the AuthenticationFilter. There is a
workaround to set
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false.
This bug is to fix the code to use only the RMAuthenticationFilter and not both.

The similar problem happens to the timeline server web interface as well.

was:The RM web interface sometimes returns a request is a replay error if the
default kerberos http filter is enabled. This is because it uses the new
RMAuthenticationFilter in addition to the AuthenticationFilter. There is a
workaround to set
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false.
This bug is to fix the code to use only the RMAuthenticationFilter and not both.

RM and TS web interfaces sometimes return request is a replay error in secure
mode
--

Key: YARN-2397
URL: https://issues.apache.org/jira/browse/YARN-2397
Project: Hadoop YARN
Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
Fix For: 2.6.0

Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2397) RM and TS web interfaces sometimes return request is a replay error in secure mode


[ 
https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097688#comment-14097688
 ] 

Hudson commented on YARN-2397:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6071 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6071/])
YARN-2397. Avoided loading two authentication filters for RM and TS web 
interfaces. Contributed by Varun Vasudev. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618054)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilterInitializer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokenAuthentication.java


 RM and TS web interfaces sometimes return request is a replay error in secure 
 mode
 --

 Key: YARN-2397
 URL: https://issues.apache.org/jira/browse/YARN-2397
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Critical
 Fix For: 2.6.0

 Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch


 The RM web interface sometimes returns a request is a replay error if the 
 default kerberos http filter is enabled. This is because it uses the new 
 RMAuthenticationFilter in addition to the AuthenticationFilter. There is a 
 workaround to set 
 yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. 
 This bug is to fix the code to use only the RMAuthenticationFilter and not 
 both.
 The similar problem happens to the timeline server web interface as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2365) TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry fails on branch-2

2014-08-14 Thread Mit Desai (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2365:


Description: 
TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch-2 with 
the following errror
{noformat}
Running 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
  Time elapsed: 46.354 sec   FAILURE!
java.lang.AssertionError: AppAttempt state is not correct (timedout) 
expected:ALLOCATED but was:SCHEDULED
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576)
at 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)
{noformat}

  was:
TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch with 
the following errror
{noformat}
Running 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec  
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
  Time elapsed: 46.354 sec   FAILURE!
java.lang.AssertionError: AppAttempt state is not correct (timedout) 
expected:ALLOCATED but was:SCHEDULED
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576)
at 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)
{noformat}


 TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry fails on branch-2
 --

 Key: YARN-2365
 URL: https://issues.apache.org/jira/browse/YARN-2365
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai

 TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch-2 
 with the following errror
 {noformat}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
 testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
   Time elapsed: 46.354 sec   FAILURE!
 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:ALLOCATED but was:SCHEDULED
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback


[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097737#comment-14097737
 ] 

Eric Payne commented on YARN-415:
-

[~jianhe], Thank you very much for reviewing this patch.

{quote}
- we can reuse the previous rmAttempt and resource object
{code}
RMAppAttempt rmAttempt = container.rmContext.getRMApps()
   .get(container.getApplicationAttemptId().getApplicationId())
   .getRMAppAttempt(container.getApplicationAttemptId());
Resource resource = container.getContainer().getResource();
{code}
{quote}

I will reuse the Resource object, but I'm not sure if I can reuse the 
RMAppAttempt object.

In the following code snippet, the preemption path is always updating the 
attempt metrics for the current app attempt. In the chargeback (resource 
utilization metrics) path, that's not always what we want. Containers do not 
always complete before a current attempt dies and a new one is started. If this 
happens, the chargeback path should update the metrics for the first attempt, 
not the second one. The call to 
{{...getRMAppAttempt(container.getApplicationAttemptId())}} will always get the 
attempt that started the container.

Now that I think about it, it seems like that is what we want in the preemption 
path as well.

[~leftnoteasy], can you please comment? If the preemption path should update 
the preemption info for the attempt that started the finished container, then 
we can reuse the RMAppAttempt object for both paths.

{code}
  if (ContainerExitStatus.PREEMPTED == container.finishedStatus
.getExitStatus()) {
Resource resource = container.getContainer().getResource();
RMAppAttempt rmAttempt =
container.rmContext.getRMApps()
  .get(container.getApplicationAttemptId().getApplicationId())
  .getCurrentAppAttempt();
rmAttempt.getRMAppAttemptMetrics().updatePreemptionInfo(resource,
  container);
  }

RMAppAttempt rmAttempt = container.rmContext.getRMApps()
   .get(container.getApplicationAttemptId().getApplicationId())
   .getRMAppAttempt(container.getApplicationAttemptId());
{code}

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1918) Typo in description and error message for 'yarn.resourcemanager.cluster-id'


[ 
https://issues.apache.org/jira/browse/YARN-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097753#comment-14097753
 ] 

Hudson commented on YARN-1918:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6073 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6073/])
YARN-1918. Typo in description and error message for 
yarn.resourcemanager.cluster-id (Anandha L Ranganathan via aw) (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618070)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Typo in description and error message for 'yarn.resourcemanager.cluster-id'
 ---

 Key: YARN-1918
 URL: https://issues.apache.org/jira/browse/YARN-1918
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Anandha L Ranganathan
Priority: Trivial
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-1918.1.patch


 1.  In yarn-default.xml
 {code:xml}
 property
 descriptionName of the cluster. In a HA setting,
   this is used to ensure the RM participates in leader
   election fo this cluster and ensures it does not affect
   other clusters/description
 nameyarn.resourcemanager.cluster-id/name
 !--valueyarn-cluster/value--
   /property
 {code}
 Here the line 'election fo this cluster and ensures it does not affect' 
 should be replaced with  'election for this cluster and ensures it does not 
 affect'.
 2. 
 {code:xml}
 org.apache.hadoop.HadoopIllegalArgumentException: Configuration doesn't 
 specifyyarn.resourcemanager.cluster-id
   at 
 org.apache.hadoop.yarn.conf.YarnConfiguration.getClusterId(YarnConfiguration.java:1336)
 {code}
 In the above exception message, it is missing a space between message and 
 configuration name.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2197) Add a link to YARN CHANGES.txt in the left side of doc


[ 
https://issues.apache.org/jira/browse/YARN-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097752#comment-14097752
 ] 

Hudson commented on YARN-2197:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6073 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6073/])
YARN-2197. Add a link to YARN CHANGES.txt in the left side of doc (Akira 
AJISAKA via aw) (aw: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618066)
* /hadoop/common/trunk/hadoop-project/src/site/site.xml
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Add a link to YARN CHANGES.txt in the left side of doc
 --

 Key: YARN-2197
 URL: https://issues.apache.org/jira/browse/YARN-2197
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.4.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2197.patch


 Now there're the links to Common, HDFS and MapReduce CHANGES.txt in the left 
 side of the document (hadoop-project/src/site/site.xml), but YARN does not 
 exist.
 {code}
   item name=Common CHANGES.txt 
 href=hadoop-project-dist/hadoop-common/CHANGES.txt/
   item name=HDFS CHANGES.txt 
 href=hadoop-project-dist/hadoop-hdfs/CHANGES.txt/
   item name=MapReduce CHANGES.txt 
 href=hadoop-project-dist/hadoop-mapreduce/CHANGES.txt/
   item name=Metrics 
 href=hadoop-project-dist/hadoop-common/Metrics.html/
 {code}
 A link to YARN CHANGES.txt should be added.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-08-14 Thread Nathan Roberts (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097823#comment-14097823
 ] 

Nathan Roberts commented on YARN-2056:
--

[~sunilg] I'm not following the doubt. It may still be hard to get to a correct 
value for what exactly?  As far as completely disabling preemption for the 
queue, that should just be a matter of setting max_ignored_over_capacity to a 
sufficiently large value. To disable, it has to be at least 
((max_capacity/capacity)-1) but it could just as well be something quite large 
and that would effectively prevent preemption. I guess I'm saying it doesn't 
have to be ultra precise. 

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne

 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Created] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer

Wei Yan created YARN-2420:
-

 Summary: Fair Scheduler: change yarn.scheduler.fair.assignmultiple 
from boolean to integer
 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer

2014-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097937#comment-14097937
 ] 

Sandy Ryza commented on YARN-2420:
--

Does yarn.scheduler.fair.max.assign satisfy what you're looking for?

 Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to 
 integer
 -

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2420) Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to integer


[ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097941#comment-14097941
 ] 

Wei Yan commented on YARN-2420:
---

yes, my mistake. Just saw the max.assign field. I'll change this jira for 
another maxassign feature which automatically update the value for max.assign, 
based on current cluster load.

 Fair Scheduler: change yarn.scheduler.fair.assignmultiple from boolean to 
 integer
 -

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load


 [ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2420:
--

Summary: Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign 
based on cluster load  (was: Fair Scheduler: change 
yarn.scheduler.fair.assignmultiple from boolean to integer)

 Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on 
 cluster load
 ---

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1918) Typo in description and error message for 'yarn.resourcemanager.cluster-id'

2014-08-14 Thread Tsuyoshi OZAWA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097951#comment-14097951
 ] 

Tsuyoshi OZAWA commented on YARN-1918:
--

Thanks for your review, Allen.

 Typo in description and error message for 'yarn.resourcemanager.cluster-id'
 ---

 Key: YARN-1918
 URL: https://issues.apache.org/jira/browse/YARN-1918
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Devaraj K
Assignee: Anandha L Ranganathan
Priority: Trivial
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-1918.1.patch


 1.  In yarn-default.xml
 {code:xml}
 property
 descriptionName of the cluster. In a HA setting,
   this is used to ensure the RM participates in leader
   election fo this cluster and ensures it does not affect
   other clusters/description
 nameyarn.resourcemanager.cluster-id/name
 !--valueyarn-cluster/value--
   /property
 {code}
 Here the line 'election fo this cluster and ensures it does not affect' 
 should be replaced with  'election for this cluster and ensures it does not 
 affect'.
 2. 
 {code:xml}
 org.apache.hadoop.HadoopIllegalArgumentException: Configuration doesn't 
 specifyyarn.resourcemanager.cluster-id
   at 
 org.apache.hadoop.yarn.conf.YarnConfiguration.getClusterId(YarnConfiguration.java:1336)
 {code}
 In the above exception message, it is missing a space between message and 
 configuration name.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler


[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097958#comment-14097958
 ] 

Jian He commented on YARN-2378:
---

looks good to me , resubmitting the same patch to kick jenkins

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load

2014-08-14 Thread Sandy Ryza (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097961#comment-14097961
 ] 

Sandy Ryza commented on YARN-2420:
--

Cool.

Regarding adjusting maxassign dynamically, my view has been that this isn't 
needed when continuous scheduling is turned on, and eventually we expect 
everyone to switch over to continuous scheduling.  Thoughts?

 Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on 
 cluster load
 ---

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2420) Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on cluster load


[ 
https://issues.apache.org/jira/browse/YARN-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097965#comment-14097965
 ] 

Wei Yan commented on YARN-2420:
---

For continuous scheduling, yes, we don't need maxAttempt, and always assign one 
container to one node for each round. Where currently continuous scheduling 
assigns maxAttempt containers per node.

 Fair Scheduler: dynamically update yarn.scheduler.fair.max.assign based on 
 cluster load
 ---

 Key: YARN-2420
 URL: https://issues.apache.org/jira/browse/YARN-2420
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler


[ 
https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14097966#comment-14097966
 ] 

Karthik Kambatla commented on YARN-1959:


Would it make more sense to have it to be {{queue-fair-share - 
queue-consumed}}? Now that the fairshare is instantaneous, it is the maximum 
resources the app can safely expect to get. No? 

 Fix headroom calculation in Fair Scheduler
 --

 Key: YARN-1959
 URL: https://issues.apache.org/jira/browse/YARN-1959
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Anubhav Dhoot

 The Fair Scheduler currently always sets the headroom to 0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-14 Thread Ram Venkatesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-2411:


Description: 
YARN-2257 has a proposal to extend and share the queue placement rules for the 
fair scheduler and the capacity scheduler. This is a good long term solution to 
streamline queue placement of both schedulers but it has core infra work that 
has to happen first and might require changes to current features in all 
schedulers along with corresponding configuration changes, if any. 

I would like to propose a change with a smaller scope in the capacity scheduler 
that addresses the core use cases for implicitly mapping jobs that have the 
default queue or no queue specified to specific queues based on the submitting 
user and user groups. It will be useful in a number of real-world scenarios and 
can be migrated over to the unified scheme when YARN-2257 becomes available.

The proposal is to add two new configuration options:

yarn.scheduler.capacity.queue-mappings-override.enable 
A boolean that controls if user-specified queues can be overridden by the 
mapping, default is false.

and,
yarn.scheduler.capacity.queue-mappings
A string that specifies a list of mappings in the following format (default is 
 which is the same as no mapping)

map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
map_specifier := user (u) | group (g)
source_attribute := user | group | %user
queue_name := the name of the mapped queue | %user | %primary_group

The mappings will be evaluated left to right, and the first valid mapping will 
be used. If the mapped queue does not exist, or the current user does not have 
permissions to submit jobs to the mapped queue, the submission will fail.

Example usages:
1. user1 is mapped to queue1, group1 is mapped to queue2
u:user1:queue1,g:group1:queue2

2. To map users to queues with the same name as the user:
u:%user:%user

I am happy to volunteer to take this up.

  was:
YARN-2257 has a proposal to extend and share the queue placement rules for the 
fair scheduler and the capacity scheduler. This is a good long term solution to 
streamline queue placement of both schedulers but it has core infra work that 
has to happen first and might require changes to current features in all 
schedulers along with corresponding configuration changes, if any. 

I would like to propose a change with a smaller scope in the capacity scheduler 
that addresses the core use cases for implicitly mapping jobs that have the 
default queue or no queue specified to specific queues based on the submitting 
user and user groups. It will be useful in a number of real-world scenarios and 
can be migrated over to the unified scheme when YARN-2257 becomes available.

The proposal is to add two new configuration options:

yarn.scheduler.capacity.queue-mappings.enable 
A boolean that controls if queue mappings are enabled, default is false.

and,
yarn.scheduler.capacity.queue-mappings
A string that specifies a list of mappings in the following format:

map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
map_specifier := user (u) | group (g)
source_attribute := user | group | %user
queue_name := the name of the mapped queue | %user | %primary_group

The mappings will be evaluated left to right, and the first valid mapping will 
be used. If the mapped queue does not exist, or the current user does not have 
permissions to submit jobs to the mapped queue, the submission will fail.

Example usages:
1. user1 is mapped to queue1, group1 is mapped to queue2
u:user1:queue1,g:group1:queue2

2. To map users to queues with the same name as the user:
u:%user:%user

I am happy to volunteer to take this up.


 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh
Assignee: Ram Venkatesh

 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified

[jira] [Updated] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-14 Thread Ram Venkatesh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-2411:


Attachment: YARN-2411.1.patch

This patch contains enables jobs to be submitted to queues based on mappings 
specified in the configuration file. The syntax of the mapping is in in the 
description of this JIRA.


 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh
Assignee: Ram Venkatesh
 Attachments: YARN-2411.1.patch


 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings-override.enable 
 A boolean that controls if user-specified queues can be overridden by the 
 mapping, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format (default 
 is  which is the same as no mapping)
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback


 [ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-415:


Attachment: YARN-415.201408150030.txt

{quote}
- Can you please elaborate in what scenario we need the following extra check?
{code}
// Only add in the running containers if this is the active attempt.
RMAppAttempt currentAttempt = rmContext.getRMApps()
   .get(attemptId.getApplicationId()).getCurrentAppAttempt();
if (currentAttempt != null 
currentAttempt.getAppAttemptId().compareTo(attemptId) == 0) {
  ApplicationResourceUsageReport appResUsageReport = rmContext
.getScheduler().getAppResourceUsageReport(attemptId);
  if (appResUsageReport != null) {
memorySeconds += appResUsageReport.getMemorySeconds();
vcoreSeconds += appResUsageReport.getVcoreSeconds();
  }
}
{code}
{quote}

An app could have multiple attempts if, for example, the first attempt died in 
the middle and the RM starts a second attempt for this app. In that situation, 
when RMAppAttemptMetrics#getRMAppMetrics is called for the first attempt, we 
only want to report the info for the completed containers, and when it is 
called for the second (running) attempt, we want to report for both completed 
and running containers. Of course, this is a little misleading when you have 
work-preserving restart enabled, and the running containers didn't die with the 
first attempt. While they are running, they are reported as the metrics for the 
second attempt, but when they complete, their metrics go back into the first 
attempt. Since these metrics are only reported at the app level, I think this 
should be okay. The important thing is that the running metrics only get 
reported once and don't get double-counted.

{quote}
- Also, currentAttempt.getAppAttemptId().compareTo(attemptId) == 0, we can use 
equals instead which looks more intuitive. 
{quote}
Good point. I made the change.

{quote}
- getFinishedMemorySeconds and getFinishedVcoreSeconds methods are not used.
- For setFinishedVcoreSeconds and setFinishedMemorySeconds, we can just use 
updateResourceUtilization
{quote}
I used updateResourceUtilization as you suggested, and removed the getters and 
setters.

{quote}
- RMStateStore#removeApplication: no need to calculate the memory utilization 
when removing the app. Saving some cost for the loop of attempts
{quote}
Good catch. I removed this calculation.


 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.201408132109.txt, 
 YARN-415.201408150030.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-08-14 Thread Subramaniam Venkatraman Krishnan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098024#comment-14098024
 ] 

Jian He commented on YARN-2229:
---

the latest patch seems not applying on trunk any more. Can you update please ? 
thx

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
 YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, 
 YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2229) ContainerId can overflow with RM restart

2014-08-14 Thread Tsuyoshi OZAWA (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229.12.patch

Thanks for your notification, Jian. Refreshed a patch.

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.12.patch, 
 YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, 
 YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, 
 YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1458) In Fair Scheduler, size based weight can cause update thread to hold lock indefinitely

2014-08-14 Thread George Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14098088#comment-14098088
 ] 

George Wong commented on YARN-1458:
---

The regression is 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testIsStarvedForFairShare.

I applied the patch to the latest trunk code, ran this UT in my local laptop. 
The UT always succeeds.
I've also check the code, but could not figure out why the UT fails.
Can anyone help?

Thanks.

 In Fair Scheduler, size based weight can cause update thread to hold lock 
 indefinitely
 --

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
  Labels: patch
 Fix For: 2.2.1

 Attachments: YARN-1458.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submit lots jobs, it is not easy to reapear. We run the test cluster 
 for days to reapear it. The output of  jstack command on resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-08-14 Thread Junping Du (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-1506:
-

Attachment: YARN-1506-v12.patch

In v12 patch,
- fix unit test failure for node reconnecting with resource update.
- fix unit test failure for event cast.
- fix findbug warning on synchronization.

 Replace set resource change on RMNode/SchedulerNode directly with event 
 notification.
 -

 Key: YARN-1506
 URL: https://issues.apache.org/jira/browse/YARN-1506
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, scheduler
Reporter: Junping Du
Assignee: Junping Du
 Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, 
 YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v2.patch, 
 YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, 
 YARN-1506-v6.patch, YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch


 According to Vinod's comments on YARN-312 
 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087),
  we should replace RMNode.setResourceOption() with some resource change event.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler


 [ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subramaniam Venkatraman Krishnan updated YARN-2378:
---

Attachment: YARN-2378-1.patch

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378-1.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch, YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1371) FIFO scheduler to re-populate container allocation state


 [ 
https://issues.apache.org/jira/browse/YARN-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1371:
---

Fix Version/s: (was: 2.5.0)

 FIFO scheduler to re-populate container allocation state
 

 Key: YARN-1371
 URL: https://issues.apache.org/jira/browse/YARN-1371
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He

 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2020) observeOnly should be checked before any preemption computation started inside containerBasedPreemptOrKill() of ProportionalCapacityPreemptionPolicy.java


 [ 
https://issues.apache.org/jira/browse/YARN-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2020:
---

Fix Version/s: (was: 2.5.0)

 observeOnly should be checked before any preemption computation started 
 inside containerBasedPreemptOrKill() of 
 ProportionalCapacityPreemptionPolicy.java
 -

 Key: YARN-2020
 URL: https://issues.apache.org/jira/browse/YARN-2020
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
 Environment: all
Reporter: yeqi
Priority: Trivial
 Attachments: YARN-2020.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 observeOnly should be checked in the very beginning of  
 ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(),  so that 
 to avoid unnecessary workload.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2126) The FSLeafQueue.amResourceUsage shouldn't be updated when an Application removed before it runs AM


 [ 
https://issues.apache.org/jira/browse/YARN-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2126:
---

Fix Version/s: (was: 2.5.0)

 The FSLeafQueue.amResourceUsage shouldn't be updated when an Application 
 removed before it runs AM
 --

 Key: YARN-2126
 URL: https://issues.apache.org/jira/browse/YARN-2126
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan

 When an application is removed, the FSLeafQueue updates its amResourceUsage.
 {code}
   if (runnableAppScheds.remove(app.getAppSchedulable())) {
   // Update AM resource usage
   if (app.getAMResource() != null) {
 Resources.subtractFrom(amResourceUsage, app.getAMResource());
   }
   return true;
   }
 {code}
 If an application is removed before it has a chance to start its AM, the 
 amResourceUsage shouldn't be updated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1369) Capacity scheduler to re-populate container allocation state


 [ 
https://issues.apache.org/jira/browse/YARN-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1369:
---

Fix Version/s: (was: 2.5.0)

 Capacity scheduler to re-populate container allocation state
 

 Key: YARN-1369
 URL: https://issues.apache.org/jira/browse/YARN-1369
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Jian He

 YARN-1367 and YARN-1368 enable the NM to tell the RM about currently running 
 containers and the RM will pass this information to the schedulers along with 
 the node information. The schedulers are currently already informed about 
 previously running apps when the app data is recovered from the store. The 
 scheduler is expected to be able to repopulate its allocation state from the 
 above 2 sources of information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container

[
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Karthik Kambatla updated YARN-1621:
---

Fix Version/s: (was: 2.5.0)
2.6.0

Add CLI to list rows of task attempt ID, container ID, host of container,
state of container
--

Key: YARN-1621
URL: https://issues.apache.org/jira/browse/YARN-1621
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
Fix For: 2.6.0

As more applications are moved to YARN, we need generic CLI to list rows of
task attempt ID, container ID, host of container, state of container. Today
if YARN application running in a container does hang, there is no way to find
out more info because a user does not know where each attempt is running in.
For each running application, it is useful to differentiate between
running/succeeded/failed/killed containers.

{code:title=proposed yarn cli}
$ yarn application -list-containers -applicationId appId [-containerState
state of container]
where containerState is optional filter to list container in given state only.
container state can be running/succeeded/killed/failed/all.
A user can specify more than one container state at once e.g. KILLED,FAILED.
task attempt ID container ID host of container state of container
{code}
CLI should work with running application/completed application. If a
container runs many task attempts, all attempts should be shown. That will
likely be the case of Tez container-reuse application.

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-160) nodemanagers should obtain cpu/memory values from underlying OS


 [ 
https://issues.apache.org/jira/browse/YARN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-160:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 nodemanagers should obtain cpu/memory values from underlying OS
 ---

 Key: YARN-160
 URL: https://issues.apache.org/jira/browse/YARN-160
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Varun Vasudev
 Fix For: 2.6.0


 As mentioned in YARN-2
 *NM memory and CPU configs*
 Currently these values are coming from the config of the NM, we should be 
 able to obtain those values from the OS (ie, in the case of Linux from 
 /proc/meminfo  /proc/cpuinfo). As this is highly OS dependent we should have 
 an interface that obtains this information. In addition implementations of 
 this interface should be able to specify a mem/cpu offset (amount of mem/cpu 
 not to be avail as YARN resource), this would allow to reserve mem/cpu for 
 the OS and other services outside of YARN containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue


 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2113:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 Add cross-user preemption within CapacityScheduler's leaf-queue
 ---

 Key: YARN-2113
 URL: https://issues.apache.org/jira/browse/YARN-2113
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 2.6.0


 Preemption today only works across queues and moves around resources across 
 queues per demand and usage. We should also have user-level preemption within 
 a queue, to balance capacity across users in a predictable manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values


 [ 
https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1156:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
 -

 Key: YARN-1156
 URL: https://issues.apache.org/jira/browse/YARN-1156
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.1.0-beta
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: metrics, newbie
 Fix For: 2.6.0

 Attachments: YARN-1156.1.patch


 AllocatedGB and AvailableGB metrics are now integer type. If there are four 
 times 500MB memory allocation to container, AllocatedGB is incremented four 
 times by {{(int)500/1024}}, which means 0. That is, the memory size allocated 
 is actually 2000MB, but the metrics shows 0GB. Let's use float type for these 
 metrics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1142) MiniYARNCluster web ui does not work properly


 [ 
https://issues.apache.org/jira/browse/YARN-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1142:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 MiniYARNCluster web ui does not work properly
 -

 Key: YARN-1142
 URL: https://issues.apache.org/jira/browse/YARN-1142
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
 Fix For: 2.6.0


 When going to the RM http port, the NM web ui is displayed. It seems there is 
 a singleton somewhere that breaks things when RM  NMs run in the same 
 process.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1327) Fix nodemgr native compilation problems on FreeBSD9


 [ 
https://issues.apache.org/jira/browse/YARN-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1327:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 Fix nodemgr native compilation problems on FreeBSD9
 ---

 Key: YARN-1327
 URL: https://issues.apache.org/jira/browse/YARN-1327
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Radim Kolar
Assignee: Radim Kolar
 Fix For: 3.0.0, 2.6.0

 Attachments: nodemgr-portability.txt


 There are several portability problems preventing from compiling native 
 component on freebsd.
 1. libgen.h is not included. correct function prototype is there but linux 
 glibc has workaround to define it for user if libgen.h is not directly 
 included. Include this file directly.
 2. query max size of login name using sysconf. it follows same code style 
 like rest of code using sysconf too.
 3. cgroups are linux only feature, make conditional compile and return error 
 if mount_cgroup is attempted on non linux OS
 4. do not use posix function setpgrp() since it clashes with same function 
 from BSD 4.2, use equivalent function. After inspecting glibc sources its 
 just shortcut to setpgid(0,0)
 These changes makes it compile on both linux and freebsd.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-745) Move UnmanagedAMLauncher to yarn client package


 [ 
https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-745:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 Move UnmanagedAMLauncher to yarn client package
 ---

 Key: YARN-745
 URL: https://issues.apache.org/jira/browse/YARN-745
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bikas Saha
Assignee: Bikas Saha
 Fix For: 2.6.0


 Its currently sitting in yarn applications project which sounds wrong. client 
 project sounds better since it contains the utilities/libraries that clients 
 use to write and debug yarn applications.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-965) NodeManager Metrics containersRunning is not correct When localizing container process is failed or killed


 [ 
https://issues.apache.org/jira/browse/YARN-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-965:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 NodeManager Metrics containersRunning is not correct When localizing 
 container process is failed or killed
 --

 Key: YARN-965
 URL: https://issues.apache.org/jira/browse/YARN-965
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.4-alpha
 Environment: suse linux
Reporter: Li Yuan
 Fix For: 2.6.0


 When successfully launched a container, container state from LOCALIZED to 
 RUNNING, containersRunning ++. Container state from EXITED_WITH_FAILURE or 
 KILLING to DONE, containersRunning--. 
 However, state EXITED_WITH_FAILURE or KILLING could come from 
 LOCALIZING(LOCALIZED), not RUNNING, which caused containersRunningis less 
 than the actual number. Further more,　Metrics is wrong, containersLaunched != 
 containersCompleted + containersFailed +　containersKilled ＋ containersRunning 
 +　containersIniting



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-650) User guide for preemption


 [ 
https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-650:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 User guide for preemption
 -

 Key: YARN-650
 URL: https://issues.apache.org/jira/browse/YARN-650
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Chris Douglas
Priority: Minor
 Fix For: 2.6.0

 Attachments: Y650-0.patch


 YARN-45 added a protocol for the RM to ask back resources. The docs on 
 writing YARN applications should include a section on how to interpret this 
 message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-113) WebAppProxyServlet must use SSLFactory for the HttpClient connections


 [ 
https://issues.apache.org/jira/browse/YARN-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-113:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 WebAppProxyServlet must use SSLFactory for the HttpClient connections
 -

 Key: YARN-113
 URL: https://issues.apache.org/jira/browse/YARN-113
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Fix For: 2.6.0


 The HttpClient must be configured to use the SSLFactory when the web UIs are 
 over HTTPS, otherwise the proxy servlet fails to connect to the AM because of 
 unknown (self-signed) certificates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1334) YARN should give more info on errors when running failed distributed shell command


 [ 
https://issues.apache.org/jira/browse/YARN-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1334:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 YARN should give more info on errors when running failed distributed shell 
 command
 --

 Key: YARN-1334
 URL: https://issues.apache.org/jira/browse/YARN-1334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications/distributed-shell
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-1334.1.patch


 Run incorrect command such as:
 /usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client 
 -jar distributedshell jar -shell_command ./test1.sh -shell_script ./
 would show shell exit code exception with no useful message. It should print 
 out sysout/syserr of containers/AM of why it is failing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible


 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2280:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 Resource manager web service fields are not accessible
 --

 Key: YARN-2280
 URL: https://issues.apache.org/jira/browse/YARN-2280
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0, 2.4.1
Reporter: Krisztian Horvath
Assignee: Krisztian Horvath
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2280.patch


 Using the resource manager's rest api 
 (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices) some 
 rest call returns a class where the fields after the unmarshal cannot be 
 accessible. For example SchedulerTypeInfo - schedulerInfo. Using the same 
 classes on client side these fields only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1234) Container localizer logs are not created in secured cluster


 [ 
https://issues.apache.org/jira/browse/YARN-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1234:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

  Container localizer logs are not created in secured cluster
 

 Key: YARN-1234
 URL: https://issues.apache.org/jira/browse/YARN-1234
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Fix For: 2.6.0


 When we are running ContainerLocalizer in secured cluster we potentially are 
 not creating any log file to track log messages. This will be helpful in 
 potentially identifying ContainerLocalization issues in secured cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA


 [ 
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1514:
---

Fix Version/s: (was: 2.5.0)
   2.6.0

 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
 

 Key: YARN-1514
 URL: https://issues.apache.org/jira/browse/YARN-1514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, 
 YARN-1514.wip-2.patch, YARN-1514.wip.patch


 ZKRMStateStore is very sensitive to ZNode-related operations as discussed in 
 YARN-1307, YARN-1378 and so on. Especially, ZKRMStateStore#loadState is 
 called when RM-HA cluster does failover. Therefore, its execution time 
 impacts failover time of RM-HA.
 We need utility to benchmark time execution time of ZKRMStateStore#loadStore 
 as development tool.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-153) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS


 [ 
https://issues.apache.org/jira/browse/YARN-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-153:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 PaaS on YARN: an YARN application to demonstrate that YARN can be used as a 
 PaaS
 

 Key: YARN-153
 URL: https://issues.apache.org/jira/browse/YARN-153
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Jacob Jaigak Song
Assignee: Jacob Jaigak Song
 Fix For: 2.6.0

 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, 
 MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, 
 MAPREDUCE4393.patch

   Original Estimate: 336h
  Time Spent: 336h
  Remaining Estimate: 0h

 This application is to demonstrate that YARN can be used for non-mapreduce 
 applications. As Hadoop has already been adopted and deployed widely and its 
 deployment in future will be highly increased, we thought that it's a good 
 potential to be used as PaaS.  
 I have implemented a proof of concept to demonstrate that YARN can be used as 
 a PaaS (Platform as a Service). I have done a gap analysis against VMware's 
 Cloud Foundry and tried to achieve as many PaaS functionalities as possible 
 on YARN.
 I'd like to check in this POC as a YARN example application.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality