[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251378#comment-14251378
 ] 

Hadoop QA commented on YARN-2933:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687982/YARN-2933-1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6143//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6143//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6143//console

This message is automatically generated.

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch


 Currently, the CapacityScheduler enforces capacity on each queue for each 
 label, but there is no preemption policy that supports this. YARN-2498 is 
 targeting preemption that respects node labels, but there are some gaps in 
 the code base; for example, queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. per label. These items potentially require 
 refactoring the CS, which needs some careful thought.
 For now, what we can do immediately is calculate ideal_allocation and preempt 
 containers only for resources on nodes without labels, to avoid regressions 
 such as: a cluster has some nodes with labels and some without, and queueA is 
 not satisfied for no-label resources, but the current preemption policy may 
 preempt resources from labeled nodes for queueA, which is not correct.
 Again, this is just a short-term enhancement; YARN-2498 will add preemption 
 that respects node labels for the Capacity Scheduler, which is our final 
 target.
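 A rough, self-contained sketch of this short-term approach (hypothetical 
 types and names; the real change lives in the CapacityScheduler preemption 
 policy and is not shown here):
{code:java}
import java.util.List;
import java.util.Set;

/** Hypothetical node summary used only for this sketch. */
class NodeInfo {
  Set<String> labels;  // node labels attached to this node (may be empty)
  long memoryMB;
  int vcores;
}

class NoLabelCapacitySketch {
  /**
   * Sum only the capacity of nodes that carry no label, so that
   * ideal_allocation and preemption are computed against the no-label
   * portion of the cluster and labeled nodes are never preempted from.
   */
  static long[] totalUnlabeledCapacity(List<NodeInfo> nodes) {
    long memoryMB = 0;
    long vcores = 0;
    for (NodeInfo node : nodes) {
      if (node.labels == null || node.labels.isEmpty()) {
        memoryMB += node.memoryMB;
        vcores += node.vcores;
      }
    }
    return new long[] { memoryMB, vcores };
  }
}
{code}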



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2340:
-
Attachment: 0001-YARN-2340.patch

 NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
 recover applications and remains in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: CapacityScheduler with Queues a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While a job is in progress, set the Queue state to STOPPED and then restart 
 the RM. Observe that the standby RM fails to come up as active, throwing the 
 NPE below:
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
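 For illustration only, a self-contained sketch (hypothetical names, not the 
 attached 0001-YARN-2340.patch) of the kind of null guard that would keep the 
 APP_ATTEMPT_ADDED handler from dereferencing a missing application entry when 
 its queue was STOPPED:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class AttemptGuardSketch {
  // Stand-in for the scheduler's applicationId -> application map.
  private final Map<String, Object> applications = new HashMap<>();

  void addApplicationAttempt(String applicationId, String attemptId) {
    Object application = applications.get(applicationId);
    if (application == null) {
      // The application was never registered (e.g. its queue is STOPPED
      // during recovery), so ignore the attempt instead of letting an NPE
      // kill the scheduler's event dispatcher.
      System.err.println("Application " + applicationId
          + " not registered; ignoring attempt " + attemptId);
      return;
    }
    // ... attach the attempt to the application here ...
  }

  public static void main(String[] args) {
    new AttemptGuardSketch().addApplicationAttempt(
        "application_1406116264351_0014", "appattempt_1406116264351_0014_02");
  }
}
{code}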



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251451#comment-14251451
 ] 

Hadoop QA commented on YARN-2340:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6144//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6144//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6144//console

This message is automatically generated.

 NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
 recover applications and remains in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: CapacityScheduler with Queues a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While a job is in progress, set the Queue state to STOPPED and then restart 
 the RM. Observe that the standby RM fails to come up as active, throwing the 
 NPE below:
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251455#comment-14251455
 ] 

Rohith commented on YARN-2340:
--

It looks like the failed test is random. In my environment, it runs successfully.

 NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
 recover applications and remains in standby
 --

 Key: YARN-2340
 URL: https://issues.apache.org/jira/browse/YARN-2340
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.4.1
 Environment: CapacityScheduler with Queues a, b
Reporter: Nishan Shetty
Assignee: Rohith
Priority: Critical
 Attachments: 0001-YARN-2340.patch


 While a job is in progress, set the Queue state to STOPPED and then restart 
 the RM. Observe that the standby RM fails to come up as active, throwing the 
 NPE below:
 2014-07-23 18:43:24,432 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
 2014-07-23 18:43:24,433 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
  at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
  at java.lang.Thread.run(Thread.java:662)
 2014-07-23 18:43:24,434 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251462#comment-14251462
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


 Admin service for cache manager
 ---

 Key: YARN-2189
 URL: https://issues.apache.org/jira/browse/YARN-2189
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-1492-trunk-addendum.patch, 
 YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
 YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
 YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch


 Implement the admin service for the shared cache manager. This service is 
 responsible for handling administrative commands such as manually running a 
 cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251466#comment-14251466
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveliness interval) after log aggregation completes.  The result 
 is that an oozie job (e.g. pig) that launches many sub-jobs over time will 
 fail if any sub-job is launched 10 min or more after another sub-job 
 completes.  If all other sub-jobs complete within that 10 min window, the 
 issue goes unnoticed.
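 For illustration, a toy sketch of the pre-YARN-2704 behavior described above: 
 reference-count a shared token across apps so that only the last finishing 
 app triggers cancellation. Hypothetical names, not the attached 
 YARN-2964.1.patch:
{code:java}
import java.util.HashMap;
import java.util.Map;

public class TokenRefCountSketch {
  private final Map<String, Integer> tokenRefs = new HashMap<>();

  /** Called when an app is submitted with this token. */
  synchronized void appSubmitted(String tokenId) {
    tokenRefs.merge(tokenId, 1, Integer::sum);
  }

  /** Returns true only when the last app using the token finishes. */
  synchronized boolean shouldCancelOnAppFinish(String tokenId) {
    Integer refs = tokenRefs.get(tokenId);
    if (refs == null) {
      return false;
    }
    if (refs > 1) {
      tokenRefs.put(tokenId, refs - 1);
      return false;  // other jobs (e.g. the main oozie job) still need it
    }
    tokenRefs.remove(tokenId);
    return true;
  }

  public static void main(String[] args) {
    TokenRefCountSketch tracker = new TokenRefCountSketch();
    tracker.appSubmitted("oozie-token");  // main job
    tracker.appSubmitted("oozie-token");  // sub-job reuses the same token
    System.out.println(tracker.shouldCancelOnAppFinish("oozie-token")); // false
    System.out.println(tracker.shouldCancelOnAppFinish("oozie-token")); // true
  }
}
{code}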



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251467#comment-14251467
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


 DelegationTokenRenewer thread pool never expands
 

 Key: YARN-2972
 URL: https://issues.apache.org/jira/browse/YARN-2972
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.7.0

 Attachments: YARN-2972.001.patch


 DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
 number of threads is configurable, but unfortunately the pool never expands 
 beyond the hardcoded initial 5 threads because we are using an unbounded 
 LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
 the core size when the specified queue is full.
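 The queue/pool-size interaction is easy to demonstrate outside YARN. A 
 standalone sketch (arbitrary pool sizes, not the renewer's actual 
 configuration or patch):
{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
  public static void main(String[] args) throws InterruptedException {
    Runnable sleepTask = () -> {
      try { Thread.sleep(2000); } catch (InterruptedException ignored) { }
    };

    // Unbounded queue: extra tasks are queued, so the pool never exceeds the
    // core size of 5, regardless of the maximum of 50.
    ThreadPoolExecutor stuck = new ThreadPoolExecutor(
        5, 50, 3, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

    // One common fix: make the core size equal to the desired maximum and
    // let idle core threads time out.
    ThreadPoolExecutor grows = new ThreadPoolExecutor(
        50, 50, 3, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    grows.allowCoreThreadTimeOut(true);

    for (int i = 0; i < 100; i++) {
      stuck.execute(sleepTask);
      grows.execute(sleepTask);
    }
    Thread.sleep(500);
    System.out.println("unbounded-queue pool size: " + stuck.getPoolSize()); // 5
    System.out.println("resized-core pool size:    " + grows.getPoolSize()); // up to 50
    stuck.shutdown();
    grows.shutdown();
  }
}
{code}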



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251463#comment-14251463
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251471#comment-14251471
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java


 InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
 -

 Key: YARN-2944
 URL: https://issues.apache.org/jira/browse/YARN-2944
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
 YARN-2944-trunk-v3.patch


 Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
 the SCMStore service. Unfortunately the SCMStore class does not have a 
 0-argument constructor.
 On startup, the SCM fails with the following:
 {noformat}
 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
 failed in state INITED; cause: java.lang.RuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
 ... 4 more
 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
 SharedCacheManager
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
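 A standalone reproduction of this failure mode using plain JDK reflection 
 (hypothetical class names; the trace shows ReflectionUtils ending up in the 
 same zero-argument Class#getDeclaredConstructor lookup):
{code:java}
public class NoArgCtorDemo {
  /** Mirrors a store class whose only constructor takes arguments. */
  static class StoreWithArgCtor {
    StoreWithArgCtor(String name) { }
  }

  /** The shape a reflective zero-argument lookup needs. */
  static class StoreWithNoArgCtor {
    StoreWithNoArgCtor() { }
  }

  public static void main(String[] args) throws Exception {
    // Works: a zero-argument constructor exists.
    System.out.println(
        StoreWithNoArgCtor.class.getDeclaredConstructor().newInstance());

    // Throws java.lang.NoSuchMethodException, mirroring the SCM startup error.
    StoreWithArgCtor.class.getDeclaredConstructor().newInstance();
  }
}
{code}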
 

[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251476#comment-14251476
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java


 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251482#comment-14251482
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


 DelegationTokenRenewer thread pool never expands
 

 Key: YARN-2972
 URL: https://issues.apache.org/jira/browse/YARN-2972
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.7.0

 Attachments: YARN-2972.001.patch


 DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
 number of threads is configurable, but unfortunately the pool never expands 
 beyond the hardcoded initial 5 threads because we are using an unbounded 
 LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
 the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251478#comment-14251478
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251490#comment-14251490
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251486#comment-14251486
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java


 InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
 -

 Key: YARN-2944
 URL: https://issues.apache.org/jira/browse/YARN-2944
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
 YARN-2944-trunk-v3.patch


 Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
 the SCMStore service. Unfortunately the SCMStore class does not have a 
 0-argument constructor.
 On startup, the SCM fails with the following:
 {noformat}
 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
 failed in state INITED; cause: java.lang.RuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
 ... 4 more
 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
 SharedCacheManager
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251477#comment-14251477
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


 Admin service for cache manager
 ---

 Key: YARN-2189
 URL: https://issues.apache.org/jira/browse/YARN-2189
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-1492-trunk-addendum.patch, 
 YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
 YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
 YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch


 Implement the admin service for the shared cache manager. This service is 
 responsible for handling administrative commands such as manually running a 
 cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251481#comment-14251481
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveliness interval) after log aggregation completes.  The result 
 is that an oozie job (e.g. pig) that launches many sub-jobs over time will 
 fail if any sub-job is launched 10 min or more after another sub-job 
 completes.  If all other sub-jobs complete within that 10 min window, the 
 issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251660#comment-14251660
 ] 

Hadoop QA commented on YARN-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688030/apache-yarn-2949.1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6145//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6145//console

This message is automatically generated.

 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2949:

Attachment: (was: NodeManagerCgroups.html)

 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2949.0.patch, apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2949:

Attachment: NodeManagerCgroups.html

Uploaded the HTML generated by the latest patch.

 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251670#comment-14251670
 ] 

Junping Du commented on YARN-2949:
--

+1. v2 patch LGTM. Will commit it shortly.

 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251682#comment-14251682
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6746 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6746/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251697#comment-14251697
 ] 

Hadoop QA commented on YARN-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688030/apache-yarn-2949.1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6146//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6146//console

This message is automatically generated.

 Add documentation for CGroups
 -

 Key: YARN-2949
 URL: https://issues.apache.org/jira/browse/YARN-2949
 Project: Hadoop YARN
  Issue Type: Task
  Components: documentation, nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.7.0

 Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
 apache-yarn-2949.1.patch


 A bunch of changes have gone into the NodeManager to allow greater use of 
 CGroups. It would be good to have a single page that documents how to set up 
 CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251704#comment-14251704
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


 Admin service for cache manager
 ---

 Key: YARN-2189
 URL: https://issues.apache.org/jira/browse/YARN-2189
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-1492-trunk-addendum.patch, 
 YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
 YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
 YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch


 Implement the admin service for the shared cache manager. This service is 
 responsible for handling administrative commands such as manually running a 
 cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251705#comment-14251705
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251708#comment-14251708
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveliness interval) after log aggregation completes.  The result 
 is that an oozie job (e.g. pig) that launches many sub-jobs over time will 
 fail if any sub-job is launched 10 min or more after another sub-job 
 completes.  If all other sub-jobs complete within that 10 min window, the 
 issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251717#comment-14251717
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251723#comment-14251723
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


 DelegationTokenRenewer thread pool never expands
 

 Key: YARN-2972
 URL: https://issues.apache.org/jira/browse/YARN-2972
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.7.0

 Attachments: YARN-2972.001.patch


 DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
 number of threads is configurable, but unfortunately the pool never expands 
 beyond the hardcoded initial 5 threads because we are using an unbounded 
 LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
 the core size when the specified queue is full.
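
 To make the queue interaction concrete, here is a minimal, self-contained 
 sketch (not the DelegationTokenRenewer code itself; the pool sizes and sleep 
 times are made up for illustration) showing that a ThreadPoolExecutor fed by an 
 unbounded LinkedBlockingQueue never grows past its core size:

 {code}
 import java.util.concurrent.LinkedBlockingQueue;
 import java.util.concurrent.ThreadPoolExecutor;
 import java.util.concurrent.TimeUnit;

 public class UnboundedQueuePoolDemo {
   public static void main(String[] args) throws InterruptedException {
     // Core size 5, nominal max 50, unbounded work queue.
     ThreadPoolExecutor pool = new ThreadPoolExecutor(
         5, 50, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

     // Submit far more work than 5 threads can absorb at once.
     for (int i = 0; i < 100; i++) {
       pool.execute(new Runnable() {
         @Override
         public void run() {
           try {
             Thread.sleep(1000); // stand-in for a slow token renewal
           } catch (InterruptedException ignored) {
           }
         }
       });
     }

     Thread.sleep(200);
     // Prints 5: offer() on an unbounded queue never reports "full", so the
     // executor never creates threads beyond the core size.
     System.out.println("pool size = " + pool.getPoolSize());
     pool.shutdownNow();
   }
 }
 {code}

 One common remedy (a sketch of the general technique, not necessarily what the 
 attached patch does) is to set the core pool size to the configured thread 
 count, optionally with allowCoreThreadTimeOut(true), or to use a bounded queue 
 so the pool can actually expand toward its maximum.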



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251719#comment-14251719
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss the feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251709#comment-14251709
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


 DelegationTokenRenewer thread pool never expands
 

 Key: YARN-2972
 URL: https://issues.apache.org/jira/browse/YARN-2972
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.7.0

 Attachments: YARN-2972.001.patch


 DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
 number of threads is configurable, but unfortunately the pool never expands 
 beyond the hardcoded initial 5 threads because we are using an unbounded 
 LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
 the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251722#comment-14251722
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will only cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes.  If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251727#comment-14251727
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java


 InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
 -

 Key: YARN-2944
 URL: https://issues.apache.org/jira/browse/YARN-2944
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
 YARN-2944-trunk-v3.patch


 Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
 the SCMStore service. Unfortunately the SCMStore class does not have a 
 0-argument constructor.
 On startup, the SCM fails with the following:
 {noformat}
 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
 failed in state INITED; cause: java.lang.RuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
 ... 4 more
 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
 SharedCacheManager
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 

[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251713#comment-14251713
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* hadoop-yarn-project/CHANGES.txt


 InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
 -

 Key: YARN-2944
 URL: https://issues.apache.org/jira/browse/YARN-2944
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
 YARN-2944-trunk-v3.patch


 Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
 the SCMStore service. Unfortunately the SCMStore class does not have a 
 0-argument constructor.
 On startup, the SCM fails with the following:
 {noformat}
 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
 failed in state INITED; cause: java.lang.RuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
 ... 4 more
 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
 SharedCacheManager
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251718#comment-14251718
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh


 Admin service for cache manager
 ---

 Key: YARN-2189
 URL: https://issues.apache.org/jira/browse/YARN-2189
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-1492-trunk-addendum.patch, 
 YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
 YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
 YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch


 Implement the admin service for the shared cache manager. This service is 
 responsible for handling administrative commands such as manually running a 
 cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251731#comment-14251731
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java


 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251763#comment-14251763
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss the feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251771#comment-14251771
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java


 InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
 -

 Key: YARN-2944
 URL: https://issues.apache.org/jira/browse/YARN-2944
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
 YARN-2944-trunk-v3.patch


 Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
 the SCMStore service. Unfortunately the SCMStore class does not have a 
 0-argument constructor.
 On startup, the SCM fails with the following:
 {noformat}
 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
 failed in state INITED; cause: java.lang.RuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
 ... 4 more
 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
 SharedCacheManager
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 

[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251767#comment-14251767
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


 DelegationTokenRenewer thread pool never expands
 

 Key: YARN-2972
 URL: https://issues.apache.org/jira/browse/YARN-2972
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.7.0

 Attachments: YARN-2972.001.patch


 DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
 number of threads is configurable, but unfortunately the pool never expands 
 beyond the hardcoded initial 5 threads because we are using an unbounded 
 LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
 the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251766#comment-14251766
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will only cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes.  If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251762#comment-14251762
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


 Admin service for cache manager
 ---

 Key: YARN-2189
 URL: https://issues.apache.org/jira/browse/YARN-2189
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-1492-trunk-addendum.patch, 
 YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
 YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
 YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch


 Implement the admin service for the shared cache manager. This service is 
 responsible for handling administrative commands such as manually running a 
 cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251775#comment-14251775
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java


 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251801#comment-14251801
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java


 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
Priority: Critical
 Attachments: YARN-1492-all-trunk-v1.patch, 
 YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
 YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
 shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf, shared_cache_design_v6.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss the feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251813#comment-14251813
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java


 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251805#comment-14251805
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


 DelegationTokenRenewer thread pool never expands
 

 Key: YARN-2972
 URL: https://issues.apache.org/jira/browse/YARN-2972
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 2.7.0

 Attachments: YARN-2972.001.patch


 DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
 number of threads is configurable, but unfortunately the pool never expands 
 beyond the hardcoded initial 5 threads because we are using an unbounded 
 LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
 the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251809#comment-14251809
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java


 InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
 -

 Key: YARN-2944
 URL: https://issues.apache.org/jira/browse/YARN-2944
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
 YARN-2944-trunk-v3.patch


 Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
 the SCMStore service. Unfortunately the SCMStore class does not have a 
 0-argument constructor.
 On startup, the SCM fails with the following:
 {noformat}
 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
 failed in state INITED; cause: java.lang.RuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at java.lang.Class.getConstructor0(Class.java:2763)
 at java.lang.Class.getDeclaredConstructor(Class.java:2021)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
 ... 4 more
 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
 SharedCacheManager
 java.lang.RuntimeException: java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.init()
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
 at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
 at 
 org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
 Caused by: java.lang.NoSuchMethodException: 
 

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251800#comment-14251800
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


 Admin service for cache manager
 ---

 Key: YARN-2189
 URL: https://issues.apache.org/jira/browse/YARN-2189
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: YARN-1492-trunk-addendum.patch, 
 YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
 YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
 YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch


 Implement the admin service for the shared cache manager. This service is 
 responsible for handling administrative commands such as manually running a 
 cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251804#comment-14251804
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will only cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes.  If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251818#comment-14251818
 ] 

Jason Lowe commented on YARN-2964:
--

Thanks for the patch, Jian!  The Findbugs warnings appear to be unrelated.

I'm wondering about the change in the removeApplicationFromRenewal (or remove) 
method.  If a sub-job completes, won't we remove the token from the allTokens 
map before the launcher job has completed?  Then a subsequent sub-job that 
requests token cancellation can put the token back in the map and cause the 
token to be canceled when it leaves.  I think we need to repeat the logic from 
the original code before YARN-2704 here, i.e. only remove the token if the 
application ID matches.  That way the launcher job's token will remain _the_ 
token in that collection until the launcher job completes.
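
A tiny sketch of that guard (the field and method shapes here are assumptions 
for illustration, not the actual DelegationTokenRenewer code):

{code}
// Sketch only: drop the shared entry only if the completing application is
// the one that originally registered the token.
DelegationTokenToRenew dttr = allTokens.get(token);
if (dttr != null && applicationId.equals(dttr.applicationId)) {
  allTokens.remove(token);
}
{code}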

This comment doesn't match the code: as written, if any job sharing the token 
wants to cancel at the end, then we will cancel it at the end.
{code}
  // If any of the jobs sharing the same token set shouldCancelAtEnd
  // to true, we should not cancel the token.
  if (evt.shouldCancelAtEnd) {
dttr.shouldCancelAtEnd = evt.shouldCancelAtEnd;
  }
{code}
I think the logic and comment should be: if any job doesn't want to cancel, then 
we won't cancel.  The code seems to be trying to do the opposite, so I'm not 
sure how the unit test is passing.  Maybe I'm missing something.
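
For comparison, a sketch of the semantics being described (any job that wants 
the token kept alive wins), reusing the names from the snippet above; this is a 
fragment for illustration, not a proposed patch:

{code}
// Once any job sharing the token asks to keep it (shouldCancelAtEnd == false),
// the token must not be canceled at the end.
if (!evt.shouldCancelAtEnd) {
  dttr.shouldCancelAtEnd = false;
}
{code}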

The info log message added in handleAppSubmitEvent is also misleading: it says 
we are setting shouldCancelAtEnd to whatever the event said, when in reality we 
only set it sometimes.  It probably needs to be inside the conditional.

I wonder if we should be using a Set instead of a Map to track these tokens.  
Adding an already-existing DelegationTokenToRenew to a set will not change the 
one already there, but with the map a sub-job can clobber the 
DelegationTokenToRenew that's already there with its own when it does 
allTokens.put(dtr.token, dtr).
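
A small sketch of the difference, assuming allTokens is a ConcurrentMap (dtr 
and DelegationTokenToRenew as referenced above; everything else is 
illustrative):

{code}
// put() lets a later sub-job replace the launcher's entry:
allTokens.put(dtr.token, dtr);          // clobbers any existing entry

// putIfAbsent() (or a Set's add()) keeps the first registration:
allTokens.putIfAbsent(dtr.token, dtr);  // no-op if an entry already exists
{code}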

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will only cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an Oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
 any sub-job is launched more than 10 min after another sub-job completes.  If all 
 other sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251854#comment-14251854
 ] 

Allen Wittenauer commented on YARN-2203:


Should this have been .http.address instead of webapp.address to be consistent 
with the rest of Hadoop?

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2936:
---
Attachment: YARN-2936.001.patch

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch


 After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call getProto() on it, we will just get an empty proto object.
 It seems to do no harm to the production code path, as we will always call 
 getBytes() before using proto to persist the DT in the state store, when we 
 are generating the password.
 I think the setters were removed to avoid duplicate setting of the fields 
 when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't 
 work properly alone. YARNDelegationTokenIdentifier is tightly coupled with 
 the logic in the secretManager. It's vulnerable if something is changed in 
 the secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the 
 testing DTs can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time

2014-12-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868.005.patch

- Cleaned up unused imports
- Didn't move @Metric to a single line--that would violate the 80-column width 
(a sketch of the multi-line form follows)
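
For illustration, keeping the annotation within 80 columns can look like the following sketch (the field name and description are made up, not the ones in the patch):
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

public class ExampleMetricsSource {
  // Splitting the description string keeps every line under 80 columns while
  // remaining a compile-time constant, which annotation values require.
  @Metric("Latency in ms from the start of container allocation "
      + "to the first container actually allocated")
  MutableGaugeLong firstContainerAllocationDelay;
}
{code}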


 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252003#comment-14252003
 ] 

Chris Trezzo commented on YARN-2203:


[~aw] Thanks for the comment! I didn't realize that was a convention 
considering there are other similar parameters that do not seem to follow that 
form. For some examples: yarn.resourcemanager.webapp.address, 
yarn.resourcemanager.admin.address, yarn.nodemanager.webapp.address. I can make 
note of your comment in YARN-2654 when we do a final pass on the config 
parameters and ensure that they have quality names.

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252007#comment-14252007
 ] 

Karthik Kambatla commented on YARN-2654:


From [~aw] on YARN-2203: Should this have been .http.address instead of 
webapp.address to be consistent with the rest of Hadoop?

 Revisit all shared cache config parameters to ensure quality names
 --

 Key: YARN-2654
 URL: https://issues.apache.org/jira/browse/YARN-2654
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Blocker

 Revisit all the shared cache config parameters in YarnConfiguration and 
 yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252008#comment-14252008
 ] 

Karthik Kambatla commented on YARN-2203:


Added a comment there. Thanks Allen. 

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252031#comment-14252031
 ] 

Allen Wittenauer commented on YARN-2203:


I wouldn't trust the YARN project to be any sort of guide to do anything with 
consistency.  It has proven over and over again that it wants everything to be 
hard to administer.

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252045#comment-14252045
 ] 

Jian He commented on YARN-2964:
---

thanks for your comments, Jason !

bq. I'm wondering about the change in the removeApplicationFromRenewal method 
or remove.
If the launcher job gets added to the appTokens map first, DelegationTokenRenewer 
will not add a DelegationTokenToRenew instance for the sub-job. So the tokens in 
removeApplicationFromRenewal will be empty for the sub-job when the sub-job 
completes, and the token won’t be removed from allTokens. My only concern 
with a global set is that each time an application completes, we end up 
looping over all the applications, or worse (each app may have at least one token).
bq. This comment doesn't match the code
Good catch.. what a mistake.. I might have been under the impression that the 
semantics were “shouldKeepAtEnd”. I added one line in the test case to guard 
against this.
bq. Wonder if we should be using a Set instead of a Map to track these tokens
Thought about that too; the reason I switched to a map is to get the 
DelegationTokenToRenew instance based on the token the app provided and change 
its shouldCancelAtEnd field on submission.
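
A hedged sketch of the submission-time handling being described (the helper call and field names here are assumptions for illustration, not the actual patch):
{code}
// Sketch only: the first submission (the launcher job) registers the token;
// a later submission of the same token reuses the existing entry and only
// adjusts the cancel-at-end flag, so no second renewal is scheduled.
DelegationTokenToRenew existing = allTokens.get(token);
if (existing == null) {
  allTokens.put(token, dttr);
  setTimerForTokenRenewal(dttr);        // helper name is an assumption
} else if (!evt.shouldCancelAtEnd) {
  existing.shouldCancelAtEnd = false;   // a sub-job asked to keep the token
}
{code}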

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2964:
--
Attachment: YARN-2964.2.patch

updated the patch based on some comments from Jason

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252078#comment-14252078
 ] 

Karthik Kambatla commented on YARN-2203:


bq. I wouldn't trust the YARN project to be any sort of guide to do anything 
with consistency. It has proven over and over again that it wants everything to 
be hard to administer.

If this comment was to provide direction, I am clearly missing it. Is the 
suggestion to drop whatever consistency the *sub*-project is *trying* to 
achieve and run in different directions? 

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252103#comment-14252103
 ] 

Allen Wittenauer commented on YARN-2203:


Consistency within the larger project is more important than consistency within 
the sub-project.   Operationally, it sucks having to switch gears depending 
upon which sub-project one is working on. Oh, YARN calls it feature X, even 
though the rest of Hadoop has called it Y since pre-YARN.  

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252116#comment-14252116
 ] 

Karthik Kambatla commented on YARN-2203:


I see the value in the entire project being consistent, but not at the expense 
of inconsistency within the sub-project. I would like to stay out of how we 
should have done things.

If it is true that there are inconsistencies with no good reason, it would be 
nice to address those inconsistencies in Hadoop-3, preferably in a 
backwards-compatible way. Any volunteers? 

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252139#comment-14252139
 ] 

Hadoop QA commented on YARN-2936:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688081/YARN-2936.001.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 39 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6147//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6147//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6147//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6147//console

This message is automatically generated.

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch


 After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call getProto() on it, we will just get an empty proto object.
 It seems to do no harm to the production code path, as we will always call 
 getBytes() before using proto to persist the DT in the state store, when we 
 are generating the password.
 I think the setters were removed to avoid duplicate setting of the fields 
 when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't 
 work properly alone. YARNDelegationTokenIdentifier is tightly coupled with 
 the logic in the secretManager. It's vulnerable if something is changed in 
 the secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the 
 testing DTs can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252164#comment-14252164
 ] 

Hadoop QA commented on YARN-2868:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6148//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6148//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6148//console

This message is automatically generated.

 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
 YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252195#comment-14252195
 ] 

Karthik Kambatla commented on YARN-2975:


bq. The removeApp from one operation is now 2 steps.
removeApp did not expose the two operations outside, but had them. 

IIUC, an app can only go from non-runnable to runnable, but not vice-versa. For 
instance, see the following snippet in FairScheduler#executeMove. So, don't 
think we need to worry about consistency. 
{code}
if (wasRunnable && !nowRunnable) {
  throw new IllegalStateException("Should have already verified that app "
      + attempt.getApplicationId() + " would be runnable in new queue");
}
{code}

{quote}
Nit: 
suggest resetPreemptedResources -> resetPreemptedResourcesRunnableApps
clearPreemptedResources -> clearPreemptedResourcesRunnableApps
{quote}
I thought of this, but decided against it. The additional RunnableApps doesn't 
add anything but extra characters. 

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252205#comment-14252205
 ] 

Karthik Kambatla commented on YARN-2975:


[~adhoot] - I see your point about consistency. If the app was not runnable at 
the time of the check but becomes runnable by the time we remove it, we can miss 
untracking it. Maybe we should clean up how we store the runnability of an app 
in FSAppAttempt, and add transitions like the rest of the RM code to handle 
addition, move, and removal of apps. The scope might be too big to do in this 
JIRA. Okay with doing it in a follow-up? 

By the way, thanks a bunch for taking the time to review. 

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252213#comment-14252213
 ] 

Varun Saxena commented on YARN-2936:


[~zjshen], [~jianhe], kindly review.

Findbugs warnings are to be handled by YARN-2937 to YARN-2940.
The test failures are unrelated, and all these tests pass in my local setup.

 YARNDelegationTokenIdentifier doesn't set proto.builder now
 ---

 Key: YARN-2936
 URL: https://issues.apache.org/jira/browse/YARN-2936
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Varun Saxena
 Attachments: YARN-2936.001.patch


 After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
 such that when constructing an object which extends 
 YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
 when we call getProto() on it, we will just get an empty proto object.
 It seems to do no harm to the production code path, as we will always call 
 getBytes() before using proto to persist the DT in the state store, when we 
 are generating the password.
 I think the setters were removed to avoid duplicate setting of the fields 
 when getBytes() is called. However, YARNDelegationTokenIdentifier doesn't 
 work properly alone. YARNDelegationTokenIdentifier is tightly coupled with 
 the logic in the secretManager. It's vulnerable if something is changed in 
 the secretManager. For example, in the test case of YARN-2837, I spent time 
 figuring out that we need to execute getBytes() first to make sure the 
 testing DTs can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252216#comment-14252216
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688092/YARN-2964.2.patch
  against trunk revision 07619aa.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6149//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6149//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6149//console

This message is automatically generated.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252218#comment-14252218
 ] 

Jason Lowe commented on YARN-2964:
--

bq. If launcher job first gets added to the appTokens map, 
DelegationTokenRenewer will not add DelegationTokenToRenew instance for the 
sub-job.

Ah, sorry, I missed this critical change from the original patch.  However if 
we don't add the delegation token for each sub-job then I think we have a 
problem with the following use-case:

# Oozie launcher submits a MapReduce sub-job
# MapReduce job starts
# Oozie launcher job leaves
# MapReduce job now running with a token that the RM has forgotten and won't 
be automatically renewed

We might have had the same issue in this case prior to YARN-2704, since the 
token would be pulled from the set when the launcher completed.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252243#comment-14252243
 ] 

Jian He commented on YARN-2964:
---

bq. We might have had the same issue in this case prior to YARN-2704.
Yes, this is an existing issue. As Robert pointed out in the previous comment, 
an oozie MapReduce sub-job currently cannot run beyond 24 hrs. IMO, we can fix 
this separately?

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252259#comment-14252259
 ] 

Jason Lowe commented on YARN-2964:
--

Sure, we can fix that as a followup issue since it's no worse than what we had 
before.

+1 lgtm, only nit is the new getAllTokens method should be package-private 
instead of public but not a big deal either way.  I assume the test failures 
are unrelated?
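
For illustration, the visibility change being suggested would look roughly like the following sketch (the return type and the @VisibleForTesting annotation are assumptions, not necessarily what the patch does):
{code}
// Package-private: visible to tests in the same package, but not part of the
// public surface of DelegationTokenRenewer.
@VisibleForTesting
Map<Token<?>, DelegationTokenToRenew> getAllTokens() {
  return allTokens;
}
{code}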

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252286#comment-14252286
 ] 

Jian He commented on YARN-2964:
---

I believe the failures are not related.  I just changed the visibility and 
uploaded a new patch to re-kick jenkins.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2964:
--
Attachment: YARN-2964.3.patch

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252332#comment-14252332
 ] 

Jian He commented on YARN-2939:
---

+1

 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252343#comment-14252343
 ] 

Anubhav Dhoot commented on YARN-2975:
-

Yes, I am worried about getting it wrong for maxRunningEnforcer. Before the 
change, the removal of the app, whether it was runnable or not, happened inside 
a single lock, so we could be reasonably sure of the result.
Now that it is split into 2 non-atomic steps outside, as I listed above, and 
also 2 steps inside {noformat} return removeRunnableApp(app) || 
removeNonRunnableApp(app) {noformat}, we might make it worse, as each one 
releases the lock before the other acquires it. The application could be 
completely missed when it moves from non-runnable to runnable in between.
How about making removeApp try to remove from both runnable and non-runnable 
inside a single writeLock? We can remove the redundancy with removeRunnableApp 
and removeNonRunnableApp by having a fourth internal method that all 3 
delegate to, with flags to limit where to look for the app.
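
A minimal sketch of the single-writeLock variant being proposed (the lock and list field names are assumptions for illustration, not the actual FSLeafQueue code):
{code}
// Sketch only: one internal helper removes the app under the writeLock, and
// the public variants delegate to it with flags saying where to look, so the
// app cannot slip from non-runnable to runnable between two removal attempts.
private boolean removeApp(FSAppAttempt app, boolean checkRunnable,
    boolean checkNonRunnable) {
  writeLock.lock();
  try {
    return (checkRunnable && runnableApps.remove(app))
        || (checkNonRunnable && nonRunnableApps.remove(app));
  } finally {
    writeLock.unlock();
  }
}

boolean removeApp(FSAppAttempt app) {
  return removeApp(app, true, true);
}

boolean removeRunnableApp(FSAppAttempt app) {
  return removeApp(app, true, false);
}

boolean removeNonRunnableApp(FSAppAttempt app) {
  return removeApp(app, false, true);
}
{code}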

 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252467#comment-14252467
 ] 

Karthik Kambatla commented on YARN-2975:


I understand the concern. Pre-YARN-2910, we were in the exact same place and 
haven't seen any issues in practice. Let me take a look again and see if we can 
do what you are suggesting - do all of removeApp while holding the writeLock. 
Initially, I had a single method, but had to split based on prior accesses. 





 FSLeafQueue app lists are accessed without required locks
 -

 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker
 Attachments: yarn-2975-1.patch


 YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
 FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
 without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2978) Null pointer in YarnProtos

2014-12-18 Thread Jason Tufo (JIRA)
Jason Tufo created YARN-2978:


 Summary: Null pointer in YarnProtos
 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo


 java.lang.NullPointerException
at 
org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
at 
org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252473#comment-14252473
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688133/YARN-2964.3.patch
  against trunk revision b9d4976.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6150//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6150//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6150//console

This message is automatically generated.

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252481#comment-14252481
 ] 

Ray Chiang commented on YARN-2675:
--

Started reviewing, but it looks like this patch needs to be updated due to 
YARN-1156.

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should handle the KILLING state in the finished 
 logic of ContainerImpl.java so that killedContainer is updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2979) Unsupported operation exception in message building (YarnProtos)

2014-12-18 Thread Jason Tufo (JIRA)
Jason Tufo created YARN-2979:


 Summary: Unsupported operation exception in message building 
(YarnProtos)
 Key: YARN-2979
 URL: https://issues.apache.org/jira/browse/YARN-2979
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.1
Reporter: Jason Tufo


java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252507#comment-14252507
 ] 

Allen Wittenauer commented on YARN-2203:


I'm willing to do it, but it won't be backward compatible.

 Web UI for cache manager
 

 Key: YARN-2203
 URL: https://issues.apache.org/jira/browse/YARN-2203
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Fix For: 2.7.0

 Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
 YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
 YARN-2203-trunk-v5.patch


 Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252509#comment-14252509
 ] 

Hadoop QA commented on YARN-2939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch
  against trunk revision c4d9713.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6151//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6151//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6151//console

This message is automatically generated.

 Fix new findbugs warnings in hadoop-yarn-common
 ---

 Key: YARN-2939
 URL: https://issues.apache.org/jira/browse/YARN-2939
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Li Lu
  Labels: findbugs
 Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252510#comment-14252510
 ] 

Jason Lowe commented on YARN-2964:
--

+1 lgtm.  I don't believe the test failures are related since they pass for me 
locally.  Committing this.


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (NM liveliness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, then the issue 
 goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Component/s: documentation

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation in yarn-default.xml 
 (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not). 
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
   .
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.
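
 If the resolution is to make the code match the documented behavior (cap the 
 request rather than reject it), the validation could clamp the requested vcores 
 to the configured maximum. The snippet below is only a sketch of that 
 alternative using plain ints instead of the Resource/SchedulerUtils types; the 
 method name normalizeVirtualCores is hypothetical and this is not the committed 
 fix.
 {code}
 /**
  * Sketch of "cap instead of reject" semantics for vcore requests.
  */
 public class VcoreNormalizer {

   static int normalizeVirtualCores(int requested, int maxConfigured) {
     if (requested < 0) {
       throw new IllegalArgumentException(
           "Invalid resource request, requested virtual cores < 0"
           + ", requestedVirtualCores=" + requested);
     }
     // Cap to the maximum, mirroring the behaviour documented for
     // yarn.scheduler.maximum-allocation-vcores.
     return Math.min(requested, maxConfigured);
   }

   public static void main(String[] args) {
     // A request of 32 vcores against a 3-vcore limit would be capped to 3
     // rather than raising an exception at the RM.
     System.out.println(normalizeVirtualCores(32, 3)); // prints 3
   }
 }
 {code}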



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Attachment: YARN-2230.001.patch

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation in yarn-default.xml 
 (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not). 
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
   .
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Attachment: (was: YARN-2230.001.patch)

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation in yarn-default.xml 
 (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not). 
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
   .
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
 {code}
 * IMHO, such an exception should be forwarded to the client. Otherwise, it is 
 not obvious why a job does not make any progress.
 The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252563#comment-14252563
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6755 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6755/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, the issue goes 
 unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252592#comment-14252592
 ] 

Hadoop QA commented on YARN-2230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688178/YARN-2230.001.patch
  against trunk revision 0402bad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 25 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6152//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6152//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6152//console

This message is automatically generated.

 Fix description of yarn.scheduler.maximum-allocation-vcores in 
 yarn-default.xml (or code)
 -

 Key: YARN-2230
 URL: https://issues.apache.org/jira/browse/YARN-2230
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, documentation, scheduler
Affects Versions: 2.4.0
Reporter: Adam Kawa
Priority: Minor
 Attachments: YARN-2230.001.patch


 When a user requests more vcores than the allocation limit (e.g. 
 mapreduce.map.cpu.vcores is larger than 
 yarn.scheduler.maximum-allocation-vcores), then 
 InvalidResourceRequestException is thrown - 
 https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
 {code}
 if (resReq.getCapability().getVirtualCores() < 0 ||
     resReq.getCapability().getVirtualCores() >
     maximumResource.getVirtualCores()) {
   throw new InvalidResourceRequestException("Invalid resource request"
       + ", requested virtual cores < 0"
       + ", or requested virtual cores > max configured"
       + ", requestedVirtualCores="
       + resReq.getCapability().getVirtualCores()
       + ", maxVirtualCores=" + maximumResource.getVirtualCores());
 }
 {code}
 According to the documentation in yarn-default.xml 
 (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml),
 the request should be capped to the allocation limit.
 {code}
   <property>
     <description>The maximum allocation for every container request at the RM,
     in terms of virtual CPU cores. Requests higher than this won't take effect,
     and will get capped to this value.</description>
     <name>yarn.scheduler.maximum-allocation-vcores</name>
     <value>32</value>
   </property>
 {code}
 This means that:
 * Either the documentation or the code should be corrected (unless this 
 exception is handled elsewhere accordingly, but it looks like it is not). 
 This behavior is confusing, because when such a job (with 
 mapreduce.map.cpu.vcores larger than 
 yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
 progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
 {code}
 2014-06-29 00:34:51,469 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1403993411503_0002_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, requested virtual cores < 0, or requested virtual cores > 
 max configured, requestedVirtualCores=32, maxVirtualCores=3
   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
   at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
   at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
   .
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at 

[jira] [Updated] (YARN-2217) Shared cache client side changes

2014-12-18 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2217:
---
Attachment: YARN-2217-trunk-v4.patch

[~kasha] V4 attached. Patch verified manually.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252611#comment-14252611
 ] 

Jian He commented on YARN-2964:
---

Thanks for reviewing and committing, Jason!

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, the issue goes 
 unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-12-18 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.19.patch

Modified patch to use minimum allocation value if application master resource 
is unavailable

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the 
 maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up 
 to 200 AMs can be launched; if each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they will occupy all the resources of the 
 queue instead of only a max_am_resource_percent share of it.
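
 A sketch of the direction the description implies (limit activation by the sum 
 of actual AM resources rather than by an AM count derived from 
 minimum_allocation) is shown below. It uses plain megabyte values and 
 hypothetical names, not the CapacityScheduler/LeafQueue types, and simply 
 replays the 1G / 20% / 5M example; it is not the patch itself.
 {code}
 import java.util.ArrayDeque;
 import java.util.Queue;

 /**
  * Sketch only: activate pending applications while the SUM of their actual
  * AM resources stays within queue_max_capacity * maximum_am_resource_percent.
  */
 public class AmLimitSketch {

   static final class App {
     final String id;
     final int amResourceMb; // actual AM container size, not minimum_allocation
     App(String id, int amResourceMb) { this.id = id; this.amResourceMb = amResourceMb; }
   }

   public static void main(String[] args) {
     int queueMaxCapacityMb = 1000;        // 1G queue
     double maxAmResourcePercent = 0.2;    // 20% reserved for AMs
     int maxAmResourceMb = (int) (queueMaxCapacityMb * maxAmResourcePercent); // 200M

     Queue<App> pending = new ArrayDeque<>();
     for (int i = 0; i < 200; i++) {
       pending.add(new App("app-" + i, 5)); // each AM actually needs 5M
     }

     int usedAmResourceMb = 0;
     int activated = 0;
     while (!pending.isEmpty()
         && usedAmResourceMb + pending.peek().amResourceMb <= maxAmResourceMb) {
       usedAmResourceMb += pending.poll().amResourceMb;
       activated++;
     }
     // With a resource-based check only 40 AMs (200M / 5M) are activated,
     // instead of all 200 under the count-based check quoted above.
     System.out.println("activated=" + activated + ", usedAmResourceMb=" + usedAmResourceMb);
   }
 }
 {code}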



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2014-12-18 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2654:
---
Attachment: shared_cache_config_parameters.txt

Attached is a text file with all of the shared cache config parameters, their 
descriptions and defaults (taken from yarn-default.xml).

 Revisit all shared cache config parameters to ensure quality names
 --

 Key: YARN-2654
 URL: https://issues.apache.org/jira/browse/YARN-2654
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Priority: Blocker
 Attachments: shared_cache_config_parameters.txt


 Revisit all the shared cache config parameters in YarnConfiguration and 
 yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-12-18 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252620#comment-14252620
 ] 

Craig Welch commented on YARN-2637:
---

Minor update - TestAllocationFileLoaderService passes for me, so I think it's a 
build server issue.  Also, I believe the findbugs warnings are still related to 
the jdk7 update - but they had disappeared before I had a chance to verify; they 
will re-run with the new patch and I will confirm that they are not related to 
my change.  In all, I believe the change is fine in terms of the release audit 
checks.

The only change in this version vs. the last is with respect to:

bq. 2. FiCaSchedulerApp constructor

As I said before, this is present in non-test scenarios.  However, I realized 
that I could use the minimum allocation from the scheduler if it is not present, 
which would mean at worst we would have the old behavior if there is not an 
actual AM resource to work with - so I adjusted the code to do that if 
necessary and possible.
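
For illustration, the fallback described above could look roughly like the 
sketch below; plain MB values stand in for the Resource type and the names are 
hypothetical, so this is not the actual patch.
{code}
/**
 * Sketch of the "fall back to minimum allocation" idea.
 */
public class AmResourceFallback {

  static int effectiveAmResourceMb(Integer recordedAmResourceMb,
                                   int schedulerMinimumAllocationMb) {
    // If the AM resource request is unknown (e.g. in some test paths), fall
    // back to the scheduler's minimum allocation, i.e. the old behavior.
    return recordedAmResourceMb != null
        ? recordedAmResourceMb
        : schedulerMinimumAllocationMb;
  }

  public static void main(String[] args) {
    System.out.println(effectiveAmResourceMb(512, 1024));  // 512: actual AM size known
    System.out.println(effectiveAmResourceMb(null, 1024)); // 1024: fallback
  }
}
{code}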




 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the 
 maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up 
 to 200 AMs can be launched; if each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they will occupy all the resources of the 
 queue instead of only a max_am_resource_percent share of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-18 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252629#comment-14252629
 ] 

Mayank Bansal commented on YARN-2933:
-

These findbugs warnings and test failures are not due to this patch.

Thanks,
Mayank

 Capacity Scheduler preemption policy should only consider capacity without 
 labels temporarily
 -

 Key: YARN-2933
 URL: https://issues.apache.org/jira/browse/YARN-2933
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Mayank Bansal
 Attachments: YARN-2933-1.patch


 Currently, we have capacity enforcement on each queue for each label in 
 CapacityScheduler, but we don't have preemption policy to support that. 
 YARN-2498 is targeting to support preemption respect node labels, but we have 
 some gaps in code base, like queues/FiCaScheduler should be able to get 
 usedResource/pendingResource, etc. by label. These items potentially need to 
 refactor CS which we need spend some time carefully think about.
 For now, what immediately we can do is allow calculate ideal_allocation and 
 preempt containers only for resources on nodes without labels, to avoid 
 regression like: A cluster has some nodes with labels and some not, assume 
 queueA isn't satisfied for resource without label, but for now, preemption 
 policy may preempt resource from nodes with labels for queueA, that is not 
 correct.
 Again, it is just a short-term enhancement, YARN-2498 will consider 
 preemption respecting node-labels for Capacity Scheduler which is our final 
 target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252682#comment-14252682
 ] 

Hadoop QA commented on YARN-2217:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688186/YARN-2217-trunk-v4.patch
  against trunk revision 0402bad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  org.apache.hadoop.yarn.client.api.impl.TestYarnClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6153//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6153//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6153//console

This message is automatically generated.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2014-12-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252700#comment-14252700
 ] 

Chris Trezzo commented on YARN-2217:


Findbugs and test failures seem unrelated.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
 YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252753#comment-14252753
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688189/YARN-2637.19.patch
  against trunk revision 0402bad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6154//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6154//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6154//console

This message is automatically generated.

 maximum-am-resource-percent could be violated when resource of AM is  
 minimumAllocation
 

 Key: YARN-2637
 URL: https://issues.apache.org/jira/browse/YARN-2637
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: Craig Welch
Priority: Critical
 Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
 YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
 YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
 YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
 YARN-2637.9.patch


 Currently, the number of AMs in a leaf queue is calculated in the following way:
 {code}
 max_am_resource = queue_max_capacity * maximum_am_resource_percent
 #max_am_number = max_am_resource / minimum_allocation
 #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
 {code}
 And when a new application is submitted to the RM, it checks whether the app 
 can be activated in the following way:
 {code}
 for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
   FiCaSchedulerApp application = i.next();

   // Check queue limit
   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
     break;
   }

   // Check user limit
   User user = getUser(application.getUser());
   if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
     user.activateApplication();
     activeApplications.add(application);
     i.remove();
     LOG.info("Application " + application.getApplicationId() +
         " from user: " + application.getUser() +
         " activated in queue: " + getQueueName());
   }
 }
 {code}
 An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the 
 maximum resource that AMs can use is 200M. Assuming minimum_allocation = 1M, up 
 to 200 AMs can be launched; if each AM actually uses 5M (> minimum_allocation), 
 all apps can still be activated, and they will occupy all the resources of the 
 queue instead of only a max_am_resource_percent share of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2978) Null pointer in YarnProtos

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2978:
--

Assignee: Varun Saxena

 Null pointer in YarnProtos
 --

 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena

  java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2979) Unsupported operation exception in message building (YarnProtos)

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2979:
--

Assignee: Varun Saxena

 Unsupported operation exception in message building (YarnProtos)
 

 Key: YARN-2979
 URL: https://issues.apache.org/jira/browse/YARN-2979
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.1
Reporter: Jason Tufo
Assignee: Varun Saxena

 java.lang.UnsupportedOperationException
   at java.util.AbstractList.add(AbstractList.java:148)
   at java.util.AbstractList.add(AbstractList.java:108)
   at 
 com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
   at 
 org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289)
   at 
 org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
   at 
 org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-18 Thread Ming Ma (JIRA)
Ming Ma created YARN-2980:
-

 Summary: Move health check script related functionality to 
hadoop-common
 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma


HDFS might want to leverage health check functionality available in YARN in 
both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
https://issues.apache.org/jira/browse/HDFS-7441.

We can move the health check functionality, including the protocol between 
Hadoop daemons and the health check script, to hadoop-common. That would 
simplify development and maintenance for both the Hadoop source code and the 
health check script.

Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252830#comment-14252830
 ] 

Robert Kanter commented on YARN-2964:
-

Thanks for fixing this.

[~jianhe], [~jlowe], on the 24 hrs thing, do you think this is something we 
can/should fix in YARN?  My understanding of this issue is that it's by design 
(there's even a config for the interval).  Given that, I'm thinking the proper 
fix for this is just to have the launcher job periodically renew the token (a 
fix in OOZIE)?
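
If the fix lands on the Oozie side as suggested, the launcher would simply renew 
its delegation token on a schedule shorter than the renewal interval. The sketch 
below shows that idea with a generic scheduled task; the renewToken() body is a 
placeholder, not the real Oozie launcher or Hadoop token-renewal call.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch only: a launcher-side periodic renewal loop. A real launcher would
 * invoke the delegation token's renew API with the job configuration.
 */
public class LauncherTokenRenewer {

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(long renewIntervalHours) {
    // Renew well inside the configured interval (e.g. every 8h for a 24h
    // interval) so the token never expires while sub-jobs are still running.
    scheduler.scheduleAtFixedRate(this::renewToken,
        renewIntervalHours, renewIntervalHours, TimeUnit.HOURS);
  }

  private void renewToken() {
    // Placeholder for the actual renewal call and error handling.
    System.out.println("renewing delegation token");
  }

  public void stop() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    LauncherTokenRenewer renewer = new LauncherTokenRenewer();
    renewer.start(8);
    Thread.sleep(100); // demo only
    renewer.stop();
  }
}
{code}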

 RM prematurely cancels tokens for jobs that submit jobs (oozie)
 ---

 Key: YARN-2964
 URL: https://issues.apache.org/jira/browse/YARN-2964
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Daryn Sharp
Assignee: Jian He
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch


 The RM used to globally track the unique set of tokens for all apps.  It 
 remembered the first job that was submitted with the token.  The first job 
 controlled the cancellation of the token.  This prevented completion of 
 sub-jobs from canceling tokens used by the main job.
 As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
 notion of the first/main job.  This results in sub-jobs canceling tokens and 
 failing the main job and other sub-jobs.  It also appears to schedule 
 multiple redundant renewals.
 The issue is not immediately obvious because the RM will cancel tokens ~10 
 min (the NM liveness interval) after log aggregation completes.  The result is 
 that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
 if any sub-job is launched more than 10 min after another sub-job completes.  
 If all other sub-jobs complete within that 10 min window, the issue goes 
 unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2980:
--

Assignee: Varun Saxena

 Move health check script related functionality to hadoop-common
 ---

 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Varun Saxena

 HDFS might want to leverage health check functionality available in YARN in 
 both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
 https://issues.apache.org/jira/browse/HDFS-7441.
 We can move the health check functionality, including the protocol between 
 Hadoop daemons and the health check script, to hadoop-common. That would 
 simplify development and maintenance for both the Hadoop source code and the 
 health check script.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-18 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2675:

Attachment: YARN-2675.006.patch

 the containersKilled metrics is not updated when the container is killed 
 during localization.
 -

 Key: YARN-2675
 URL: https://issues.apache.org/jira/browse/YARN-2675
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
  Labels: metrics, supportability
 Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
 YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
 YARN-2675.005.patch, YARN-2675.006.patch


 The containersKilled metric is not updated when the container is killed 
 during localization. We should handle the KILLING state in finished() of 
 ContainerImpl.java to update killedContainer.
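
 To make the proposal concrete, a simplified sketch of counting containers that 
 are killed before they ever reach RUNNING is shown below; the enum, the counter, 
 and finished() are illustrative stand-ins for the NodeManager's ContainerImpl 
 and metrics types, not the actual patch.
 {code}
 import java.util.concurrent.atomic.AtomicInteger;

 /**
  * Sketch only: count a container as killed even when the kill happens while
  * it is still being localized (i.e. it finishes from the KILLING state).
  */
 public class ContainerKillMetricSketch {

   enum State { LOCALIZING, RUNNING, KILLING, DONE }

   static final AtomicInteger containersKilled = new AtomicInteger();

   static void finished(State previousState, boolean wasKilled) {
     // Counting only kills from RUNNING misses containers killed during
     // localization; including KILLING covers that case as well.
     if (wasKilled && (previousState == State.RUNNING
         || previousState == State.KILLING)) {
       containersKilled.incrementAndGet();
     }
   }

   public static void main(String[] args) {
     finished(State.KILLING, true);  // killed during localization -> counted
     finished(State.RUNNING, true);  // killed while running       -> counted
     System.out.println("containersKilled=" + containersKilled.get()); // 2
   }
 }
 {code}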



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

