[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251378#comment-14251378
 ] 

Hadoop QA commented on YARN-2933:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687982/YARN-2933-1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6143//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6143//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6143//console

This message is automatically generated.

> Capacity Scheduler preemption policy should only consider capacity without 
> labels temporarily
> -
>
> Key: YARN-2933
> URL: https://issues.apache.org/jira/browse/YARN-2933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Mayank Bansal
> Attachments: YARN-2933-1.patch
>
>
> Currently, we have capacity enforcement on each queue for each label in 
> CapacityScheduler, but we don't have a preemption policy that supports it. 
> YARN-2498 is targeting preemption that respects node labels, but we have 
> some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
> usedResource/pendingResource, etc. by label. These items potentially require 
> refactoring CS, which we need to spend some time thinking about carefully.
> For now, what we can do immediately is calculate ideal_allocation and 
> preempt containers only for resources on nodes without labels, to avoid 
> regressions like the following: a cluster has some nodes with labels and some 
> without; assume queueA is not satisfied for resources without labels. Today the 
> preemption policy may preempt resources from nodes with labels for queueA, 
> which is not correct.
> Again, this is just a short-term enhancement; YARN-2498 will consider 
> preemption respecting node labels for the Capacity Scheduler, which is our 
> final target. 
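
A minimal sketch of the interim behavior described above, using hypothetical stand-in types (Node and its fields are not the real YARN scheduler classes): only the capacity of non-labeled nodes feeds the ideal_allocation computation, so containers on labeled nodes are never chosen for preemption.

{code:java}
import java.util.List;
import java.util.Set;

// Sketch only. Node and its fields are hypothetical stand-ins for the real
// scheduler node types; the point is the filtering, not the API.
class NonLabeledCapacitySketch {
  static class Node {
    Set<String> labels;   // null or empty means the node carries no label
    long memoryMB;
    int vcores;
  }

  /** Total capacity the interim preemption policy should reason over. */
  static long[] nonLabeledTotal(List<Node> nodes) {
    long mem = 0;
    int cores = 0;
    for (Node n : nodes) {
      if (n.labels == null || n.labels.isEmpty()) {
        mem += n.memoryMB;
        cores += n.vcores;
      }
    }
    // ideal_allocation and the containers to preempt are then derived from this
    // total only, leaving resources on labeled nodes untouched.
    return new long[] { mem, cores };
  }
}
{code}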



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2340:
-
Attachment: 0001-YARN-2340.patch

> NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
> recover applications and remains in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: CapacityScheduler with queues a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-2340.patch
>
>
> While a job is in progress, set the Queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251393#comment-14251393
 ] 

Rohith commented on YARN-2340:
--

Thanks [~jianhe] for your suggestion. I attached a patch fixing the issue for 
CapacityScheduler. I also verified it by deploying on a real cluster, and it is 
working fine.
Kindly review the patch.
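
The attached patch is not reproduced here; the following is only a rough, hypothetical illustration of the kind of guard involved, assuming the failure is a missing application entry when its queue was STOPPED before the APP_ATTEMPT_ADDED event is handled.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the attached patch: tolerate a null lookup in the
// attempt handler instead of dereferencing a missing application entry.
class AddAttemptGuardSketch {
  static class SchedulerApplication { /* placeholder for the real type */ }

  final Map<String, SchedulerApplication> applications = new ConcurrentHashMap<>();

  void addApplicationAttempt(String appId, String attemptId) {
    SchedulerApplication app = applications.get(appId);
    if (app == null) {
      // Without a guard like this, dereferencing app reproduces the NPE in the log.
      System.err.println("Application " + appId
          + " not found (its queue may be STOPPED); ignoring attempt " + attemptId);
      return;
    }
    // ... normal attempt registration would continue here ...
  }
}
{code}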

> NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
> recover applications and remains in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: CapacityScheduler with queues a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-2340.patch
>
>
> While a job is in progress, set the Queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251451#comment-14251451
 ] 

Hadoop QA commented on YARN-2340:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687993/0001-YARN-2340.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6144//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6144//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6144//console

This message is automatically generated.

> NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
> recover applications and remains in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: CapacityScheduler with queues a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-2340.patch
>
>
> While a job is in progress, set the Queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2340) NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot recover applications and remains in standby

2014-12-18 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251455#comment-14251455
 ] 

Rohith commented on YARN-2340:
--

It looks like the failed test is random. In my environment, it runs successfully.

> NPE thrown when RM restarts after queue is STOPPED. Thereafter RM cannot 
> recover applications and remains in standby
> --
>
> Key: YARN-2340
> URL: https://issues.apache.org/jira/browse/YARN-2340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.4.1
> Environment: CapacityScheduler with queues a, b
>Reporter: Nishan Shetty
>Assignee: Rohith
>Priority: Critical
> Attachments: 0001-YARN-2340.patch
>
>
> While a job is in progress, set the Queue state to STOPPED and then restart the RM. 
> Observe that the standby RM fails to come up as active, throwing the NPE below:
> 2014-07-23 18:43:24,432 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> appattempt_1406116264351_0014_02 State change from NEW to SUBMITTED
> 2014-07-23 18:43:24,433 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_ADDED to the scheduler
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:568)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:916)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:101)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:602)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 18:43:24,434 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251462#comment-14251462
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251466#comment-14251466
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (the NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, the issue goes unnoticed.
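
A minimal sketch, with hypothetical String stand-ins for tokens and application ids, of the pre-YARN-2704 bookkeeping the description refers to: the first app to submit a token owns it, and only that app's completion may cancel it, so sub-jobs finishing early cannot revoke a token the main job still needs.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only; not the RM code. Tokens and application ids are plain Strings here.
class FirstSubmitterTokenTracker {
  private final Map<String, String> tokenOwner = new ConcurrentHashMap<>(); // token -> first appId

  void appSubmitted(String appId, String token) {
    tokenOwner.putIfAbsent(token, appId);   // remember only the first (main) submitter
  }

  boolean shouldCancelOnCompletion(String appId, String token) {
    // Cancel the token only when the owning (first/main) app completes.
    return appId.equals(tokenOwner.get(token));
  }
}
{code}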



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251467#comment-14251467
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.
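
A self-contained JDK demonstration of the behavior described above (a generic illustration, not the DelegationTokenRenewer code or its fix): with an unbounded LinkedBlockingQueue the pool stays at its core size, while a queue that rejects offers lets it expand toward the configured maximum.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
  public static void main(String[] args) throws InterruptedException {
    // Core size 5, max 50, unbounded queue: every extra task is queued, so the
    // executor never creates more than the 5 core threads.
    ThreadPoolExecutor stuckAtCore = new ThreadPoolExecutor(
        5, 50, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    for (int i = 0; i < 20; i++) {
      stuckAtCore.execute(() -> sleep(500));
    }
    Thread.sleep(100);
    System.out.println("unbounded queue   -> pool size: " + stuckAtCore.getPoolSize()); // 5

    // Same sizes, but offers to a SynchronousQueue fail while workers are busy,
    // so the pool grows beyond the core size as intended.
    ThreadPoolExecutor expanding = new ThreadPoolExecutor(
        5, 50, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    for (int i = 0; i < 20; i++) {
      expanding.execute(() -> sleep(500));
    }
    Thread.sleep(100);
    System.out.println("synchronous queue -> pool size: " + expanding.getPoolSize()); // ~20

    stuckAtCore.shutdown();
    expanding.shutdown();
  }

  private static void sleep(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
  }
}
{code}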



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251463#comment-14251463
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251471#comment-14251471
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
>
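
A generic JDK illustration of the failure mode in the log above (not the SCM code or the attached fix): reflective instantiation via a declared 0-argument constructor, which is what the ReflectionUtils#newInstance call in the stack trace looks up, throws NoSuchMethodException when only parameterized constructors exist.

{code:java}
public class NoArgCtorDemo {
  static class NoDefaultCtor {
    NoDefaultCtor(String required) { }   // only a parameterized constructor
  }
  static class WithDefaultCtor {
    WithDefaultCtor() { }                // 0-argument constructor present
  }

  public static void main(String[] args) throws Exception {
    // Succeeds: a 0-argument constructor exists.
    Object ok = WithDefaultCtor.class.getDeclaredConstructor().newInstance();
    System.out.println("created " + ok.getClass().getSimpleName());

    try {
      // Fails like the SCM log: java.lang.NoSuchMethodException ...NoDefaultCtor.<init>()
      NoDefaultCtor.class.getDeclaredConstructor().newInstance();
    } catch (NoSuchMethodException e) {
      System.out.println("expected: " + e);
    }
  }
}
{code}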

[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251476#comment-14251476
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/45/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251482#comment-14251482
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251478#comment-14251478
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251490#comment-14251490
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251486#comment-14251486
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apach

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251477#comment-14251477
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251481#comment-14251481
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #779 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/779/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (the NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. Pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2949:

Attachment: apache-yarn-2949.1.patch

Uploaded a new patch to address Junping's comments.

bq. 1. There are many related configuration properties for enabling/configuring 
CGroups, these properties are coming from yarn-site.xml (or yarn-default.xml) 
or container-executor.cfg. Please explain explicitly on where user should put 
these properties.

Fixed. All the properties go in yarn-site.xml. I added a line specifying that. 
I also removed the references to yarn-default.xml since they were confusing.
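
For illustration of where these settings live: the CGroups-related keys belong in yarn-site.xml. The snippet below sets a commonly documented subset programmatically only so the example is self-contained; the exact keys and values for a given cluster should come from the documentation added by the patch, so treat this as a hedged sketch.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: these keys normally live in yarn-site.xml, not in code.
// Verify names and values against the CGroups documentation in the patch.
public class CgroupsConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
    conf.set("yarn.nodemanager.linux-container-executor.resources-handler.class",
        "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler");
    conf.set("yarn.nodemanager.linux-container-executor.cgroups.hierarchy",
        "/hadoop-yarn");
    System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
  }
}
{code}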

bq. 2. I like the discussion in the section on CGroups and security, and it would 
be great to provide an example of configuration w/o security. (nice to have).

I added a table explaining the behaviour further. Let me know if that's ok.

{quote}
3. Several typos:

typo of "CGgroups", should be "CGroups"
typo of "In our cause", should be "in our case"
{quote}

Thanks for catching them! Fixed. I'll upload the html generated once I get an 
ok from Jenkins.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
> apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251660#comment-14251660
 ] 

Hadoop QA commented on YARN-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688030/apache-yarn-2949.1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6145//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6145//console

This message is automatically generated.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
> apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2949:

Attachment: (was: NodeManagerCgroups.html)

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-2949.0.patch, apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2949:

Attachment: NodeManagerCgroups.html

Uploaded the html generated by the latest patch.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
> apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251670#comment-14251670
 ] 

Junping Du commented on YARN-2949:
--

+1. v2 patch LGTM. Will commit it shortly.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
> apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251682#comment-14251682
 ] 

Hudson commented on YARN-2949:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6746 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6746/])
YARN-2949. Add documentation for CGroups. (Contributed by Varun Vasudev) 
(junping_du: rev 389f881d423c1f7c2bb90ff521e59eb8c7d26214)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerCgroups.apt.vm
* hadoop-project/src/site/site.xml
* hadoop-yarn-project/CHANGES.txt


> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.7.0
>
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
> apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251697#comment-14251697
 ] 

Hadoop QA commented on YARN-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688030/apache-yarn-2949.1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6146//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6146//console

This message is automatically generated.

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.7.0
>
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch, 
> apache-yarn-2949.1.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251704#comment-14251704
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251705#comment-14251705
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251708#comment-14251708
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.
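
To make the pre-YARN-2704 behaviour described above concrete, here is a minimal, hypothetical sketch (not the actual DelegationTokenRenewer code; the class and method names are invented, and the Hadoop Token/ApplicationId types are assumed to be on the classpath) of a tracker in which only the first job that submitted a token is allowed to cancel it:

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hedged sketch of the "first job owns the token" behaviour described above;
// this is not the actual RM code, just an illustration of the idea.
class FirstJobOwnsToken {
  private final Map<Token<?>, ApplicationId> firstSubmitter = new HashMap<>();

  // Remember only the first job that arrived with this token.
  void register(Token<?> token, ApplicationId appId) {
    firstSubmitter.putIfAbsent(token, appId);
  }

  // Only the owning (first) job may cancel the token when it finishes,
  // so completing sub-jobs cannot cancel the launcher's token.
  boolean mayCancelOnCompletion(Token<?> token, ApplicationId finishedApp) {
    return finishedApp.equals(firstSubmitter.get(token));
  }
}
{code}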



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251717#comment-14251717
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251723#comment-14251723
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.
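
The queue interaction described above is easy to reproduce outside the RM. The following standalone snippet (plain java.util.concurrent behaviour, not DelegationTokenRenewer code) shows that with an unbounded LinkedBlockingQueue the pool stays at its core size of 5 even though the maximum is 50:

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
  public static void main(String[] args) throws Exception {
    // core = 5, max = 50, unbounded work queue
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        5, 50, 3, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    for (int i = 0; i < 100; i++) {
      pool.execute(() -> {
        try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
      });
    }
    Thread.sleep(500);
    // Prints 5: the queue is never "full", so excess work queues up instead of
    // growing the pool toward the configured maximum of 50.
    System.out.println("pool size = " + pool.getPoolSize());
    pool.shutdownNow();
  }
}
{code}

Typical remedies are to bound the queue or to set the core size to the configured maximum; which approach the attached patch takes is not shown here.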



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251719#comment-14251719
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251709#comment-14251709
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251722#comment-14251722
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251727#comment-14251727
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apa
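
For context, the NoSuchMethodException in the (truncated) stack trace above comes from reflective instantiation, which requires a visible 0-argument constructor. A simplified sketch of the pattern (not the exact ReflectionUtils#newInstance implementation, and the class name is just a placeholder) is:

{code}
import java.lang.reflect.Constructor;

final class NewInstanceSketch {
  // Simplified stand-in for the relevant part of ReflectionUtils#newInstance.
  static <T> T newInstance(Class<T> clazz) {
    try {
      // Requires a 0-argument constructor; InMemorySCMStore has none,
      // so this lookup throws NoSuchMethodException, which then surfaces
      // as the RuntimeException shown in the log above.
      Constructor<T> ctor = clazz.getDeclaredConstructor();
      ctor.setAccessible(true);
      return ctor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }
}
{code}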

[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251713#comment-14251713
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #42 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/42/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* hadoop-yarn-project/CHANGES.txt


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
>

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251718#comment-14251718
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/bin/yarn
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251731#comment-14251731
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1977 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1977/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251763#comment-14251763
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251771#comment-14251771
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodExc

[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251767#comment-14251767
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251766#comment-14251766
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251762#comment-14251762
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251775#comment-14251775
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #46 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/46/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251801#comment-14251801
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251813#comment-14251813
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251805#comment-14251805
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251809#comment-14251809
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251800#comment-14251800
 ] 

Hudson commented on YARN-2189:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251804#comment-14251804
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1996 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1996/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251818#comment-14251818
 ] 

Jason Lowe commented on YARN-2964:
--

Thanks for the patch, Jian!  The findbugs warnings appear to be unrelated.

I'm wondering about the change in the removeApplicationFromRenewal method or 
remove.  If a sub-job completes, won't we remove the token from the allTokens 
map before the launcher job has completed?  Then a subsequent sub-job that 
requests token cancelation can put the token back in the map and cause the 
token to be canceled when it leaves.  I think we need to repeat the logic from 
the original code before YARN-2704 here, i.e.: only remove the token if the 
application ID matches.  That way the launcher job's token will remain _the_ 
token in that collection until the launcher job completes.
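To illustrate what I mean (a rough sketch only, not the actual pre-YARN-2704 code; the applicationId field on DelegationTokenToRenew is an assumption here):
{code}
// Sketch: on app completion, drop the tracked entry only if this app is the
// one that registered it, so the launcher's token stays in the collection.
DelegationTokenToRenew dttr = allTokens.get(token);
if (dttr != null && applicationId.equals(dttr.applicationId)) {
  allTokens.remove(token);
}
{code}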

This comment doesn't match the code, since the code looks like: if any job 
wants to cancel at the end, then we will cancel at the end.
{code}
  // If any of the jobs sharing the same token set shouldCancelAtEnd
  // to true, we should not cancel the token.
  if (evt.shouldCancelAtEnd) {
dttr.shouldCancelAtEnd = evt.shouldCancelAtEnd;
  }
{code}
I think the logic and the comment should be: if any job doesn't want to cancel, 
then we won't cancel.  The code seems to be trying to do the opposite, so I'm not 
sure how the unit test is passing.  Maybe I'm missing something.
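For clarity, a minimal sketch of the semantics I would expect (not code from the patch, just the intended behavior):
{code}
// Sketch: once any job sharing this token asks not to cancel at the end,
// remember that and never flip the flag back to "cancel".
if (!evt.shouldCancelAtEnd) {
  dttr.shouldCancelAtEnd = false;
}
{code}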

The info log message added in handleAppSubmitEvent also is misleading, as it 
says we are setting shouldCancelAtEnd to whatever the event said, when in 
reality we only set it sometimes.  Probably needs to be inside the conditional.

Wonder if we should be using a Set instead of a Map to track these tokens.  
Adding an already existing DelegationTokenToRenew in a set will not change the 
one already there, but with the map a sub-job can clobber the 
DelegationTokenToRenew that's already there with its own when it does the 
allTokens.put(dtr.token, dtr).

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251854#comment-14251854
 ] 

Allen Wittenauer commented on YARN-2203:


Should this have been .http.address instead of webapp.address to be consistent 
with the rest of Hadoop?

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2936:
---
Attachment: YARN-2936.001.patch

> YARNDelegationTokenIdentifier doesn't set proto.builder now
> ---
>
> Key: YARN-2936
> URL: https://issues.apache.org/jira/browse/YARN-2936
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Varun Saxena
> Attachments: YARN-2936.001.patch
>
>
> After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
> such that when constructing an object which extends 
> YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
> when we call getProto() of it, we will just get an empty proto object.
> It seems to do no harm to the production code path, as we will always call 
> getBytes() before using proto to persist the DT in the state store, when 
> generating the password.
> I think the setter is removed to avoid duplicate setting of the fields when 
> getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
> properly alone. YARNDelegationTokenIdentifier is tightly coupled with the 
> logic in secretManager. It's vulnerable if something is changed at 
> secretManager. For example, in the test case of YARN-2837, I spent time to 
> figure out we need to execute getBytes() first to make sure the testing DTs 
> can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time

2014-12-18 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868.005.patch

- Cleaned up unused imports
- Didn't move @Metric to a single line, since that would violate the 80-column width limit


> Add metric for initial container launch time
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252003#comment-14252003
 ] 

Chris Trezzo commented on YARN-2203:


[~aw] Thanks for the comment! I didn't realize that was a convention 
considering there are other similar parameters that do not seem to follow that 
form. For some examples: yarn.resourcemanager.webapp.address, 
yarn.resourcemanager.admin.address, yarn.nodemanager.webapp.address. I can make 
note of your comment in YARN-2654 when we do a final pass on the config 
parameters and ensure that they have quality names.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252007#comment-14252007
 ] 

Karthik Kambatla commented on YARN-2654:


> From [~aw] on YARN-2203: Should this have been .http.address instead of 
> webapp.address to be consistent with the rest of Hadoop?

> Revisit all shared cache config parameters to ensure quality names
> --
>
> Key: YARN-2654
> URL: https://issues.apache.org/jira/browse/YARN-2654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Blocker
>
> Revisit all the shared cache config parameters in YarnConfiguration and 
> yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252008#comment-14252008
 ] 

Karthik Kambatla commented on YARN-2203:


Added a comment there. Thanks Allen. 

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252031#comment-14252031
 ] 

Allen Wittenauer commented on YARN-2203:


I wouldn't trust the YARN project to be any sort of guide to do anything with 
consistency.  It has proven over and over again that it wants everything to be 
hard to administer.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252045#comment-14252045
 ] 

Jian He commented on YARN-2964:
---

thanks for your comments, Jason !

bq. I'm wondering about the change in the removeApplicationFromRenewal method 
or remove.
If the launcher job gets added to the appTokens map first, DelegationTokenRenewer 
will not add a DelegationTokenToRenew instance for the sub-job. So the tokens 
returned by removeApplicationFromRenewal will be empty for the sub-job when the 
sub-job completes, and the token won't be removed from allTokens. My only concern 
with a global set is that each time an application completes, we end up looping 
over all the applications, or worse (each app may have at least one token).
bq. This comment doesn't match the code
Good catch, what a mistake. I might have been under the impression that the 
semantics was "shouldKeepAtEnd". I added one line in the test case to guard 
against this.
bq. Wonder if we should be using a Set instead of a Map to track these tokens
Thought about that too; the reason I switched to a map is to get the 
DelegationTokenToRenew instance based on the token the app provided and change 
its shouldCancelAtEnd field on submission.
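Roughly, what the map lookup enables (a sketch only, not the patch code; allTokens and DelegationTokenToRenew are reused from the discussion above, and dttr stands for the entry built for the submitting app):
{code}
// Sketch: a Set could only tell us whether the token is already tracked; the
// map lets a later submission fetch the tracked entry and update it in place.
DelegationTokenToRenew existing = allTokens.get(token);
if (existing == null) {
  allTokens.put(token, dttr);          // first job to submit this token
} else if (!dttr.shouldCancelAtEnd) {
  existing.shouldCancelAtEnd = false;  // a sharing job asked to keep the token
}
{code}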

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2964:
--
Attachment: YARN-2964.2.patch

updated the patch based on some comments from Jason

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252078#comment-14252078
 ] 

Karthik Kambatla commented on YARN-2203:


bq. I wouldn't trust the YARN project to be any sort of guide to do anything 
with consistency. It has proven over and over again that it wants everything to 
be hard to administer.

If this comment was to provide direction, I am clearly missing it. Is the 
suggestion to drop whatever consistency the *sub*-project is *trying* to 
achieve and run in different directions? 

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252103#comment-14252103
 ] 

Allen Wittenauer commented on YARN-2203:


Consistency within the larger project is more important than consistency within 
the sub-project.   Operationally, it sucks having to switch gears depending 
upon which sub-project one is working on. "Oh, YARN calls it feature X, even 
though the rest of Hadoop has called it Y since pre-YARN."  

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252116#comment-14252116
 ] 

Karthik Kambatla commented on YARN-2203:


I see the value in the entire project being consistent, but not at the expense 
of inconsistency within the sub-project. I would like to stay out of "how we 
should have done things".

If it is true that there are inconsistencies with no good reason, it would be 
nice to address those inconsistencies in Hadoop-3, preferably in a 
backwards-compatible way. Any volunteers? 

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252139#comment-14252139
 ] 

Hadoop QA commented on YARN-2936:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688081/YARN-2936.001.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 39 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6147//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6147//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6147//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6147//console

This message is automatically generated.

> YARNDelegationTokenIdentifier doesn't set proto.builder now
> ---
>
> Key: YARN-2936
> URL: https://issues.apache.org/jira/browse/YARN-2936
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Varun Saxena
> Attachments: YARN-2936.001.patch
>
>
> After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
> such that when constructing an object which extends 
> YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
> when we call getProto() of it, we will just get an empty proto object.
> It seems to do no harm to the production code path, as we will always call 
> getBytes() before using proto to persist the DT in the state store, when 
> generating the password.
> I think the setter is removed to avoid duplicate setting of the fields when 
> getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
> properly alone. YARNDelegationTokenIdentifier is tightly coupled with the 
> logic in secretManager. It's vulnerable if something is changed at 
> secretManager. For example, in the test case of YARN-2837, I spent time to 
> figure out we need to execute getBytes() first to make sure the testing DTs 
> can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252164#comment-14252164
 ] 

Hadoop QA commented on YARN-2868:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688084/YARN-2868.005.patch
  against trunk revision 389f881.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6148//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6148//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6148//console

This message is automatically generated.

> Add metric for initial container launch time
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252195#comment-14252195
 ] 

Karthik Kambatla commented on YARN-2975:


bq. The removeApp from one operation is now 2 steps.
removeApp did not expose the two operations externally, but it did perform them internally. 

IIUC, an app can only go from non-runnable to runnable, but not vice-versa. For 
instance, see the following snippet in FairScheduler#executeMove. So, don't 
think we need to worry about consistency. 
{code}
if (wasRunnable && !nowRunnable) {
  throw new IllegalStateException("Should have already verified that app "
  + attempt.getApplicationId() + " would be runnable in new queue");
}
{code}

{quote}
Nit: 
suggest resetPreemptedResources -> resetPreemptedResourcesRunnableApps
clearPreemptedResources -> clearPreemptedResourcesRunnableApps
{quote}
I thought of this, but decided against it. The additional RunnableApps doesn't 
add anything but extra characters. 

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252205#comment-14252205
 ] 

Karthik Kambatla commented on YARN-2975:


[~adhoot] - I see your point about consistency. If the app was not runnable at 
the time of check but becomes runnable by the time we remove it, we can miss 
untracking it. Maybe we should clean up how we store the runnability of an app in 
FSAppAttempt, and add transitions like the rest of the RM code to handle addition, 
move, and removal of apps. The scope might be too big to do in this JIRA. Okay 
with doing it in a follow-up? 

By the way, thanks a bunch for taking the time to review. 

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2936) YARNDelegationTokenIdentifier doesn't set proto.builder now

2014-12-18 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252213#comment-14252213
 ] 

Varun Saxena commented on YARN-2936:


[~zjshen], [~jianhe], kindly review.

Findbugs to be handled by YARN-2937 to YARN-2940.
Test failure unrelated and all these tests are passing in my local setup.

> YARNDelegationTokenIdentifier doesn't set proto.builder now
> ---
>
> Key: YARN-2936
> URL: https://issues.apache.org/jira/browse/YARN-2936
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Varun Saxena
> Attachments: YARN-2936.001.patch
>
>
> After YARN-2743, the setters are removed from YARNDelegationTokenIdentifier, 
> such that when constructing an object which extends 
> YARNDelegationTokenIdentifier, proto.builder is not set at all. Later on, 
> when we call getProto() of it, we will just get an empty proto object.
> It seems to do no harm to the production code path, as we will always call 
> getBytes() before using proto to persist the DT in the state store, when 
> generating the password.
> I think the setter is removed to avoid duplicate setting of the fields when 
> getBytes() is called. However, YARNDelegationTokenIdentifier doesn't work 
> properly alone. YARNDelegationTokenIdentifier is tightly coupled with the 
> logic in secretManager. It's vulnerable if something is changed at 
> secretManager. For example, in the test case of YARN-2837, I spent time to 
> figure out we need to execute getBytes() first to make sure the testing DTs 
> can be properly put into the state store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252216#comment-14252216
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688092/YARN-2964.2.patch
  against trunk revision 07619aa.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6149//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6149//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6149//console

This message is automatically generated.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252218#comment-14252218
 ] 

Jason Lowe commented on YARN-2964:
--

bq. If launcher job first gets added to the appTokens map, 
DelegationTokenRenewer will not add DelegationTokenToRenew instance for the 
sub-job.

Ah, sorry, I missed this critical change from the original patch.  However, if 
we don't add the delegation token for each sub-job, then I think we have a 
problem with the following use-case:

# Oozie launcher submits a MapReduce sub-job
# MapReduce job starts
# Oozie launcher job leaves
# MapReduce job now running with a token that the RM has "forgotten" and won't 
be automatically renewed

We might have had the same issue in this case prior to YARN-2704, since the 
token would be pulled from the set when the launcher completed.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252243#comment-14252243
 ] 

Jian He commented on YARN-2964:
---

bq. We might have had the same issue in this case prior to YARN-2704.
Yes, this is an existing issue. As Robert pointed out in the previous comment, 
an oozie MapReduce sub-job now cannot run beyond 24 hrs. IMO, we can fix this 
separately?

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252259#comment-14252259
 ] 

Jason Lowe commented on YARN-2964:
--

Sure, we can fix that as a followup issue since it's no worse than what we had 
before.

+1 lgtm; the only nit is that the new getAllTokens method should be package-private 
instead of public, but that's not a big deal either way.  I assume the test failures 
are unrelated?

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252286#comment-14252286
 ] 

Jian He commented on YARN-2964:
---

I believe the failures are not related.  I just changed the visibility and 
uploaded a new patch to re-kick jenkins.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2964:
--
Attachment: YARN-2964.3.patch

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252332#comment-14252332
 ] 

Jian He commented on YARN-2939:
---

+1

> Fix new findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-2939
> URL: https://issues.apache.org/jira/browse/YARN-2939
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Saxena
>Assignee: Li Lu
>  Labels: findbugs
> Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252343#comment-14252343
 ] 

Anubhav Dhoot commented on YARN-2975:
-

Yes, I am worried about getting it wrong for maxRunningEnforcer. Before the 
change, we would, inside a single lock, remove the app whether or not it was 
runnable, and be reasonably sure of the result.
Now, splitting it into 2 non-atomic steps outside, as I listed above, and 
also 2 steps inside {noformat} return removeRunnableApp(app) || 
removeNonRunnableApp(app) {noformat}, we might make it worse, as each one releases 
the lock before the other acquires it. The application could be completely missed 
when it moves from non-runnable to runnable in between.
How about making removeApp try to remove from both the runnable and non-runnable 
lists inside a single writeLock? We can remove the redundancy with 
removeRunnableApp and removeNonRunnableApp by having a fourth internal method 
that all 3 delegate to, with flags to limit where to look for the app.
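A rough sketch of that single-writeLock variant (field and method names here are assumed to match the existing FSLeafQueue code, not taken from an actual patch):
{code}
// Sketch: check and remove from both lists without releasing the lock, so an
// app moving from non-runnable to runnable in between cannot be missed.
public boolean removeApp(FSAppAttempt app) {
  writeLock.lock();
  try {
    return runnableApps.remove(app) || nonRunnableApps.remove(app);
  } finally {
    writeLock.unlock();
  }
}
{code}
The fourth internal method mentioned above would just take flags for which of the two lists this body is allowed to search.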

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2014-12-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252367#comment-14252367
 ] 

Ray Chiang commented on YARN-2868:
--

RE: findbugs.  None affecting this code.

RE: unit test failures.  None of these 5 tests failed in my tree.


> Add metric for initial container launch time
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch, YARN-2868.005.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-18 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252467#comment-14252467
 ] 

Karthik Kambatla commented on YARN-2975:


I understand the concern. Pre-YARN-2910, we were in the exact same place and 
haven't seen any issues in practice. Let me take a look again and see if we can 
do what you are suggesting - do all of removeApp while holding the writeLock. 
Initially, I had a single method, but had to split based on prior accesses. 





> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2978) Null pointer in YarnProtos

2014-12-18 Thread Jason Tufo (JIRA)
Jason Tufo created YARN-2978:


 Summary: Null pointer in YarnProtos
 Key: YARN-2978
 URL: https://issues.apache.org/jira/browse/YARN-2978
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1
Reporter: Jason Tufo


 java.lang.NullPointerException
at 
org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
at 
org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252473#comment-14252473
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688133/YARN-2964.3.patch
  against trunk revision b9d4976.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6150//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6150//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6150//console

This message is automatically generated.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is 
> that an oozie job, e.g. pig, that launches many sub-jobs over time will fail 
> if any sub-job is launched >10 min after any other sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2675) the containersKilled metrics is not updated when the container is killed during localization.

2014-12-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252481#comment-14252481
 ] 

Ray Chiang commented on YARN-2675:
--

Started reviewing, but it looks like this patch needs to be updated due to 
YARN-1156.

> the containersKilled metrics is not updated when the container is killed 
> during localization.
> -
>
> Key: YARN-2675
> URL: https://issues.apache.org/jira/browse/YARN-2675
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: metrics, supportability
> Attachments: YARN-2675.000.patch, YARN-2675.001.patch, 
> YARN-2675.002.patch, YARN-2675.003.patch, YARN-2675.004.patch, 
> YARN-2675.005.patch
>
>
> The containersKilled metric is not updated when the container is killed 
> during localization. We should add the KILLING state to finished() of 
> ContainerImpl.java to update killedContainer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2979) Unsupported operation exception in message building (YarnProtos)

2014-12-18 Thread Jason Tufo (JIRA)
Jason Tufo created YARN-2979:


 Summary: Unsupported operation exception in message building 
(YarnProtos)
 Key: YARN-2979
 URL: https://issues.apache.org/jira/browse/YARN-2979
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Affects Versions: 2.5.1
Reporter: Jason Tufo


java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289)
at 
org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
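
For context, the UnsupportedOperationException at the top of the trace is the standard 
behavior of AbstractList.add() on a list that does not support mutation (e.g. 
Collections.emptyList() or an unmodifiable view). The self-contained illustration below 
reproduces that failure mode with JDK classes only; whether the list inside QueueInfoPBImpl 
ends up immutable because of unsynchronized concurrent use or a merge-ordering problem is an 
assumption to be confirmed, not something the trace alone proves.

{code}
import java.util.Collections;
import java.util.List;

// Generic illustration of the exception seen above, independent of YARN:
// adding to an immutable list throws UnsupportedOperationException from
// AbstractList.add, which is exactly the first frame of the stack trace.
public class UnsupportedAddDemo {
  public static void main(String[] args) {
    List<String> apps = Collections.emptyList();   // immutable
    try {
      apps.add("application_1419000000000_0001");
    } catch (UnsupportedOperationException e) {
      System.out.println("add() rejected: " + e);
    }
  }
}
{code}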




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252507#comment-14252507
 ] 

Allen Wittenauer commented on YARN-2203:


I'm willing to do it, but it won't be backward compatible.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252509#comment-14252509
 ] 

Hadoop QA commented on YARN-2939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch
  against trunk revision c4d9713.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6151//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6151//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6151//console

This message is automatically generated.

> Fix new findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-2939
> URL: https://issues.apache.org/jira/browse/YARN-2939
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Saxena
>Assignee: Li Lu
>  Labels: findbugs
> Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252510#comment-14252510
 ] 

Jason Lowe commented on YARN-2964:
--

+1 lgtm.  I don't believe the test failures are related since they pass for me 
locally.  Committing this.


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is that 
> an oozie job, e.g. pig, that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after another sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Component/s: documentation

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, documentation, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
> Attachments: YARN-2230.001.patch
>
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either the documentation or the code should be corrected (unless this exception is 
> handled elsewhere accordingly, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is not 
> obvious why a job does not make any progress.
> The same appears to apply to memory.
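
For illustration, here is a minimal sketch of what "capping to the configured maximum" would 
look like if the scheduler clamped the request instead of throwing, as the yarn-default.xml 
wording suggests. This is only one possible resolution, not what the attached patch does, and 
the helper name below is hypothetical; plain ints are used to stay independent of the YARN 
Resource API.

{code}
// Hypothetical helper showing clamp-instead-of-throw semantics for vcores.
final class VcoreCapSketch {
  static int capVcores(int requestedVcores, int maxAllocationVcores) {
    if (requestedVcores < 0) {
      throw new IllegalArgumentException("requested virtual cores < 0");
    }
    // Requests above yarn.scheduler.maximum-allocation-vcores get capped.
    return Math.min(requestedVcores, maxAllocationVcores);
  }
}
{code}

With maxAllocationVcores=3, the request for 32 vcores from the log above would be capped to 3 
rather than rejected.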



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Attachment: YARN-2230.001.patch

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, documentation, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
> Attachments: YARN-2230.001.patch
>
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either the documentation or the code should be corrected (unless this exception is 
> handled elsewhere accordingly, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is not 
> obvious why a job does not make any progress.
> The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Attachment: (was: YARN-2230.001.patch)

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, documentation, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
> Attachments: YARN-2230.001.patch
>
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either the documentation or the code should be corrected (unless this exception is 
> handled elsewhere accordingly, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to the client. Otherwise, it is not 
> obvious why a job does not make any progress.
> The same appears to apply to memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252563#comment-14252563
 ] 

Hudson commented on YARN-2964:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6755 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6755/])
YARN-2964. RM prematurely cancels tokens for jobs that submit jobs (oozie). 
Contributed by Jian He (jlowe: rev 0402bada1989258ecbfdc437cb339322a1f55a97)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is that 
> an oozie job, e.g. pig, that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after another sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252592#comment-14252592
 ] 

Hadoop QA commented on YARN-2230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688178/YARN-2230.001.patch
  against trunk revision 0402bad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 25 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6152//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6152//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6152//console

This message is automatically generated.

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, documentation, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
> Attachments: YARN-2230.001.patch
>
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either the documentation or the code should be corrected (unless this exception is 
> handled elsewhere accordingly, but it looks like it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)

[jira] [Updated] (YARN-2217) Shared cache client side changes

2014-12-18 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2217:
---
Attachment: YARN-2217-trunk-v4.patch

[~kasha] V4 attached. Patch verified manually.

> Shared cache client side changes
> 
>
> Key: YARN-2217
> URL: https://issues.apache.org/jira/browse/YARN-2217
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
> YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch
>
>
> Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252611#comment-14252611
 ] 

Jian He commented on YARN-2964:
---

thanks for reviewing and committing, Jason !

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: YARN-2964.1.patch, YARN-2964.2.patch, YARN-2964.3.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveness interval) after log aggregation completes.  The result is that 
> an oozie job, e.g. pig, that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after another sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-18 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.19.patch

Modified the patch to use the minimum allocation value if the application master resource 
is unavailable.

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
> YARN-2637.9.patch
>
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it checks whether the app can be 
> activated in the following way:
> {code}
> for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
>      i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>     break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
>       getMaximumActiveApplicationsPerUser()) {
>     user.activateApplication();
>     activeApplications.add(application);
>     i.remove();
>     LOG.info("Application " + application.getApplicationId() +
>         " from user: " + application.getUser() + 
>         " activated in queue: " + getQueueName());
>   }
> }
> {code}
> For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation=1M, up to 200 AMs can be 
> launched; if a user actually uses 5M for each AM (> minimum_allocation), all 
> apps can still be activated, and they will occupy all of the queue's resources 
> instead of only max_am_resource_percent of the queue.
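
The gap above is that the limit is expressed as a count of AMs derived from 
minimum_allocation rather than as the resources actually consumed by AMs. A rough sketch of 
the general shape of a fix follows (hypothetical names, simplified to memory; this is not the 
code in the attached patches): compare the AM resources already in use, plus the new AM's 
actual size, against the queue's AM resource limit, falling back to the minimum allocation 
when the AM size is unknown.

{code}
// Illustrative sketch only; names are hypothetical and simplified to memory.
final class AmResourceLimitSketch {
  static boolean canActivate(long amLimitMb,          // queue_max_capacity * maximum_am_resource_percent
                             long usedByActiveAmsMb,  // sum of AM sizes of active apps
                             long thisAmMb,           // requested AM container size, 0 if unknown
                             long minAllocationMb) {
    // Fall back to the minimum allocation when the AM resource is unavailable,
    // mirroring the fallback described in the latest patch comment.
    long effectiveAmMb = thisAmMb > 0 ? thisAmMb : minAllocationMb;
    return usedByActiveAmsMb + effectiveAmMb <= amLimitMb;
  }
}
{code}

In the 1G / 0.2 example, with 5M AMs this admits at most 40 concurrent AMs (200M / 5M) rather 
than 200.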



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2014-12-18 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2654:
---
Attachment: shared_cache_config_parameters.txt

Attached is a text file with all of the shared cache config parameters, their 
descriptions and defaults (taken from yarn-default.xml).

> Revisit all shared cache config parameters to ensure quality names
> --
>
> Key: YARN-2654
> URL: https://issues.apache.org/jira/browse/YARN-2654
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: shared_cache_config_parameters.txt
>
>
> Revisit all the shared cache config parameters in YarnConfiguration and 
> yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-18 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252620#comment-14252620
 ] 

Craig Welch commented on YARN-2637:
---

Minor update - TestAllocationFileLoaderService passes for me; I think it's a 
build server issue.  Also, I believe the findbugs warnings are still related to the jdk7 
update, but they had disappeared before I had a chance to verify. They will 
re-run with the new patch and I will confirm that they are not related to my 
change.  In all, I believe the change is fine in terms of the release audit 
checks.

The only change in this version vs the last is with respect to:

bq. 2. FiCaSchedulerApp constructor

As I said before, this is present in non-test scenarios.  However, I 
realized that I could use the minimum allocation from the scheduler if the AM resource is 
not present, which means at worst we would fall back to the "old behavior" when 
there is not an actual AM resource to work with - so I adjusted the code to do 
that where necessary and possible.




> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
> YARN-2637.9.patch
>
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it checks whether the app can be 
> activated in the following way:
> {code}
> for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
>      i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>     break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
>       getMaximumActiveApplicationsPerUser()) {
>     user.activateApplication();
>     activeApplications.add(application);
>     i.remove();
>     LOG.info("Application " + application.getApplicationId() +
>         " from user: " + application.getUser() + 
>         " activated in queue: " + getQueueName());
>   }
> }
> {code}
> For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation=1M, up to 200 AMs can be 
> launched; if a user actually uses 5M for each AM (> minimum_allocation), all 
> apps can still be activated, and they will occupy all of the queue's resources 
> instead of only max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-18 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252629#comment-14252629
 ] 

Mayank Bansal commented on YARN-2933:
-

These findbugs warnings and test failures are not due to this patch.

Thanks,
Mayank

> Capacity Scheduler preemption policy should only consider capacity without 
> labels temporarily
> -
>
> Key: YARN-2933
> URL: https://issues.apache.org/jira/browse/YARN-2933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Mayank Bansal
> Attachments: YARN-2933-1.patch
>
>
> Currently, we have capacity enforcement on each queue for each label in 
> CapacityScheduler, but we don't have a preemption policy that supports it. 
> YARN-2498 is targeting preemption that respects node labels, but we have 
> some gaps in the code base; for example, queues/FiCaScheduler should be able to get 
> usedResource/pendingResource, etc. by label. These items potentially require 
> refactoring CS, which needs some careful thought.
> For now, what we can do immediately is calculate ideal_allocation and 
> preempt containers only for resources on nodes without labels, to avoid 
> regressions like the following: a cluster has some nodes with labels and some without; assume 
> queueA isn't satisfied for resources without labels, but the preemption 
> policy may preempt resources from nodes with labels for queueA, which is not 
> correct.
> Again, this is just a short-term enhancement; YARN-2498 will address 
> preemption respecting node-labels for the Capacity Scheduler, which is our final 
> target.
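
As a rough sketch of the short-term behavior described above (hypothetical types; this is not 
the ProportionalCapacityPreemptionPolicy code), the policy would simply ignore any node that 
carries a label when computing the capacity it is allowed to reason about and preempt from:

{code}
import java.util.List;
import java.util.Set;

// Illustrative sketch only; NodeView is a stand-in, not a YARN class.
final class NoLabelPreemptionSketch {
  interface NodeView {
    Set<String> getLabels();
    long getMemoryMb();
  }

  // Capacity the temporary preemption policy considers: unlabeled nodes only.
  static long preemptableCapacityMb(List<NodeView> nodes) {
    long totalMb = 0;
    for (NodeView node : nodes) {
      Set<String> labels = node.getLabels();
      if (labels == null || labels.isEmpty()) {
        totalMb += node.getMemoryMb();
      }
    }
    return totalMb;
  }
}
{code}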



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252682#comment-14252682
 ] 

Hadoop QA commented on YARN-2217:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12688186/YARN-2217-trunk-v4.patch
  against trunk revision 0402bad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  org.apache.hadoop.yarn.client.api.impl.TestYarnClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6153//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6153//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6153//console

This message is automatically generated.

> Shared cache client side changes
> 
>
> Key: YARN-2217
> URL: https://issues.apache.org/jira/browse/YARN-2217
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
> YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch
>
>
> Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2217) Shared cache client side changes

2014-12-18 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252700#comment-14252700
 ] 

Chris Trezzo commented on YARN-2217:


Findbugs and test failures seem unrelated.

> Shared cache client side changes
> 
>
> Key: YARN-2217
> URL: https://issues.apache.org/jira/browse/YARN-2217
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch, 
> YARN-2217-trunk-v3.patch, YARN-2217-trunk-v4.patch
>
>
> Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252753#comment-14252753
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12688189/YARN-2637.19.patch
  against trunk revision 0402bad.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6154//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6154//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6154//console

This message is automatically generated.

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.19.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, 
> YARN-2637.9.patch
>
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it checks whether the app can be 
> activated in the following way:
> {code}
> for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); 
>      i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
>     break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
>       getMaximumActiveApplicationsPerUser()) {
>     user.activateApplication();
>     activeApplications.add(application);
>     i.remove();
>     LOG.info("Application " + application.getApplicationId() +
>         " from user: " + application.getUser() + 
>         " activated in queue: " + getQueueName());
>   }
> }
> {code}
> For example, if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation=1M, up to 200 AMs can be 
> launched; if a user actually uses 5M for each AM (> minimum_allocation), all 
> apps can still be activated, and they will occupy all of the queue's resources 
> instead of only max_am_resource_percent of the queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2978) Null pointer in YarnProtos

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2978:
--

Assignee: Varun Saxena

> Null pointer in YarnProtos
> --
>
> Key: YARN-2978
> URL: https://issues.apache.org/jira/browse/YARN-2978
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.1
>Reporter: Jason Tufo
>Assignee: Varun Saxena
>
>  java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto.isInitialized(YarnProtos.java:29625)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.build(YarnProtos.java:29939)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:290)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2979) Unsupported operation exception in message building (YarnProtos)

2014-12-18 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-2979:
--

Assignee: Varun Saxena

> Unsupported operation exception in message building (YarnProtos)
> 
>
> Key: YARN-2979
> URL: https://issues.apache.org/jira/browse/YARN-2979
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.1
>Reporter: Jason Tufo
>Assignee: Varun Saxena
>
> java.lang.UnsupportedOperationException
>   at java.util.AbstractList.add(AbstractList.java:148)
>   at java.util.AbstractList.add(AbstractList.java:108)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$QueueInfoProto$Builder.addAllApplications(YarnProtos.java:30702)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.addApplicationsToProto(QueueInfoPBImpl.java:227)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToBuilder(QueueInfoPBImpl.java:282)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.mergeLocalToProto(QueueInfoPBImpl.java:289)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.QueueInfoPBImpl.getProto(QueueInfoPBImpl.java:157)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.convertToProtoFormat(GetQueueInfoResponsePBImpl.java:128)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToBuilder(GetQueueInfoResponsePBImpl.java:104)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.mergeLocalToProto(GetQueueInfoResponsePBImpl.java:111)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.GetQueueInfoResponsePBImpl.getProto(GetQueueInfoResponsePBImpl.java:53)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:235)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2980) Move health check script related functionality to hadoop-common

2014-12-18 Thread Ming Ma (JIRA)
Ming Ma created YARN-2980:
-

 Summary: Move health check script related functionality to 
hadoop-common
 Key: YARN-2980
 URL: https://issues.apache.org/jira/browse/YARN-2980
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ming Ma


HDFS might want to leverage health check functionality available in YARN in 
both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
https://issues.apache.org/jira/browse/HDFS-7441.

We can move the health check functionality, including the protocol between hadoop 
daemons and the health check script, to hadoop-common. That will simplify 
development and maintenance for both the hadoop source code and the health check scripts.

Thoughts?
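
For context, here is a minimal sketch of the kind of functionality being discussed, using 
only JDK classes. This is not the NodeHealthScriptRunner code; the ERROR-prefix convention 
mirrors how YARN's node health script output is commonly interpreted and should be treated as 
an assumption here.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Runs a health check script and reports unhealthy if any output line
// starts with "ERROR" or the script exits non-zero.  Sketch only.
public class HealthScriptSketch {
  public static boolean isHealthy(String scriptPath) throws Exception {
    Process p = new ProcessBuilder(scriptPath).redirectErrorStream(true).start();
    BufferedReader out =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    boolean sawError = false;
    String line;
    while ((line = out.readLine()) != null) {
      if (line.startsWith("ERROR")) {
        sawError = true;
      }
    }
    return p.waitFor() == 0 && !sawError;
  }
}
{code}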



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

