[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-10-02 Thread Abin Shahab (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abin Shahab updated YARN-1964:
--
Attachment: YARN-1964.patch

Patch with all classpath changes inside the DCE (DockerContainerExecutor).

> Create Docker analog of the LinuxContainerExecutor in YARN
> --
>
> Key: YARN-1964
> URL: https://issues.apache.org/jira/browse/YARN-1964
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.2.0
>Reporter: Arun C Murthy
>Assignee: Abin Shahab
> Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, 
> YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, 
> yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
> yarn-1964-docker.patch
>
>
> Docker (https://www.docker.io/) is an increasingly popular container 
> technology.
> In the context of YARN, support for Docker will provide a very elegant 
> solution to allow applications to *package* their software into a Docker 
> container (an entire Linux file system, including custom versions of perl, 
> python, etc.) and use it as a blueprint to launch all their YARN containers 
> with the requisite software environment. This provides both consistency (all 
> YARN containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).
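A minimal sketch of how a cluster operator might opt in to such an executor; the executor class name below is an assumption for illustration, and the real name and any Docker-specific properties are defined by the attached patch, not by this example:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DockerExecutorConfigSketch {
  public static void main(String[] args) {
    // Point the NodeManager at a Docker-based executor. The class name is an
    // assumption for illustration only; see the attached patch for the real one.
    YarnConfiguration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.NM_CONTAINER_EXECUTOR,
        "org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor");
    System.out.println(conf.get(YarnConfiguration.NM_CONTAINER_EXECUTOR));
  }
}
{code}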



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2635:
---
Attachment: yarn-2635-4.patch

Thanks for the review, Sandy. I tried to parameterize based on the conf through a 
static block in the base class, but couldn't get it to work. The updated patch 
addresses the rest of your comments.

> TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
> yarn-2635-4.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2635:
---
Summary: TestRM, TestRMRestart, TestClientToAMTokens should run with both 
CS and FS  (was: TestRMRestart should run with all schedulers)

> TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2476) Apps are scheduled in random order after RM failover

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157671#comment-14157671
 ] 

Tsuyoshi OZAWA commented on YARN-2476:
--

Closing this issue as a duplicate of (it is part of) YARN-556. Please 
feel free to reopen this issue if you have any comments. Thanks!

> Apps are scheduled in random order after RM failover
> 
>
> Key: YARN-2476
> URL: https://issues.apache.org/jira/browse/YARN-2476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
> Environment: Linux
>Reporter: Santosh Marella
>  Labels: ha, high-availability, resourcemanager
>
> RM HA is configured with 2 RMs. Used FileSystemRMStateStore.
> Fairscheduler allocation file is configured in yarn-site.xml:
> <property>
>   <name>yarn.scheduler.fair.allocation.file</name>
>   <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml</value>
> </property>
> FS allocation-pools.xml:
> 
> 
>
>   1 mb,10vcores
>   19000 mb,100vcores
>   5525
>   4.5
>   fair
>   3600
>
>
>   1 mb,10vcores
>   19000 mb,100vcores
>   5525
>   1.5
>   fair
>   3600
>
> 600
> 600
> 
> Submitted 10 sleep jobs to a FS queue using the command:
> hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep
> -Dmapreduce.job.queuename=root.dev  -m 10 -r 10 -mt 1 -rt 1
> All the jobs were submitted by the same user, with the same priority and 
> to the
> same queue. No other jobs were running in the cluster. Jobs started 
> executing
> in the order in which they were submitted (jobs 6 to 10 were active, 
> while 11
> to 15 were waiting):
> root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application 
> -list
> Total number of applications (application-types: [] and states: 
> [SUBMITTED,ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
> application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
> application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
> application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
> application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
> application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> Stopped RM1. There was a failover and RM2 became active. But the jobs 
> seem to
> have started in a different order:
> root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
> 14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing 
> over to rm2
> Total number of applications (application-types: [] and states: 
> [SUBMITTED,ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:59351
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  R

[jira] [Resolved] (YARN-2476) Apps are scheduled in random order after RM failover

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA resolved YARN-2476.
--
Resolution: Duplicate

> Apps are scheduled in random order after RM failover
> 
>
> Key: YARN-2476
> URL: https://issues.apache.org/jira/browse/YARN-2476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
> Environment: Linux
>Reporter: Santosh Marella
>  Labels: ha, high-availability, resourcemanager
>
> RM HA is configured with 2 RMs. Used FileSystemRMStateStore.
> Fairscheduler allocation file is configured in yarn-site.xml:
> <property>
>   <name>yarn.scheduler.fair.allocation.file</name>
>   <value>/opt/mapr/hadoop/hadoop-2.4.1/etc/hadoop/allocation-pools.xml</value>
> </property>
> FS allocation-pools.xml:
> 
> 
>
>   1 mb,10vcores
>   19000 mb,100vcores
>   5525
>   4.5
>   fair
>   3600
>
>
>   1 mb,10vcores
>   19000 mb,100vcores
>   5525
>   1.5
>   fair
>   3600
>
> 600
> 600
> 
> Submitted 10 sleep jobs to a FS queue using the command:
> hadoop jar hadoop-mapreduce-examples-2.4.1-mapr-4.0.1-SNAPSHOT.jar sleep
> -Dmapreduce.job.queuename=root.dev  -m 10 -r 10 -mt 1 -rt 1
> All the jobs were submitted by the same user, with the same priority and 
> to the
> same queue. No other jobs were running in the cluster. Jobs started 
> executing
> in the order in which they were submitted (jobs 6 to 10 were active, 
> while 11
> to 15 were waiting):
> root@perfnode131:/opt/mapr/hadoop/hadoop-2.4.1/logs# yarn application 
> -list
> Total number of applications (application-types: [] and states: 
> [SUBMITTED,ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0010  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:52799
> application_1408572781346_0008  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode131:33766
> application_1408572781346_0009  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:50964
> application_1408572781346_0007  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:52966
> application_1408572781346_0015  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> application_1408572781346_0006  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    9.5%      http://perfnode134:34094
> application_1408572781346_0013  Sleep job         MAPREDUCE         userA  root.dev  ACCEPTED  UNDEFINED    0%        N/A
> Stopped RM1. There was a failover and RM2 became active. But the jobs 
> seem to
> have started in a different order:
> root@perfnode131:~/scratch/raw_rm_logs_fs_hang# yarn application -list
> 14/08/21 07:26:13 INFO client.ConfiguredRMFailoverProxyProvider: Failing 
> over to rm2
> Total number of applications (application-types: [] and states: 
> [SUBMITTED,ACCEPTED, RUNNING]):10
> Application-Id                  Application-Name  Application-Type  User   Queue     State     Final-State  Progress  Tracking-URL
> application_1408572781346_0012  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode134:59351
> application_1408572781346_0014  Sleep job         MAPREDUCE         userA  root.dev  RUNNING   UNDEFINED    5%        http://perfnode132:37866
> application_1408572781346_0011  Sleep job         MAPREDUCE         userA  root.dev

[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157667#comment-14157667
 ] 

Hadoop QA commented on YARN-2615:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672724/YARN-2615-v3.patch
  against trunk revision 2d8e6e2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5249//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5249//console

This message is automatically generated.

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2615:
-
Attachment: YARN-2615-v3.patch

Thanks [~ozawa] for tracking the Jenkins issue and [~jianhe] for the review. In the 
v3 patch, I removed the unnecessary code mentioned by Jian, but left some override 
methods in *ForTest because they need to access the subclass's proto. 

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157624#comment-14157624
 ] 

Sandy Ryza commented on YARN-2635:
--

This seems like a good idea.  A few stylistic comments.

Can we rename RMSchedulerParametrizedTestBase to 
ParameterizedSchedulerTestBase?  The former confuses me a little because it sounds 
like something that happened, rather than a noun, and "RM" doesn't seem 
necessary.  Also, Parameterized as spelled in the JUnit class name has three 
e's.  Lastly, can the class include some header comments on what it's doing?

{code}
+  protected void configScheduler(YarnConfiguration conf) throws IOException {
+    // Configure scheduler
{code}
Just name the method configureScheduler instead of using an abbreviation plus a comment.

{code}
+  private void configFifoScheduler(YarnConfiguration conf) {
+    conf.set(YarnConfiguration.RM_SCHEDULER, FifoScheduler.class.getName());
+  }
+
+  private void configCapacityScheduler(YarnConfiguration conf) {
+    conf.set(YarnConfiguration.RM_SCHEDULER,
+        CapacityScheduler.class.getName());
+  }
{code}
These are only one line - can we just inline them?

{code}
+  protected YarnConfiguration conf = null;
{code}
I think it's better to make this private and expose it through a getConfig method.

Running the tests without FIFO seems reasonable to me.

One last thought - not sure how feasible this is, but the code might be simpler 
if we get rid of SchedulerType and just have the parameters be Configuration 
objects?
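
For reference, a minimal sketch of the kind of parameterized base class being discussed, assuming JUnit 4's Parameterized runner; the names mirror the suggestions above and are not the actual patch:

{code}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;
import org.junit.Before;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

// Subclasses (TestRM, TestRMRestart, ...) would extend this and use getConf().
@RunWith(Parameterized.class)
public abstract class ParameterizedSchedulerTestBase {

  private final Class<?> schedulerClass;
  private YarnConfiguration conf;

  public ParameterizedSchedulerTestBase(Class<?> schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  // Each test class runs once per scheduler listed here (CS and FS only,
  // per the discussion about leaving FIFO out).
  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {CapacityScheduler.class},
        {FairScheduler.class}});
  }

  @Before
  public void configureScheduler() {
    conf = new YarnConfiguration();
    conf.set(YarnConfiguration.RM_SCHEDULER, schedulerClass.getName());
  }

  protected YarnConfiguration getConf() {
    return conf;
  }
}
{code}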

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157619#comment-14157619
 ] 

Hadoop QA commented on YARN-2468:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672717/YARN-2468.11.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5248//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5248//console

This message is automatically generated.

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
> YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2468) Log handling for LRS

2014-10-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2468:

Attachment: YARN-2468.11.patch

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
> YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157588#comment-14157588
 ] 

Xuan Gong commented on YARN-2468:
-

bq. Also ContainerLogAggregator.uploadedFileMeta is also not needed to be a 
class member.

I think ContainerLogAggregator.uploadedFileMeta does need to be a class member. 
It is used to keep track of all previously uploaded log files for each container. 
We use this information to decide whether a log can be aggregated. 

The new patch addresses the other comments.
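
To illustrate the point, a hedged sketch (the field and helper below are simplified stand-ins, not the patch itself): keeping the set as a class member lets each aggregation cycle skip files that an earlier cycle already uploaded.

{code}
import java.io.File;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class ContainerLogTrackerSketch {
  // Survives across aggregation cycles; a local variable would forget
  // what was uploaded last time and re-aggregate the same logs.
  private final Set<String> uploadedFileMeta = new HashSet<String>();

  Set<File> selectNewLogs(Collection<File> candidateLogs) {
    Set<File> pending = new HashSet<File>();
    for (File log : candidateLogs) {
      // A simple identity for a log file: name plus last-modified time.
      String meta = log.getName() + "_" + log.lastModified();
      if (uploadedFileMeta.add(meta)) {   // true only if not uploaded before
        pending.add(log);
      }
    }
    return pending;
  }
}
{code}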

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.11.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
> YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157579#comment-14157579
 ] 

Hadoop QA commented on YARN-2635:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672709/yarn-2635-3.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5247//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5247//console

This message is automatically generated.

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2612) Some completed containers are not reported to NM

2014-10-02 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong resolved YARN-2612.

Resolution: Duplicate

> Some completed containers are not reported to NM
> 
>
> Key: YARN-2612
> URL: https://issues.apache.org/jira/browse/YARN-2612
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
> Fix For: 2.6.0
>
>
> We are testing RM work-preserving restart and found the following logs when 
> we ran a simple MapReduce task "PI". Some completed containers that had already 
> been pulled by the AM were never acknowledged back to the NM, so the NM kept 
> reporting the completed containers even after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In YARN-1372, the NM reports completed containers to the RM until it gets an ACK 
> from the RM. If the AM does not call allocate, which means the AM does not ack 
> the RM, the RM will not ack the NM. We ([~chenchun]) have observed these two 
> cases when running the MapReduce task 'pi':
> 1) The RM sends completed containers to the AM. After receiving them, the AM 
> thinks it has done its work and does not need more resources, so it does not 
> call allocate.
> 2) When the AM finishes, it cannot ack the RM because the AM itself has not 
> finished yet.
> We think that when RMAppAttempt calls BaseFinalTransition, the AppAttempt has 
> finished, so the RM could then send this AppAttempt's completed containers to 
> the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157570#comment-14157570
 ] 

Jun Gong commented on YARN-2640:


[~ozawa], thank you for telling me. Closing it now.

> TestDirectoryCollection.testCreateDirectories failed
> 
>
> Key: YARN-2640
> URL: https://issues.apache.org/jira/browse/YARN-2640
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-2640.2.patch, YARN-2640.patch
>
>
> When running test "mvn test -Dtest=TestDirectoryCollection", it failed:
> {code}
> Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
>   Time elapsed: 0.969 sec  <<< FAILURE!
> java.lang.AssertionError: local dir parent not created with proper 
> permissions expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
> {code}
> I found it was because testDiskSpaceUtilizationLimit ran before 
> testCreateDirectories, and directory "dirA" was created in 
> testDiskSpaceUtilizationLimit. When testCreateDirectories then tried 
> to create "dirA" with the specified permissions, it found that "dirA" was 
> already there and did nothing.
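
One common way to remove this kind of order dependence, sketched here under the assumption that JUnit 4's TemporaryFolder rule is acceptable (the actual YARN-2640 patch may do it differently), is to give each test its own directory:

{code}
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class OrderIndependentDirTest {
  // A fresh root per test method, so "dirA" from one test can never
  // leak into another test's permission checks.
  @Rule
  public TemporaryFolder testRoot = new TemporaryFolder();

  @Test
  public void testCreateDirectories() throws Exception {
    File dirA = new File(testRoot.getRoot(), "dirA");
    assertTrue(dirA.mkdirs());
    // ... assert on the freshly created directory's permissions ...
  }
}
{code}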



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2641) improve node decommission latency in RM.

2014-10-02 Thread zhihai xu (JIRA)
zhihai xu created YARN-2641:
---

 Summary: improve node decommission latency in RM.
 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu


Improve node decommission latency in the RM.
Currently, node decommission only happens after the RM receives a nodeHeartbeat 
from the NodeManager. The node heartbeat interval is configurable; the default 
value is 1 second.
It would be better to do the decommission during the RM refresh (NodesListManager) 
instead of in the nodeHeartbeat path (ResourceTrackerService).
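
A hedged sketch of the proposed direction (the helper and its wiring are assumptions, not the eventual patch): have the refresh path push a DECOMMISSION event immediately rather than waiting for the next heartbeat to be rejected.

{code}
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType;

class DecommissionOnRefreshSketch {
  // Hypothetical helper that a NodesListManager-style refresh could call after
  // re-reading the exclude list, instead of waiting for the next heartbeat.
  static void decommissionExcludedNodes(RMContext rmContext, Set<String> excludedHosts) {
    for (NodeId nodeId : rmContext.getRMNodes().keySet()) {
      if (excludedHosts.contains(nodeId.getHost())) {
        // Drive the RMNode state machine immediately; ResourceTrackerService
        // would otherwise only do this when the node heartbeats again.
        rmContext.getDispatcher().getEventHandler().handle(
            new RMNodeEvent(nodeId, RMNodeEventType.DECOMMISSION));
      }
    }
  }
}
{code}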



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157524#comment-14157524
 ] 

Craig Welch commented on YARN-1198:
---

FYI, it's not possible to call getAndCalculateHeadroom because nothing can 
synchronize on the queue during the allocation call without deadlocking - this 
is why it's necessary to break out the headroom the way it is here and store 
some items (such as the LeafQueue.User, which comes from the user manager and 
syncs on the queue) to avoid any synchronization on the queue itself during the 
final headroom calculation in the allocate/getHeadroom step.  It's not a bad 
thing to do anyway, since it reduces the number of operations (somewhat) in that 
final headroom calculation - but it is also why we can't just call 
getAndCalculateHeadroom as such (unchanged) in allocate()
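
A hedged sketch of the "capture under the lock, compute later" pattern being described; class and field names are illustrative, not the actual YARN-1198 patch:

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

// Values such as the user limit and the queue's max available capacity are
// snapshotted while the queue/user locks are already held (e.g. during
// assignContainers), so the later headroom computation in allocate()/getHeadroom()
// never has to synchronize on the queue again.
class HeadroomSnapshotSketch {
  private final ResourceCalculator rc;
  private final Resource clusterResource;
  private final Resource userLimit;
  private final Resource queueMaxAvailable;
  private final Resource userConsumed;

  HeadroomSnapshotSketch(ResourceCalculator rc, Resource clusterResource,
      Resource userLimit, Resource queueMaxAvailable, Resource userConsumed) {
    this.rc = rc;
    this.clusterResource = clusterResource;
    this.userLimit = userLimit;
    this.queueMaxAvailable = queueMaxAvailable;
    this.userConsumed = userConsumed;
  }

  // Lock-free: works only on the snapshotted values.
  Resource getHeadroom() {
    Resource headroom = Resources.subtract(
        Resources.min(rc, clusterResource, userLimit, queueMaxAvailable),
        userConsumed);
    return Resources.max(rc, clusterResource, headroom, Resources.none());
  }
}
{code}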

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, 
> YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, 
> YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today headroom calculation (for the app) takes place only when
> * New node is added/removed from the cluster
> * New container is getting assigned to the application.
> However, there are potentially a lot of situations which are not considered in 
> this calculation:
> * If a container finishes, then the headroom for that application changes and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then:
> ** If app1's container finishes, then not only app1's but also app2's AM 
> should be notified about the change in headroom.
> ** Similarly, if a container is assigned to either application (app1/app2), then 
> both AMs should be notified about their headroom.
> ** To simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but then this would not be backward compatible...).
> * Also, when an admin user refreshes the queues, headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157518#comment-14157518
 ] 

Tsuyoshi OZAWA commented on YARN-2562:
--

[~jianhe], could you check the latest patch?

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157515#comment-14157515
 ] 

Karthik Kambatla commented on YARN-2635:


By the way, these tests take a long time to run. Do we want to run against all 
three schedulers? Or, would it be enough to run against CS and FS?

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2635:
---
Attachment: yarn-2635-3.patch

I was reviewing Wei's patch. While trying out my would-be suggestions, I ended 
up making more changes than I intended.

Here is the patch that:
# moves the schedulerSetup Before method to the parent class
# adds a method to keep track of the RMs created in TestRMRestart, so they can be 
stopped after the test is done (see the sketch below). Without this, some of the 
tests were failing depending on the order of execution. 
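
A rough sketch of what such tracking can look like, assuming MockRM as used in these tests (not the exact patch):

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.junit.After;

public class RMTrackingSketch {
  private final List<MockRM> rms = new ArrayList<MockRM>();

  // Tests call this instead of "new MockRM(conf)" so every RM is remembered.
  protected MockRM createMockRM(YarnConfiguration conf) {
    MockRM rm = new MockRM(conf);
    rms.add(rm);
    return rm;
  }

  // Stop everything after each test so leftover RMs can't affect later tests.
  @After
  public void stopAllRMs() {
    for (MockRM rm : rms) {
      rm.stop();
    }
    rms.clear();
  }
}
{code}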


> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157490#comment-14157490
 ] 

Hadoop QA commented on YARN-2562:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672691/YARN-2562.5-4.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5246//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5246//console

This message is automatically generated.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-10-02 Thread Santosh Marella (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157461#comment-14157461
 ] 

Santosh Marella commented on YARN-556:
--

Referencing YARN-2476 here to ensure the specific scenario mentioned there is 
fixed as part of this JIRA.

> RM Restart phase 2 - Work preserving restart
> 
>
> Key: YARN-556
> URL: https://issues.apache.org/jira/browse/YARN-556
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: Work Preserving RM Restart.pdf, 
> WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch
>
>
> YARN-128 covered storing the state needed for the RM to recover critical 
> information. This umbrella jira will track changes needed to recover the 
> running state of the cluster so that work can be preserved across RM restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.5-4.patch

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5-4.patch, YARN-2562.5.patch
>
>
> ContainerID string format is unreadable for RMs that restarted at least once 
> (epoch > 0) after YARN-2182. For e.g, 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157431#comment-14157431
 ] 

Benoy Antony commented on YARN-2527:


Thanks a lot, [~zjshen].

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Fix For: 2.6.0
>
> Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157349#comment-14157349
 ] 

Hadoop QA commented on YARN-2598:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672667/YARN-2598.2.patch
  against trunk revision 054f285.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5245//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5245//console

This message is automatically generated.

> GHS should show N/A instead of null for the inaccessible information
> 
>
> Key: YARN-2598
> URL: https://issues.apache.org/jira/browse/YARN-2598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2598.1.patch, YARN-2598.2.patch
>
>
> When the user doesn't have access to an application, the app attempt 
> information is not visible to the user. ClientRMService outputs N/A, but 
> GHS shows null, which is not user-friendly.
> {code}
> 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
> http://nn.example.com:8188/ws/v1/timeline/
> 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
> nn.example.com/240.0.0.11:8050
> 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
> server at nn.example.com/240.0.0.11:10200
> Application Report : 
>   Application-Id : application_1411586934799_0001
>   Application-Name : Sleep job
>   Application-Type : MAPREDUCE
>   User : hrt_qa
>   Queue : default
>   Start-Time : 1411586956012
>   Finish-Time : 1411586989169
>   Progress : 100%
>   State : FINISHED
>   Final-State : SUCCEEDED
>   Tracking-URL : null
>   RPC Port : -1
>   AM Host : null
>   Aggregate Resource Allocation : N/A
>   Diagnostics : null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-02 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157336#comment-14157336
 ] 

Sandy Ryza commented on YARN-1414:
--

Awesome

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2628) Capacity scheduler with DominantResourceCalculator carries out reservation even though slots are free

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157311#comment-14157311
 ] 

Hudson commented on YARN-2628:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6183 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6183/])
YARN-2628. Capacity scheduler with DominantResourceCalculator carries out 
reservation even though slots are free. Contributed by Varun Vasudev (jianhe: 
rev 054f28552687e9b9859c0126e16a2066e20ead3f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


> Capacity scheduler with DominantResourceCalculator carries out reservation 
> even though slots are free
> -
>
> Key: YARN-2628
> URL: https://issues.apache.org/jira/browse/YARN-2628
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.5.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.6.0
>
> Attachments: apache-yarn-2628.0.patch, apache-yarn-2628.1.patch
>
>
> We've noticed that if you run the CapacityScheduler with the 
> DominantResourceCalculator, sometimes apps will end up with containers in a 
> reserved state even though free slots are available.
> The root cause seems to be this piece of code from CapacityScheduler.java -
> {noformat}
> // Try to schedule more if there are no reservations to fulfill
> if (node.getReservedContainer() == null) {
>   if (Resources.greaterThanOrEqual(calculator, getClusterResource(),
>       node.getAvailableResource(), minimumAllocation)) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Trying to schedule on node: " + node.getNodeName() +
>           ", available: " + node.getAvailableResource());
>     }
>     root.assignContainers(clusterResource, node, false);
>   }
> } else {
>   LOG.info("Skipping scheduling since node " + node.getNodeID() +
>       " is reserved by application " +
>       node.getReservedContainer().getContainerId().getApplicationAttemptId());
> }
> {noformat}
> The code is meant to check if a node has any slots available for containers. 
> Since it uses the greaterThanOrEqual function, we end up in a situation where 
> greaterThanOrEqual returns true even though we may not have enough CPU or 
> memory to actually run the container.
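
To make the failure mode concrete, a hedged illustration (not the committed fix): with DominantResourceCalculator, greaterThanOrEqual compares dominant shares, so a node with plenty of spare CPU but almost no memory can still pass the check, while a per-dimension test such as Resources.fitsIn does not.

{code}
// Illustrative replacement for the check quoted above; the actual YARN-2628
// patch may express this differently.
if (node.getReservedContainer() == null
    && Resources.fitsIn(minimumAllocation, node.getAvailableResource())) {
  // Both memory AND vcores of minimumAllocation fit on this node.
  root.assignContainers(clusterResource, node, false);
}
{code}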



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157302#comment-14157302
 ] 

Steve Loughran commented on YARN-913:
-

The failing test is still the (believed unrelated) one:
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 379.565 sec 
<<< FAILURE! - in 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testDSRestartWithPreviousRunningContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 38.715 sec  <<< FAILURE!
java.lang.AssertionError: client failed
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSRestartWithPreviousRunningContainers(TestDistributedShell.java:319)

> Add a way to register long-lived services in a YARN cluster
> ---
>
> Key: YARN-913
> URL: https://issues.apache.org/jira/browse/YARN-913
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, resourcemanager
>Affects Versions: 2.5.0, 2.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
> 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
> YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
> YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
> YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
> YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
> YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
> YARN-913-016.patch, yarnregistry.pdf, yarnregistry.tla
>
>
> In a YARN cluster you can't predict where services will come up -or on what 
> ports. The services need to work those things out as they come up and then 
> publish them somewhere.
> Applications need to be able to find the service instance they are to bond to 
> -and not any others in the cluster.
> Some kind of service registry -in the RM, in ZK, could do this. If the RM 
> held the write access to the ZK nodes, it would be more secure than having 
> apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2598) GHS should show N/A instead of null for the inaccessible information

2014-10-02 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2598:
--
Attachment: YARN-2598.2.patch

Rebase the patch against the latest trunk

> GHS should show N/A instead of null for the inaccessible information
> 
>
> Key: YARN-2598
> URL: https://issues.apache.org/jira/browse/YARN-2598
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2598.1.patch, YARN-2598.2.patch
>
>
> When the user doesn't have access to an application, the app attempt 
> information is not visible to the user. ClientRMService outputs N/A, but 
> GHS shows null, which is not user-friendly.
> {code}
> 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: 
> http://nn.example.com:8188/ws/v1/timeline/
> 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at 
> nn.example.com/240.0.0.11:8050
> 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History 
> server at nn.example.com/240.0.0.11:10200
> Application Report : 
>   Application-Id : application_1411586934799_0001
>   Application-Name : Sleep job
>   Application-Type : MAPREDUCE
>   User : hrt_qa
>   Queue : default
>   Start-Time : 1411586956012
>   Finish-Time : 1411586989169
>   Progress : 100%
>   State : FINISHED
>   Final-State : SUCCEEDED
>   Tracking-URL : null
>   RPC Port : -1
>   AM Host : null
>   Aggregate Resource Allocation : N/A
>   Diagnostics : null
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157296#comment-14157296
 ] 

Hadoop QA commented on YARN-2635:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672637/YARN-2635-2.patch
  against trunk revision 6ac1051.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5242//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5242//console

This message is automatically generated.

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157291#comment-14157291
 ] 

Hadoop QA commented on YARN-2468:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch
  against trunk revision f679ca3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5244//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5244//console

This message is automatically generated.

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
> YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
> YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
> YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
> YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
> YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file, so the files keep growing larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157286#comment-14157286
 ] 

Craig Welch commented on YARN-1680:
---

This does bring up what I think could be an issue; I'm not sure if it is what 
you were getting at before, [~john.jian.fang], but we could well be introducing 
a new bug here unless we are careful.  I don't see any connection between the 
scheduler-level resource adjustments and the application-level adjustments, so 
if an application had problems with a node and blacklisted it, and then the 
cluster did as well, the resource value of that node would effectively be 
removed from the headroom twice (once when the application adds it to its new 
"blacklist reduction", and a second time when the cluster removes its value 
from the clusterResource).  I think this could be a problem and that it could 
be addressed, but it is something to think about and I don't think the current 
approach addresses it.  [~airbots], [~jlowe], thoughts?
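
As a rough illustration of the double subtraction (made-up numbers and 
hypothetical variable names, not the actual scheduler code), the arithmetic 
would look something like this:

{code}
// Hypothetical sketch of the double-counting concern (not actual YARN code).
public class HeadroomDoubleCountSketch {
  public static void main(String[] args) {
    int clusterResourceGb = 32;        // e.g. 4 NMs x 8 GB
    int usedGb = 20;                   // currently allocated to the app's user
    int blacklistedNodeGb = 8;         // capacity of the node the app blacklisted

    // Application-level adjustment: subtract the blacklisted node's capacity.
    int appBlacklistDeduction = blacklistedNodeGb;

    // If the cluster also removes the same node (e.g. it is marked unhealthy),
    // clusterResource shrinks by the same amount.
    int clusterAfterNodeRemoval = clusterResourceGb - blacklistedNodeGb; // 24

    // Applying both adjustments counts the node twice and understates headroom.
    int headroomGb = clusterAfterNodeRemoval - usedGb - appBlacklistDeduction;
    System.out.println("headroom = " + headroomGb + " GB (8 GB deducted twice)");
  }
}
{code}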

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) becomes unstable (3 map tasks got killed), so the MRAppMaster blacklists the 
> unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, the headroom still includes the blacklisted node's memory. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResources it returns still reflects the cluster's 
> free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157275#comment-14157275
 ] 

Craig Welch commented on YARN-1680:
---

Blacklisting a node could happen because, for whatever reason, it is not able 
to run some application's code (missing libraries or whatnot), but the node may 
still be viable for other applications; hence (I assume) the existence of 
application-level blacklisting.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) becomes unstable (3 map tasks got killed), so the MRAppMaster blacklists the 
> unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, the headroom still includes the blacklisted node's memory. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResources it returns still reflects the cluster's 
> free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157274#comment-14157274
 ] 

Ray Chiang commented on YARN-2635:
--

Oops, pending Jenkins of course.

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157271#comment-14157271
 ] 

Craig Welch commented on YARN-1680:
---

[~john.jian.fang] I should probably not have referred to the cluster-level 
adjustments as "blacklisting".  What I see is a mechanism (a state machine and 
events, including adding and removing nodes and the "unhealthy" state/the 
health monitor) that, I think, ultimately results in the 
CapacityScheduler.addNode() and removeNode() calls, which modify the 
clusterResource value.  In any case, the blacklisting functionality we are 
addressing here definitely looks to be application specific and needs to be 
addressed at that level.  The issue isn't, so far as I know, related to any 
blacklisting or node-health behavior outside the one in play here, as those 
cases should work properly for headroom because they adjust the cluster 
resource.  The problem is that application blacklist activity does not adjust 
the cluster resource and was previously not involved in the headroom 
calculation.  If cluster-level adjustments are not being made for those nodes, 
then this blacklisting will result in duplicated effort among applications as 
they independently discover problems with nodes and blacklist them, but that 
is not a new characteristic of how the system works.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) becomes unstable (3 map tasks got killed), so the MRAppMaster blacklists the 
> unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, the headroom still includes the blacklisted node's memory. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResources it returns still reflects the cluster's 
> free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157269#comment-14157269
 ] 

Hudson commented on YARN-2527:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6182 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6182/])
YARN-2527. Fixed the potential NPE in ApplicationACLsManager and added test 
cases for it. Contributed by Benoy Antony. (zjshen: rev 
1c93025a1b370db46e345161dbc15e03f829823f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/security/ApplicationACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/server/security/TestApplicationACLsManager.java


> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Fix For: 2.6.0
>
> Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157265#comment-14157265
 ] 

Karthik Kambatla commented on YARN-1879:


Thanks for working on this, Tsuyoshi. Review comments on the latest patch:
# Are there cases when we don't want RetryCache enabled? IMO, we should always 
use the RetryCache (no harm). If we decide on having a config, the default 
should be true.
# I would set DEFAULT_RM_RETRY_CACHE_EXPIRY_MS to {{10 * 60 * 1000}} instead of 
60, and the corresponding comment (10 mins) can be removed or moved to the 
same line.
# TestApplicationMasterServiceRetryCache has a few lines longer than 80 chars. 
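
As a small sketch of item 2 above (the constant name mirrors the review 
comment; this is illustrative and not the committed YarnConfiguration code):

{code}
// Illustrative only: declares the expiry as the suggested expression, with the
// "10 mins" note kept on the same line instead of a separate comment.
public class RetryCacheConfigSketch {
  public static final long DEFAULT_RM_RETRY_CACHE_EXPIRY_MS = 10 * 60 * 1000; // 10 mins

  public static void main(String[] args) {
    System.out.println("retry cache expiry (ms): " + DEFAULT_RM_RETRY_CACHE_EXPIRY_MS);
  }
}
{code}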


> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
> ---
>
> Key: YARN-1879
> URL: https://issues.apache.org/jira/browse/YARN-1879
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
> YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
> YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
> YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.2-wip.patch, 
> YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
> YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157248#comment-14157248
 ] 

Craig Welch commented on YARN-1680:
---

[~airbots] thanks for your updated WIP patch - I've not looked at it 
extensively yet, but at first glance it looks good to me.  On the original 
patch I noticed that there seems to be a facility for blacklisting racks as 
well as nodes, and I was concerned that this needed to be addressed as well.  
It may be handled in this patch, but it did not look like it to me.  I do think 
it can be handled without too much difficulty: putting the additions (and 
removals) into sets and then checking whether the node's rack is in the set 
during the node iteration should do the trick (I may be off here, but that 
looks like it would work to me).
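
A minimal sketch of that set-based check (all names here, such as 
blacklistedRacks and usableMemory, are illustrative assumptions rather than the 
actual patch code):

{code}
// Hypothetical sketch of rack-aware blacklist filtering during node iteration.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RackBlacklistSketch {
  static class Node {
    final String host;
    final String rack;
    final int memoryMb;
    Node(String host, String rack, int memoryMb) {
      this.host = host; this.rack = rack; this.memoryMb = memoryMb;
    }
  }

  // Sum the capacity of nodes that are neither individually blacklisted
  // nor located on a blacklisted rack.
  static int usableMemory(List<Node> nodes, Set<String> blacklistedHosts,
                          Set<String> blacklistedRacks) {
    int totalMb = 0;
    for (Node n : nodes) {
      if (blacklistedHosts.contains(n.host) || blacklistedRacks.contains(n.rack)) {
        continue; // skip blacklisted host or rack during the iteration
      }
      totalMb += n.memoryMb;
    }
    return totalMb;
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node("nm1", "/rack1", 8192),
        new Node("nm2", "/rack1", 8192),
        new Node("nm3", "/rack2", 8192));
    Set<String> blacklistedHosts = new HashSet<>(Arrays.asList("nm1"));
    Set<String> blacklistedRacks = new HashSet<>(Arrays.asList("/rack2"));
    // nm1 is skipped by host, nm3 by rack; only nm2 counts.
    System.out.println("usable MB: "
        + usableMemory(nodes, blacklistedHosts, blacklistedRacks));
  }
}
{code}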

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each, so total cluster capacity is 32GB. Cluster 
> slow start is set to 1.
> A job is running whose reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) becomes unstable (3 map tasks got killed), so the MRAppMaster blacklists the 
> unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, the headroom still includes the blacklisted node's memory. This makes 
> jobs hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes, but the availableResources it returns still reflects the cluster's 
> free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157246#comment-14157246
 ] 

Ray Chiang commented on YARN-2635:
--

Tested TestRM/TestRMRestart/TestClientToAMTokens.  All three tests now pass 
cleanly using FairScheduler.  +1

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157245#comment-14157245
 ] 

Zhijie Shen commented on YARN-2527:
---

+1, will commit the patch

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157234#comment-14157234
 ] 

Jason Lowe commented on YARN-2414:
--

Ran into this as well.  Any update, [~leftnoteasy]?

> RM web UI: app page will crash if app is failed before any attempt has been 
> created
> ---
>
> Key: YARN-2414
> URL: https://issues.apache.org/jira/browse/YARN-2414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Zhijie Shen
>Assignee: Wangda Tan
>
> {code}
> 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/app/application_1407887030038_0001
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.ja

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157217#comment-14157217
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12672649/YARN-1198.11-with-1857.patch
  against trunk revision f679ca3.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5243//console

This message is automatically generated.

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, 
> YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, 
> YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today the headroom calculation (for the app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations that are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM 
> should be notified about the change in headroom;
> ** similarly, if a container is assigned to either application (app1/app2), 
> both AMs should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Attachment: YARN-1198.11-with-1857.patch

Patch combining the last .11 with the latest 1857 patch, to make it easy to 
check them out together.  Tests changed/added for both issues are present and 
pass (unchanged).

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11-with-1857.patch, YARN-1198.11.patch, YARN-1198.2.patch, 
> YARN-1198.3.patch, YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, 
> YARN-1198.7.patch, YARN-1198.8.patch, YARN-1198.9.patch
>
>
> Today the headroom calculation (for the app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations that are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM 
> should be notified about the change in headroom;
> ** similarly, if a container is assigned to either application (app1/app2), 
> both AMs should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157199#comment-14157199
 ] 

Hadoop QA commented on YARN-2468:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672626/YARN-2468.10.patch
  against trunk revision a56f3ec.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5241//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5241//console

This message is automatically generated.

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
> YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
> YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
> YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
> YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
> YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file, so the files keep growing larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2635:
--
Attachment: YARN-2635-2.patch

Updated the patch to implement a base class that can be reused in the future.

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch, YARN-2635-2.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-10-02 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157133#comment-14157133
 ] 

Xuan Gong commented on YARN-2468:
-

The new patch addresses all the comments.

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
> YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
> YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
> YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
> YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
> YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file, so the files keep growing larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2468) Log handling for LRS

2014-10-02 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2468:

Attachment: YARN-2468.10.patch

> Log handling for LRS
> 
>
> Key: YARN-2468
> URL: https://issues.apache.org/jira/browse/YARN-2468
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-2468.1.patch, YARN-2468.10.patch, 
> YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.2.patch, 
> YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, YARN-2468.4.patch, 
> YARN-2468.5.1.patch, YARN-2468.5.1.patch, YARN-2468.5.2.patch, 
> YARN-2468.5.3.patch, YARN-2468.5.4.patch, YARN-2468.5.patch, 
> YARN-2468.6.1.patch, YARN-2468.6.patch, YARN-2468.7.1.patch, 
> YARN-2468.7.patch, YARN-2468.8.patch, YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when an application finishes, the NM starts the log 
> aggregation. But for long-running service (LRS) applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) are written into a 
> single file, so the files keep growing larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: (was: YARN-2408.4.patch)

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: (was: YARN-2408-5.patch)

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2408) Resource Request REST API for YARN

2014-10-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated YARN-2408:
-
Attachment: YARN-2408-5.patch

> Resource Request REST API for YARN
> --
>
> Key: YARN-2408
> URL: https://issues.apache.org/jira/browse/YARN-2408
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: webapp
>Reporter: Renan DelValle
>  Labels: features
> Attachments: YARN-2408-5.patch, YARN-2408.4.patch
>
>
> I’m proposing a new REST API for YARN which exposes a snapshot of the 
> Resource Requests that exist inside of the Scheduler. My motivation behind 
> this new feature is to allow external software to monitor the amount of 
> resources being requested to gain more insightful information into cluster 
> usage than is already provided. The API can also be used by external software 
> to detect a starved application and alert the appropriate users and/or sys 
> admin so that the problem may be remedied.
> Here is the proposed API (a JSON counterpart is also available):
> {code:xml}
> 
>   7680
>   7
>   
> application_1412191664217_0001
> 
> appattempt_1412191664217_0001_01
> default
> 6144
> 6
> 3
> 
>   
> 1024
> 1
> 6
> true
> 20
> 
>   localMachine
>   /default-rack
>   *
> 
>   
> 
>   
>   
>   ...
>   
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157083#comment-14157083
 ] 

Craig Welch commented on YARN-1198:
---

And, again, I think something is up with Jenkins; the patch application issue 
doesn't look to have anything to do with the patch, and all the builds are 
red...

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, 
> YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, 
> YARN-1198.9.patch
>
>
> Today the headroom calculation (for the app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations that are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM 
> should be notified about the change in headroom;
> ** similarly, if a container is assigned to either application (app1/app2), 
> both AMs should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2634) Test failure for TestClientRMTokens

2014-10-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157079#comment-14157079
 ] 

Jian He commented on YARN-2634:
---

[~djp], I took the latest trunk and ran the test locally; it actually passes. 
Would you mind checking again? Thanks.

> Test failure for TestClientRMTokens
> ---
>
> Key: YARN-2634
> URL: https://issues.apache.org/jira/browse/YARN-2634
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Jian He
>Priority: Blocker
>
> The test fails as below:
> {noformat}
> ---
> Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> ---
> Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 22.693 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)
> testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 20.087 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)
> testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.031 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)
> testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.061 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)
> testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.07 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157067#comment-14157067
 ] 

Hadoop QA commented on YARN-1198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672614/YARN-1198.11.patch
  against trunk revision a56f3ec.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5240//console

This message is automatically generated.

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, 
> YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, 
> YARN-1198.9.patch
>
>
> Today the headroom calculation (for the app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations that are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM 
> should be notified about the change in headroom;
> ** similarly, if a container is assigned to either application (app1/app2), 
> both AMs should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-10-02 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1198:
--
Attachment: YARN-1198.11.patch

Attaching patch .11. This is based on .10 (nee .7), the preferred approach, 
with a factoring change to decrease the impact - the HeadroomProvider is 
now limited to just the CapacityScheduler area / FiCaSchedulerApp.  It's 
actually possible to remove the HeadroomProvider altogether in favor of adding 
more members to the scheduler app, but I think it looks better 
factored this way (the functional result would be the same).
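
A rough sketch of what a provider-style factoring could look like (the 
interface, names, and LongSupplier wiring below are illustrative assumptions, 
not the actual HeadroomProvider from this patch):

{code}
// Illustrative only: headroom is recomputed from current state on each read
// instead of being cached in the application object.
import java.util.function.LongSupplier;

interface HeadroomProviderSketch {
  long getHeadroomMb();
}

class QueueBackedHeadroomProvider implements HeadroomProviderSketch {
  private final LongSupplier queueLimitMb; // reads the current queue/user limit
  private final LongSupplier userUsedMb;   // reads the current per-user usage

  QueueBackedHeadroomProvider(LongSupplier queueLimitMb, LongSupplier userUsedMb) {
    this.queueLimitMb = queueLimitMb;
    this.userUsedMb = userUsedMb;
  }

  @Override
  public long getHeadroomMb() {
    // A container finishing elsewhere changes userUsedMb, so the next read
    // reflects it without pushing an update to every application.
    return Math.max(0, queueLimitMb.getAsLong() - userUsedMb.getAsLong());
  }
}

public class HeadroomProviderSketchDemo {
  public static void main(String[] args) {
    long[] used = { 20_000 };
    HeadroomProviderSketch provider =
        new QueueBackedHeadroomProvider(() -> 32_000, () -> used[0]);
    System.out.println("headroom MB: " + provider.getHeadroomMb()); // 12000
    used[0] = 12_000; // a container finished elsewhere
    System.out.println("headroom MB: " + provider.getHeadroomMb()); // 20000
  }
}
{code}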

> Capacity Scheduler headroom calculation does not work as expected
> -
>
> Key: YARN-1198
> URL: https://issues.apache.org/jira/browse/YARN-1198
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Craig Welch
> Attachments: YARN-1198.1.patch, YARN-1198.10.patch, 
> YARN-1198.11.patch, YARN-1198.2.patch, YARN-1198.3.patch, YARN-1198.4.patch, 
> YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch, YARN-1198.8.patch, 
> YARN-1198.9.patch
>
>
> Today the headroom calculation (for the app) takes place only when
> * a new node is added to/removed from the cluster
> * a new container is assigned to the application.
> However, there are potentially a lot of situations that are not considered in 
> this calculation:
> * If a container finishes, the headroom for that application will change and 
> the AM should be notified accordingly.
> * If a single user has submitted multiple applications (app1 and app2) to the 
> same queue, then
> ** if app1's container finishes, not only app1's but also app2's AM 
> should be notified about the change in headroom;
> ** similarly, if a container is assigned to either application (app1/app2), 
> both AMs should be notified about their headroom;
> ** to simplify the whole communication process, it is ideal to keep headroom 
> per user per LeafQueue so that everyone gets the same picture (apps belonging 
> to the same user and submitted to the same queue).
> * If a new user submits an application to the queue, then all applications 
> submitted by all users in that queue should be notified of the headroom 
> change.
> * Also, today headroom is an absolute number (I think it should be normalized, 
> but that would not be backward compatible).
> * Also, when an admin refreshes the queues, the headroom has to be updated.
> These are all potential bugs in the headroom calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157012#comment-14157012
 ] 

Hadoop QA commented on YARN-2639:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672593/YARN-2639-2.patch
  against trunk revision 29f5200.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5239//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5239//console

This message is automatically generated.

> TestClientToAMTokens should run with all types of schedulers
> 
>
> Key: YARN-2639
> URL: https://issues.apache.org/jira/browse/YARN-2639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2639-1.patch, YARN-2639-2.patch
>
>
> TestClientToAMTokens fails with FairScheduler now. We should let 
> TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2639.

Resolution: Duplicate

Can we fix this as part of YARN-2635 as well?

> TestClientToAMTokens should run with all types of schedulers
> 
>
> Key: YARN-2639
> URL: https://issues.apache.org/jira/browse/YARN-2639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2639-1.patch, YARN-2639-2.patch
>
>
> TestClientToAMTokens fails with FairScheduler now. We should let 
> TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156963#comment-14156963
 ] 

Karthik Kambatla commented on YARN-2635:


Just saw YARN-2638 as well. On second thought, it might be better to combine 
the two JIRAs and implement a base class for RM tests that run against all 
schedulers.

Also, schedulerType in these tests should probably be an enum so subclasses 
don't have to know the order.
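
A minimal sketch of that idea (JUnit 4 parameterized runner; the class and enum 
names here are illustrative assumptions, not necessarily what the committed 
base class uses):

{code}
// Illustrative sketch of a scheduler-parameterized base class keyed on an enum.
import java.util.Arrays;
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public abstract class SchedulerParameterizedTestBaseSketch {

  public enum SchedulerType { CAPACITY, FAIR }

  protected final SchedulerType schedulerType;

  @Parameters(name = "{0}")
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        { SchedulerType.CAPACITY },
        { SchedulerType.FAIR }
    });
  }

  protected SchedulerParameterizedTestBaseSketch(SchedulerType type) {
    this.schedulerType = type;
  }

  // Subclasses ask for a Configuration wired to the scheduler under test.
  protected Configuration createConfiguration() {
    Configuration conf = new Configuration();
    String schedulerClass = (schedulerType == SchedulerType.FAIR)
        ? "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"
        : "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
    conf.set("yarn.resourcemanager.scheduler.class", schedulerClass);
    return conf;
  }
}
{code}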

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156951#comment-14156951
 ] 

Jian He commented on YARN-2615:
---

Looks good; only a few minor things:
- In {{ClientToAMTokenIdentifierForTest}}, the overrides that duplicate code from 
{{ClientToAMTokenIdentifier}} can probably be removed? Similarly for 
{{RMDelegationTokenIdentifierForTest}}.
- This code can be removed:
{code}
byte[] tokenIdentifierContent = token.getIdentifier();
ClientToAMTokenIdentifier tokenIdentifier = new ClientToAMTokenIdentifier();
DataInputBuffer dib = new DataInputBuffer();
dib.reset(tokenIdentifierContent, tokenIdentifierContent.length);
tokenIdentifier.readFields(dib);
{code}



> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields get extended in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2634) Test failure for TestClientRMTokens

2014-10-02 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-2634:
-

Assignee: Jian He

> Test failure for TestClientRMTokens
> ---
>
> Key: YARN-2634
> URL: https://issues.apache.org/jira/browse/YARN-2634
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Junping Du
>Assignee: Jian He
>Priority: Blocker
>
> The test fails as below:
> {noformat}
> ---
> Test set: org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> ---
> Tests run: 6, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 60.184 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens
> testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 22.693 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:272)
> testShortCircuitRenewCancelDifferentHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 20.087 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostDifferentPort(TestClientRMTokens.java:283)
> testShortCircuitRenewCancel(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.031 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:148)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.renew(RMDelegationTokenIdentifier.java:101)
> at org.apache.hadoop.security.token.Token.renew(Token.java:377)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:309)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancel(TestClientRMTokens.java:241)
> testShortCircuitRenewCancelSameHostDifferentPort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.061 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelSameHostDifferentPort(TestClientRMTokens.java:261)
> testShortCircuitRenewCancelWildcardAddress(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens)
>   Time elapsed: 0.07 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at org.apache.hadoop.net.NetUtils.isLocalAddress(NetUtils.java:684)
> at 
> org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier$Renewer.getRmClient(RMDelegationTokenIdentifier.java:149)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2638) TestRM should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2638:
---
Summary: TestRM should run with all schedulers  (was: Let TestRM run with 
all types of schedulers (FIFO, Capacity, Fair))

> TestRM should run with all schedulers
> -
>
> Key: YARN-2638
> URL: https://issues.apache.org/jira/browse/YARN-2638
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2638-1.patch
>
>
> TestRM fails when using FairScheduler or FifoScheduler. The failures are not 
> shown in trunk because trunk uses the default Capacity Scheduler. We need to 
> let TestRM run with all types of schedulers, to make sure any new change 
> doesn't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2635) TestRMRestart should run with all schedulers

2014-10-02 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2635:
---
Summary: TestRMRestart should run with all schedulers  (was: TestRMRestart 
fails with FairScheduler)

> TestRMRestart should run with all schedulers
> 
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156933#comment-14156933
 ] 

Karthik Kambatla commented on YARN-2635:


+1. Committing this. 

> TestRMRestart fails with FairScheduler
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156905#comment-14156905
 ] 

Karthik Kambatla commented on YARN-2180:


Looks mostly good, except for these minor comments:
# App-checker and the store implementations aren't related:
## the app-checker config should be appended to SHARED_CACHE_PREFIX and 
IN_MEMORY_STORE (a naming sketch follows below)
## the variable names should be updated accordingly.
## InMemorySCMStore#createAppCheckerService should move to a util class - how 
about changing SharedCacheStructureUtil to SharedCacheUtil and adding this 
method there? 
# Can we create a follow-up blocker sub-task to revisit all the config names 
before we include sharedcache work in a release? 
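One possible reading of comment 1 above, as a naming sketch only; the constant 
names and values here are placeholders, not the actual YarnConfiguration keys:

{code}
// Hypothetical sketch of the requested key layout: the app-checker setting is
// independent of the store implementation, so it hangs off the generic shared
// cache prefix rather than the in-memory store prefix.
public static final String SHARED_CACHE_PREFIX = "yarn.sharedcache.";
public static final String IN_MEMORY_STORE_PREFIX =
    SHARED_CACHE_PREFIX + "store.in-memory.";

// App-checker: store-agnostic, so keyed under the shared cache prefix.
public static final String SCM_APP_CHECKER_CLASS =
    SHARED_CACHE_PREFIX + "app-checker.class";

// Store-specific settings stay under the in-memory store prefix.
public static final String IN_MEMORY_STALENESS_PERIOD_MINS =
    IN_MEMORY_STORE_PREFIX + "staleness-period-mins";
{code}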


> In-memory backing store for cache manager
> -
>
> Key: YARN-2180
> URL: https://issues.apache.org/jira/browse/YARN-2180
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
> YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
> YARN-2180-trunk-v6.patch
>
>
> Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2639) TestClientToAMTokens should run with all types of schedulers

2014-10-02 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2639:
--
Attachment: YARN-2639-2.patch

Re-triggering Jenkins.

> TestClientToAMTokens should run with all types of schedulers
> 
>
> Key: YARN-2639
> URL: https://issues.apache.org/jira/browse/YARN-2639
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2639-1.patch, YARN-2639-2.patch
>
>
> TestClientToAMTokens fails with FairScheduler now. We should let 
> TestClientToAMTokens run with all kinds of schedulers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156892#comment-14156892
 ] 

Siqi Li commented on YARN-1414:
---

I just found out that this problem has been fixed in trunk. I am going to 
close this JIRA.

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156890#comment-14156890
 ] 

Hadoop QA commented on YARN-2527:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672583/YARN-2527.patch
  against trunk revision 5e0b49d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5238//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5238//console

This message is automatically generated.

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].
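For illustration, a null-safe version of the kind of check that avoids this NPE; 
this is a standalone sketch with assumed parameter names, not the attached patch:

{code}
// Hypothetical illustration of tolerating a missing ACL entry instead of
// dereferencing a null map value.
import java.util.Map;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;

public final class NullSafeAclCheck {

  private NullSafeAclCheck() { }

  /** Returns true if callerUGI may access the app, tolerating missing ACLs. */
  public static boolean checkAccess(UserGroupInformation callerUGI,
      ApplicationAccessType accessType, String applicationOwner,
      Map<ApplicationAccessType, AccessControlList> appAcls) {
    // The owner always has access.
    if (callerUGI.getShortUserName().equals(applicationOwner)) {
      return true;
    }
    // The app's ACLs may already have been removed; treat a missing entry as
    // "no extra access" instead of throwing a NullPointerException.
    if (appAcls == null) {
      return false;
    }
    AccessControlList acl = appAcls.get(accessType);
    return acl != null && acl.isUserAllowed(callerUGI);
  }
}
{code}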



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS

2014-10-02 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156888#comment-14156888
 ] 

zhihai xu commented on YARN-2254:
-

thanks [~kasha] for reviewing and committing the patch.

> TestRMWebServicesAppsModification should run against both CS and FS
> ---
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Fix For: 2.7.0
>
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test if the scheduler is not 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156869#comment-14156869
 ] 

Anubhav Dhoot commented on YARN-2624:
-

Thanks [~jlowe]!

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with the following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}
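As an illustration of the failure mode only (not the attached patch, which also 
touches NM recovery), the generic way to make such a rename succeed is to remove 
a stale destination first; the paths below are placeholders:

{code}
// Hypothetical sketch: FileContext.rename() refuses to overwrite a non-empty
// directory, so a leftover destination from a previous run must be deleted
// before the downloaded resource is moved into place.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options.Rename;
import org.apache.hadoop.fs.Path;

public class StaleCacheDirCleanup {
  public static void moveIntoCache(Path tmpDir, Path destDir) throws Exception {
    FileContext lfs = FileContext.getLocalFSFileContext(new Configuration());
    if (lfs.util().exists(destDir)) {
      // Stale directory left behind by an earlier, unrecovered download.
      lfs.delete(destDir, true);
    }
    lfs.rename(tmpDir, destDir, Rename.NONE);
  }
}
{code}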



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2638) Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156867#comment-14156867
 ] 

Ray Chiang commented on YARN-2638:
--

This patch fixes the test for me.  +1

> Let TestRM run with all types of schedulers (FIFO, Capacity, Fair)
> --
>
> Key: YARN-2638
> URL: https://issues.apache.org/jira/browse/YARN-2638
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2638-1.patch
>
>
> TestRM fails when using FairScheduler or FifoScheduler. The failures are not 
> shown in trunk because trunk uses the default Capacity Scheduler. We need to 
> let TestRM run with all types of schedulers, to make sure any new change 
> doesn't break any scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156841#comment-14156841
 ] 

Hudson commented on YARN-2624:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6178 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6178/])
YARN-2624. Resource Localization fails on a cluster due to existing cache 
directories. Contributed by Anubhav Dhoot (jlowe: rev 
29f520052e2b02f44979980e446acc0dccd96d54)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java


> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with the following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156835#comment-14156835
 ] 

Karthik Kambatla commented on YARN-2624:


Thanks for super-quick turnaround, Jason. 

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with the following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-02 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156836#comment-14156836
 ] 

Ray Chiang commented on YARN-2635:
--

Looks good to me.  Ran cleanly in my tree.  +1

> TestRMRestart fails with FairScheduler
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156824#comment-14156824
 ] 

Jason Lowe commented on YARN-2624:
--

Thanks for catching and fixing this, Anubhav!  My apologies for missing this 
scenario in the original JIRA.

+1 lgtm.  Committing this.


> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with the following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1414) with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs

2014-10-02 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156814#comment-14156814
 ] 

Siqi Li commented on YARN-1414:
---

Sure, I will submit a rebased patch shortly.

> with Fair Scheduler reserved MB in WebUI is leaking when killing waiting jobs
> -
>
> Key: YARN-1414
> URL: https://issues.apache.org/jira/browse/YARN-1414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Fix For: 2.2.0
>
> Attachments: YARN-1221-subtask.v1.patch.txt, YARN-1221-v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: YARN-2527.patch

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2254) TestRMWebServicesAppsModification should run against both CS and FS

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156803#comment-14156803
 ] 

Hudson commented on YARN-2254:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6177 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6177/])
YARN-2254. TestRMWebServicesAppsModification should run against both CS and FS. 
(Zhihai Xu via kasha) (kasha: rev 5e0b49da9caa53814581508e589f3704592cf335)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java


> TestRMWebServicesAppsModification should run against both CS and FS
> ---
>
> Key: YARN-2254
> URL: https://issues.apache.org/jira/browse/YARN-2254
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
>  Labels: test
> Fix For: 2.7.0
>
> Attachments: YARN-2254.000.patch, YARN-2254.001.patch, 
> YARN-2254.002.patch, YARN-2254.003.patch, YARN-2254.004.patch
>
>
> TestRMWebServicesAppsModification skips the test if the scheduler is not 
> CapacityScheduler.
> Change TestRMWebServicesAppsModification to support both CapacityScheduler 
> and FairScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2624) Resource Localization fails on a cluster due to existing cache directories

2014-10-02 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156805#comment-14156805
 ] 

Karthik Kambatla commented on YARN-2624:


The patch looks good to me. Would like input from someone more familiar with 
the NM restart code. [~jlowe], [~djp] - can either of you take a look? We would 
like to get this committed soon. 

> Resource Localization fails on a cluster due to existing cache directories
> --
>
> Key: YARN-2624
> URL: https://issues.apache.org/jira/browse/YARN-2624
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.1
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Blocker
> Attachments: YARN-2624.001.patch, YARN-2624.001.patch
>
>
> We have found resource localization fails on a cluster with the following error 
> in certain cases.
> {noformat}
> INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Failed to download rsrc { { 
> hdfs://:8020/tmp/hive-hive/hive_2014-09-29_14-55-45_184_6531377394813896912-12/-mr-10004/95a07b90-2448-48fc-bcda-cdb7400b4975/map.xml,
>  1412027745352, FILE, null 
> },pending,[(container_1411670948067_0009_02_01)],443533288192637,DOWNLOADING}
> java.io.IOException: Rename cannot overwrite non empty destination directory 
> /data/yarn/nm/filecache/27
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.renameInternal(AbstractFileSystem.java:716)
>   at org.apache.hadoop.fs.FilterFs.renameInternal(FilterFs.java:228)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.rename(AbstractFileSystem.java:659)
>   at org.apache.hadoop.fs.FileContext.rename(FileContext.java:906)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:366)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: (was: YARN-2527.patch)

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2527) NPE in ApplicationACLsManager

2014-10-02 Thread Benoy Antony (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoy Antony updated YARN-2527:
---
Attachment: YARN-2527.patch

Thanks for the code, [~zjshen]. 
I have updated the patch based on the comment.

> NPE in ApplicationACLsManager
> -
>
> Key: YARN-2527
> URL: https://issues.apache.org/jira/browse/YARN-2527
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: YARN-2527.patch, YARN-2527.patch, YARN-2527.patch
>
>
> NPE in _ApplicationACLsManager_ can result in 500 Internal Server Error.
> The relevant stacktrace snippet from the ResourceManager logs is as below
> {code}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.security.ApplicationACLsManager.checkAccess(ApplicationACLsManager.java:104)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:101)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> {code}
> This issue was reported by [~miguenther].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156772#comment-14156772
 ] 

Hudson commented on YARN-2617:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6176 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6176/])
YARN-2617. Fixed NM to not send duplicate container status whose app is not 
running. Contributed by Jun Gong (jianhe: rev 
3ef1cf187faeb530e74606dd7113fd1ba08140d7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
> YARN-2617.patch
>
>
> We ([~chenchun]) are testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to clean 
> up already completed applications. But it will only remove the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might 
> not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
> write permission), and then it will not send the event.
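A hedged sketch of the behavior described above (only report finished containers 
whose application the NM still tracks); the helper below is an illustration with 
assumed names, not the committed patch:

{code}
// Hypothetical sketch: when building the node heartbeat, drop completed
// containers whose application is no longer in the NM context.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

public final class FinishedContainerFilter {

  private FinishedContainerFilter() { }

  public static List<ContainerStatus> pruneForHeartbeat(
      List<ContainerStatus> completed,
      Map<ApplicationId, ?> runningApps) {  // e.g. context.getApplications()
    List<ContainerStatus> toReport = new ArrayList<ContainerStatus>();
    for (ContainerStatus status : completed) {
      ApplicationId appId =
          status.getContainerId().getApplicationAttemptId().getApplicationId();
      // Only report containers for applications the NM still tracks.
      if (runningApps.containsKey(appId)) {
        toReport.add(status);
      }
    }
    return toReport;
  }
}
{code}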



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2617) NM does not need to send finished container whose APP is not running to RM

2014-10-02 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156761#comment-14156761
 ] 

Jian He commented on YARN-2617:
---

YARN-2640 seems to have been resolved by YARN-1979 already.

> NM does not need to send finished container whose APP is not running to RM
> --
>
> Key: YARN-2617
> URL: https://issues.apache.org/jira/browse/YARN-2617
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2617.2.patch, YARN-2617.3.patch, YARN-2617.4.patch, 
> YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.5.patch, YARN-2617.6.patch, 
> YARN-2617.patch
>
>
> We ([~chenchun]) are testing RM work-preserving restart and found the 
> following logs when we ran a simple MapReduce task "PI". The NM continuously 
> reported completed containers whose application had already finished, even 
> after the AM had finished. 
> {code}
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:42,228 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:43,230 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> 2014-09-26 17:00:44,233 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {code}
> In the patch for YARN-1372, ApplicationImpl on the NM should guarantee to clean 
> up already completed applications. But it will only remove the appId from 
> 'app.context.getApplications()' when ApplicationImpl receives the event 
> 'ApplicationEventType.APPLICATION_LOG_HANDLING_FINISHED'; however, the NM might 
> not receive this event for a long time, or might never receive it.
> * For NonAggregatingLogHandler, it waits for 
> YarnConfiguration.NM_LOG_RETAIN_SECONDS, which is 3 * 60 * 60 sec by default, 
> before it is scheduled to delete the application logs and send the event.
> * For LogAggregationService, it might fail (e.g. if the user does not have HDFS 
> write permission), and then it will not send the event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156655#comment-14156655
 ] 

Tsuyoshi OZAWA commented on YARN-2615:
--

[~djp], currently the YARN build seems to be broken on Jenkins CI. I faced the 
same issue on YARN-2562.

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156651#comment-14156651
 ] 

Hadoop QA commented on YARN-2615:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672553/YARN-2615-v2.patch
  against trunk revision c7cee9b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5237//console

This message is automatically generated.

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156653#comment-14156653
 ] 

Tsuyoshi OZAWA commented on YARN-1979:
--

Thanks Vinod for the contribution and Junping for the review!

> TestDirectoryCollection fails when the umask is unusual
> ---
>
> Key: YARN-1979
> URL: https://issues.apache.org/jira/browse/YARN-1979
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.7.0
>
> Attachments: YARN-1979.2.patch, YARN-1979.txt
>
>
> I've seen this fail on Windows, where the default permissions end up being 
> 700.
> {code}
> ---
> Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> ---
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
>   Time elapsed: 0.422 sec  <<< FAILURE!
> java.lang.AssertionError: local dir parent 
> Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
>  not created with proper permissions expected: but was:
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
> {code}
> The clash is between testDiskSpaceUtilizationLimit() and 
> testCreateDirectories().
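For illustration, one umask-independent way such a permission assertion could be 
written; this is a sketch under the assumption that the expected permission is 
derived from the configured umask, and is not the attached patch:

{code}
// Hypothetical sketch: compute the expected directory permission from the
// default dir permission and the configured umask, rather than hard-coding
// a value that breaks under an unusual umask.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskAwarePermissionCheck {
  public static boolean hasExpectedPerms(Configuration conf, Path dir)
      throws Exception {
    FileSystem localFs = FileSystem.getLocal(conf);
    FsPermission expected =
        FsPermission.getDirDefault().applyUMask(FsPermission.getUMask(conf));
    FsPermission actual = localFs.getFileStatus(dir).getPermission();
    return expected.equals(actual);
  }
}
{code}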



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156647#comment-14156647
 ] 

Hudson commented on YARN-1979:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6174 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6174/])
YARN-1979. TestDirectoryCollection fails when the umask is unusual. 
(Contributed by Vinod Kumar Vavilapalli and Tsuyoshi OZAWA) (junping_du: rev 
c7cee9b4551918d5d35bf4e9dc73982a050c73ba)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDirectoryCollection.java


> TestDirectoryCollection fails when the umask is unusual
> ---
>
> Key: YARN-1979
> URL: https://issues.apache.org/jira/browse/YARN-1979
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Fix For: 2.7.0
>
> Attachments: YARN-1979.2.patch, YARN-1979.txt
>
>
> I've seen this fail on Windows, where the default permissions end up being 
> 700.
> {code}
> ---
> Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> ---
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
>   Time elapsed: 0.422 sec  <<< FAILURE!
> java.lang.AssertionError: local dir parent 
> Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
>  not created with proper permissions expected: but was:
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
> {code}
> The clash is between testDiskSpaceUtilizationLimit() and 
> testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-02 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2615:
-
Attachment: YARN-2615-v2.patch

In the v2 patch:
- Fixed test failures and the audit warning.
- Added more tests for RMDelegationToken and TimelineDelegationToken.

> ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
> fields
> 
>
> Key: YARN-2615
> URL: https://issues.apache.org/jira/browse/YARN-2615
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-2615-v2.patch, YARN-2615.patch
>
>
> As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
> and DelegationTokenIdentifier should also be updated in the same way to allow 
> fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156618#comment-14156618
 ] 

Junping Du commented on YARN-1979:
--

Thanks [~ozawa] for reminding me about this. Yes, I had forgotten this JIRA.
+1. Committing it now. 

> TestDirectoryCollection fails when the umask is unusual
> ---
>
> Key: YARN-1979
> URL: https://issues.apache.org/jira/browse/YARN-1979
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1979.2.patch, YARN-1979.txt
>
>
> I've seen this fail on Windows, where the default permissions end up being 
> 700.
> {code}
> ---
> Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> ---
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
>   Time elapsed: 0.422 sec  <<< FAILURE!
> java.lang.AssertionError: local dir parent 
> Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
>  not created with proper permissions expected: but was:
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
> {code}
> The clash is between testDiskSpaceUtilizationLimit() and 
> testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156543#comment-14156543
 ] 

Hudson commented on YARN-2630:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-2630. Prevented previous AM container status from being acquired by the 
current restarted AM. Contributed by Jian He. (zjshen: rev 
52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto


> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in a work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container, but the 
> DistributedShell logic does not expect this extra completed container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1063) Winutils needs ability to create task as domain user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156552#comment-14156552
 ] 

Hudson commented on YARN-1063:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-1063. Augmented Hadoop common winutils to have the ability to create 
containers as domain users. Contributed by Remus Rusanu. (vinodkv: rev 
5ca97f1e60b8a7848f6eadd15f6c08ed390a8cda)
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestWinUtils.java
* hadoop-common-project/hadoop-common/src/main/winutils/chown.c
* hadoop-common-project/hadoop-common/src/main/winutils/symlink.c
* hadoop-common-project/hadoop-common/src/main/winutils/libwinutils.c
* hadoop-common-project/hadoop-common/src/main/winutils/include/winutils.h
* hadoop-common-project/hadoop-common/src/main/winutils/task.c
* hadoop-yarn-project/CHANGES.txt


> Winutils needs ability to create task as domain user
> 
>
> Key: YARN-1063
> URL: https://issues.apache.org/jira/browse/YARN-1063
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: Windows
>Reporter: Kyle Leckie
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1063.2.patch, YARN-1063.3.patch, YARN-1063.4.patch, 
> YARN-1063.5.patch, YARN-1063.6.patch, YARN-1063.patch
>
>
> h1. Summary:
> Securing a Hadoop cluster requires constructing some form of security 
> boundary around the processes executed in YARN containers. Isolation based on 
> Windows user isolation seems most feasible. This approach is similar to the 
> approach taken by the existing LinuxContainerExecutor. The current patch to 
> winutils.exe adds the ability to create a process as a domain user. 
> h1. Alternative Methods considered:
> h2. Process rights limited by security token restriction:
> On Windows access decisions are made by examining the security token of a 
> process. It is possible to spawn a process with a restricted security token. 
> Any of the rights granted by SIDs of the default token may be restricted. It 
> is possible to see this in action by examining the security token of a 
> sandboxed process launched by a web browser. Typically the launched process 
> will have a fully restricted token and needs to access machine resources 
> through a dedicated broker process that enforces a custom security policy. 
> This broker process mechanism would break compatibility with the typical 
> Hadoop container process. The container process must be able to utilize 
> standard function calls for disk and network IO. I performed some work 
> looking at ways to ACL the local files to the specific launched process without 
> granting rights to other processes launched on the same machine, but found 
> this to be an overly complex solution. 
> h2. Relying on APP containers:
> Recent versions of Windows have the ability to launch processes within an 
> isolated container. Application containers are supported for execution of 
> WinRT based executables. This method was ruled out due to the lack of 
> official support for standard Windows APIs. At some point in the future 
> Windows may support functionality similar to BSD jails or Linux containers; 
> at that point support for containers should be added.
> h1. Create As User Feature Description:
> h2. Usage:
> A new sub command was added to the set of task commands. Here is the syntax:
> winutils task createAsUser [TASKNAME] [USERNAME] [COMMAND_LINE]
> Some notes:
> * The username specified is in the format of "user@domain"
> * The machine executing this command must be joined to the domain of the user 
> specified
> * The domain controller must allow the account executing the command access 
> to the user information. For this join the account to the predefined group 
> labeled "Pre-Windows 2000 Compatible Access"
> * The account running the command must have several rights on the local 
> machine. These can be managed manually using secpol.msc: 
> ** "Act as part of the operating system" - SE_TCB_NAME
> ** "Replace a process-level token" - SE_ASSIGNPRIMARYTOKEN_NAME
> ** "Adjust memory quotas for a process" - SE_INCREASE_QUOTA_NAME
> * The launched process will not have rights to the desktop so will not be 
> able to display any information or create UI.
> * The launched process will have no network credentials. Any access of 
> network resources that requires domain authentication will fail.
> h2. Implementation:
> Winutils performs the following steps:
> # Enable the required privileges for the current process.
> # Register as a trusted process with the Local Security Authority (LSA).
> # Create a new logon for the user passed on the command line.
> 
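For reference, an example invocation of the sub-command using the syntax above; 
the task name, user, and command line are placeholders:

{code}
winutils task createAsUser task_001 yarnuser@EXAMPLE.COM "cmd /c whoami"
{code}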

[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156537#comment-14156537
 ] 

Hudson commented on YARN-1972:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-1972. Added a secure container-executor for Windows. Contributed by Remus 
Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java


> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
> YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 
> and the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * changes the DCE-created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command 
> of winutils task instead of 'create'.
> * runs the localization as a standalone process instead of an in-process 
> Java method call. This in turn relies on the winutils createAsUser feature 
> to run the localization as the job user.
>  
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files.
> The WCE design came out of practical trial and error. I had to iron out some 
> issues around Windows shell limitations (command line length) to get it to 
> work, the biggest issue being the huge CLASSPATH that is commonplace in 
> Hadoop container executions. The job container itself already deals with 
> this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for 
> details. For the WCE localizer, launched as a separate process, the same 
> issue had to be resolved, and I used the same 'classpath jar' approach (see 
> the sketch below).
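The 'classpath jar' technique referenced above can be illustrated with plain JDK APIs: a small jar whose manifest Class-Path attribute carries the long classpath, so the command line handed to the Windows shell stays short. This is only a sketch of the technique, not the Hadoop implementation from HADOOP-8899.

{code}
// Sketch of the 'classpath jar' technique: the manifest's Class-Path
// attribute holds the long classpath so the launch command stays short.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClasspathJarSketch {

  static void writeClasspathJar(String jarPath, String[] classpathEntries)
      throws IOException {
    Manifest manifest = new Manifest();
    Attributes attrs = manifest.getMainAttributes();
    attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // Class-Path entries are space-separated URLs relative to the jar.
    attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", classpathEntries));
    try (JarOutputStream out =
             new JarOutputStream(new FileOutputStream(jarPath), manifest)) {
      // No jar entries are needed; the manifest alone carries the classpath.
    }
  }

  public static void main(String[] args) throws IOException {
    writeClasspathJar("classpath.jar",
        new String[] {"lib/hadoop-common.jar", "lib/hadoop-yarn-api.jar"});
    // The container can then be launched with "-classpath classpath.jar".
  }
}
{code}

Note that manifest Class-Path entries are URLs relative to the jar's location, so entries elsewhere on disk generally need to be written as file: URLs.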
> h2. Deployment Requirements
> To use the WCE one needs to set the 
> `yarn.nodemanager.container-executor.class` to 
> `org.

[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156524#comment-14156524
 ] 

Hudson commented on YARN-2613:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian 
He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java


> NMClient doesn't have retries for supporting rolling-upgrades
> -
>
> Key: YARN-2613
> URL: https://issues.apache.org/jira/browse/YARN-2613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch
>
>
> While the NM is undergoing a rolling upgrade, the client should retry the NM 
> until it comes back up. This JIRA is to add an NMProxy (similar to RMProxy) 
> with a retry implementation to support rolling upgrades.
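For illustration only (this is not the actual NMProxy code), the retry-proxy idea can be sketched with a plain JDK dynamic proxy that retries a protocol call on connection failures while the NM is down:

{code}
// Illustration of the retry-proxy idea behind NMProxy/RMProxy: retry a
// protocol call on connection failures instead of failing immediately.
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.net.ConnectException;

public class RetryProxySketch {

  @SuppressWarnings("unchecked")
  static <T> T withRetries(Class<T> iface, T target,
                           int maxRetries, long sleepMillis) {
    InvocationHandler handler = (proxy, method, args) -> {
      for (int attempt = 0; ; attempt++) {
        try {
          return method.invoke(target, args);
        } catch (InvocationTargetException e) {
          // Retry only on connection-level failures, up to maxRetries times.
          if (!(e.getCause() instanceof ConnectException)
              || attempt >= maxRetries) {
            throw e.getCause();
          }
          Thread.sleep(sleepMillis);
        }
      }
    };
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }

  // A stand-in for a container management protocol, for the demo below.
  interface Echo {
    String echo(String msg);
  }

  public static void main(String[] args) {
    Echo direct = msg -> msg;
    Echo retrying = withRetries(Echo.class, direct, 5, 1000L);
    System.out.println(retrying.echo("hello"));
  }
}
{code}

The real NMProxy/RMProxy build on Hadoop's retry-policy machinery with configurable wait and retry-interval settings rather than the hard-coded values above.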



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156528#comment-14156528
 ] 

Hudson commented on YARN-2446:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1914 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1914/])
YARN-2446. Augmented Timeline service APIs to start taking in domains as a 
parameter while posting entities and events. Contributed by Zhijie Shen. 
(vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java


> Using TimelineNamespace to shield the entities of a user
> 
>
> Key: YARN-2446
> URL: https://issues.apache.org/jira/browse/YARN-2446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch
>
>
> Given that YARN-2102 adds TimelineNamespace, we can make use of it to shield 
> the entities, preventing them from being accessed or affected by other 
> users' operations.
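A rough sketch of what this looks like from the publishing side, assuming the domain-id field this JIRA adds to TimelineEntity (the entity type, id, and domain name below are made up):

{code}
// Sketch: tag a timeline entity with a domain so the timeline server can
// apply that domain's ACLs when other users try to read or modify it.
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

public class TimelineDomainSketch {
  public static void main(String[] args) {
    TimelineEntity entity = new TimelineEntity();
    entity.setEntityType("MY_APP_EVENT");     // hypothetical entity type
    entity.setEntityId("entity_0001");        // hypothetical entity id
    entity.setStartTime(System.currentTimeMillis());
    entity.setDomainId("my_private_domain");  // domain id, per this JIRA
    // A TimelineClient would then put(entity); readers not covered by the
    // domain's ACLs should be denied access to it.
  }
}
{code}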



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRMRestart fails with FairScheduler

2014-10-02 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156490#comment-14156490
 ] 

Wei Yan commented on YARN-2635:
---

All tests passed locally. The TestDirectoryCollection failure looks related to 
YARN-1979 and YARN-2640.

> TestRMRestart fails with FairScheduler
> --
>
> Key: YARN-2635
> URL: https://issues.apache.org/jira/browse/YARN-2635
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2635-1.patch
>
>
> If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
> TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156471#comment-14156471
 ] 

Hadoop QA commented on YARN-2562:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672538/YARN-2562.5-2.patch
  against trunk revision 9e40de6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5236//console

This message is automatically generated.

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5.patch
>
>
> The ContainerId string format is unreadable for RMs that have restarted at 
> least once (epoch > 0) after YARN-2182, for example: 
> container_1410901177871_0001_01_05_17.
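For illustration, one readable layout would render the epoch as its own component instead of folding it into the numeric fields. This is only a sketch and not necessarily the format the attached patches adopt:

{code}
// Sketch of a readable container-id layout that keeps the epoch visible
// as a separate "e<epoch>" component. Illustrative only.
public class ContainerIdFormatSketch {

  static String toReadableString(long clusterTimestamp, int appId,
      int attemptId, long containerId, int epoch) {
    StringBuilder sb = new StringBuilder("container_");
    if (epoch > 0) {
      sb.append("e").append(epoch).append("_");
    }
    sb.append(clusterTimestamp).append("_")
      .append(String.format("%04d", appId)).append("_")
      .append(String.format("%02d", attemptId)).append("_")
      .append(String.format("%06d", containerId));
    return sb.toString();
  }

  public static void main(String[] args) {
    // e.g. container_e17_1410901177871_0001_01_000005
    System.out.println(toReadableString(1410901177871L, 1, 1, 5, 17));
  }
}
{code}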



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156462#comment-14156462
 ] 

Tsuyoshi OZAWA commented on YARN-1979:
--

[~djp], do you mind taking a look at the latest patch? Some users report the 
same issue, e.g. YARN-2640.

> TestDirectoryCollection fails when the umask is unusual
> ---
>
> Key: YARN-1979
> URL: https://issues.apache.org/jira/browse/YARN-1979
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1979.2.patch, YARN-1979.txt
>
>
> I've seen this fail on Windows, where the default permissions match up 
> to 700.
> {code}
> ---
> Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> ---
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
>   Time elapsed: 0.422 sec  <<< FAILURE!
> java.lang.AssertionError: local dir parent 
> Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA
>  not created with proper permissions expected: but was:
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106)
> {code}
> The clash is between testDiskSpaceUtilizationLimit() and 
> testCreateDirectories().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2640) TestDirectoryCollection.testCreateDirectories failed

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156461#comment-14156461
 ] 

Tsuyoshi OZAWA commented on YARN-2640:
--

[~hex108], thanks for your contribution. Can we close this JIRA as a duplicate 
of YARN-1979?

> TestDirectoryCollection.testCreateDirectories failed
> 
>
> Key: YARN-2640
> URL: https://issues.apache.org/jira/browse/YARN-2640
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-2640.2.patch, YARN-2640.patch
>
>
> When running "mvn test -Dtest=TestDirectoryCollection", the test failed:
> {code}
> Running org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.538 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection
> testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection)
>   Time elapsed: 0.969 sec  <<< FAILURE!
> java.lang.AssertionError: local dir parent not created with proper 
> permissions expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:104)
> {code}
> I found it was because testDiskSpaceUtilizationLimit ran before 
> testCreateDirectories, so directory "dirA" was already created by 
> testDiskSpaceUtilizationLimit. When testCreateDirectories then tried to 
> create "dirA" with the specified permissions, it found that "dirA" already 
> existed and did nothing (see the sketch below).
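A hypothetical sketch of the kind of fix discussed above: wipe the shared test directory before each test so testCreateDirectories always creates "dirA" itself and can verify the permissions it sets. This is not the actual patch.

{code}
// Sketch: a @Before method that removes leftovers from previously run tests
// (e.g. dirA created by testDiskSpaceUtilizationLimit) so test ordering no
// longer matters. Class and directory names are illustrative.
import java.io.File;
import org.apache.hadoop.fs.FileUtil;
import org.junit.Before;

public class TestDirectoryCollectionSetupSketch {

  private static final File TEST_DIR =
      new File(System.getProperty("java.io.tmpdir"), "TestDirectoryCollection");

  @Before
  public void setUp() {
    FileUtil.fullyDelete(TEST_DIR);  // remove dirA and any other leftovers
    TEST_DIR.mkdirs();
  }
}
{code}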



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2562) ContainerId@toString() is unreadable for epoch >0 after YARN-2182

2014-10-02 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2562:
-
Attachment: YARN-2562.5-2.patch

> ContainerId@toString() is unreadable for epoch >0 after YARN-2182
> -
>
> Key: YARN-2562
> URL: https://issues.apache.org/jira/browse/YARN-2562
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Tsuyoshi OZAWA
>Priority: Critical
> Attachments: YARN-2562.1.patch, YARN-2562.2.patch, YARN-2562.3.patch, 
> YARN-2562.4.patch, YARN-2562.5-2.patch, YARN-2562.5.patch
>
>
> The ContainerId string format is unreadable for RMs that have restarted at 
> least once (epoch > 0) after YARN-2182, for example: 
> container_1410901177871_0001_01_05_17.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156412#comment-14156412
 ] 

Hudson commented on YARN-2446:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-2446. Augmented Timeline service APIs to start taking in domains as a 
parameter while posting entities and events. Contributed by Zhijie Shen. 
(vinodkv: rev 9e40de6af7959ac7bb5f4e4d2833ca14ea457614)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/timeline/TestTimelineRecords.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelinePutResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TestLeveldbTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/TimelineStoreTestUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/TimelineDataManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/security/TimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineACLsManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/MemoryTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/timeline/TimelineEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/timeline/LeveldbTimelineStore.java


> Using TimelineNamespace to shield the entities of a user
> 
>
> Key: YARN-2446
> URL: https://issues.apache.org/jira/browse/YARN-2446
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: 2.6.0
>
> Attachments: YARN-2446.1.patch, YARN-2446.2.patch, YARN-2446.3.patch
>
>
> Given that YARN-2102 adds TimelineNamespace, we can make use of it to shield 
> the entities, preventing them from being accessed or affected by other 
> users' operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2613) NMClient doesn't have retries for supporting rolling-upgrades

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156408#comment-14156408
 ] 

Hudson commented on YARN-2613:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-2613. Support retry in NMClient for rolling-upgrades. (Contributed by Jian 
He) (junping_du: rev 0708827a935d190d439854e08bb5a655d7daa606)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestNMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ServerProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/ContainerManagementProtocolProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/NMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* hadoop-yarn-project/CHANGES.txt


> NMClient doesn't have retries for supporting rolling-upgrades
> -
>
> Key: YARN-2613
> URL: https://issues.apache.org/jira/browse/YARN-2613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2613.1.patch, YARN-2613.2.patch, YARN-2613.3.patch
>
>
> While the NM is undergoing a rolling upgrade, the client should retry the NM 
> until it comes back up. This JIRA is to add an NMProxy (similar to RMProxy) 
> with a retry implementation to support rolling upgrades.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156421#comment-14156421
 ] 

Hudson commented on YARN-1972:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-1972. Added a secure container-executor for Windows. Contributed by Remus 
Rusanu. (vinodkv: rev ba7f31c2ee8d23ecb183f88920ef06053c0b9769)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ContainerLocalizer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/WindowsSecureContainerExecutor.java


> Implement secure Windows Container Executor
> ---
>
> Key: YARN-1972
> URL: https://issues.apache.org/jira/browse/YARN-1972
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
> YARN-1972.delta.4.patch, YARN-1972.delta.5-branch-2.patch, 
> YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch
>
>
> h1. Windows Secure Container Executor (WCE)
> YARN-1063 adds the necessary infrastructure to launch a process as a domain 
> user as a solution for the problem of having a security boundary between 
> processes executed in YARN containers and the Hadoop services. The WCE is a 
> container executor that leverages the winutils capabilities introduced in 
> YARN-1063 and launches containers as an OS process running as the job 
> submitter user. A description of the S4U infrastructure used by YARN-1063 
> and the alternatives considered can be read on that JIRA.
> The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
> drive the flow of execution, but it overrides some methods to the effect of:
> * changes the DCE-created user cache directories to be owned by the job user 
> and by the nodemanager group.
> * changes the actual container run command to use the 'createAsUser' command 
> of winutils task instead of 'create'.
> * runs the localization as a standalone process instead of an in-process 
> Java method call. This in turn relies on the winutils createAsUser feature 
> to run the localization as the job user.
>  
> When compared to LinuxContainerExecutor (LCE), the WCE has some minor 
> differences:
> * it does not delegate the creation of the user cache directories to the 
> native implementation.
> * it does not require special handling to be able to delete user files.
> The WCE design came out of practical trial and error. I had to iron out some 
> issues around Windows shell limitations (command line length) to get it to 
> work, the biggest issue being the huge CLASSPATH that is commonplace in 
> Hadoop container executions. The job container itself already deals with 
> this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for 
> details. For the WCE localizer, launched as a separate process, the same 
> issue had to be resolved, and I used the same 'classpath jar' approach.
> h2. Deployment Requirements
> To use the WCE one needs to set the 
> `yarn.nodemanager.container-executor.class` to 
> `org.apache.had

[jira] [Commented] (YARN-2630) TestDistributedShell#testDSRestartWithPreviousRunningContainers fails

2014-10-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156427#comment-14156427
 ] 

Hudson commented on YARN-2630:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1889 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1889/])
YARN-2630. Prevented previous AM container status from being acquired by the 
current restarted AM. Contributed by Jian He. (zjshen: rev 
52bbe0f11bc8e97df78a1ab9b63f4eff65fd7a76)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


> TestDistributedShell#testDSRestartWithPreviousRunningContainers fails
> -
>
> Key: YARN-2630
> URL: https://issues.apache.org/jira/browse/YARN-2630
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Fix For: 2.6.0
>
> Attachments: YARN-2630.1.patch, YARN-2630.2.patch, YARN-2630.3.patch, 
> YARN-2630.4.patch
>
>
> The problem is that after YARN-1372, in a work-preserving AM restart, the 
> re-launched AM will also receive the previously failed AM container. However, 
> the DistributedShell logic does not expect this extra completed container.
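As an illustration of the kind of guard this implies (not the actual DistributedShell patch), an AM callback could ignore completed containers that belong to an earlier application attempt:

{code}
// Sketch: count only completed containers launched by the current attempt,
// so a previous attempt's AM container does not skew the tally.
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

public class CompletedContainerFilterSketch {

  static int countCurrentAttemptCompletions(List<ContainerStatus> completed,
      int currentAttemptId) {
    int count = 0;
    for (ContainerStatus status : completed) {
      ContainerId id = status.getContainerId();
      if (id.getApplicationAttemptId().getAttemptId() == currentAttemptId) {
        count++;  // containers from earlier attempts are ignored
      }
    }
    return count;
  }
}
{code}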



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

