[jira] [Updated] (YARN-1250) Generic history service should support application-acls

2014-09-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1250:
--
Attachment: YARN-1250.4.patch

Uploaded a new patch to fix the problem of publishing the app's ACL information.

 Generic history service should support application-acls
 ---

 Key: YARN-1250
 URL: https://issues.apache.org/jira/browse/YARN-1250
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: GenericHistoryACLs.pdf, YARN-1250.1.patch, 
 YARN-1250.2.patch, YARN-1250.3.patch, YARN-1250.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1250) Generic history service should support application-acls

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133689#comment-14133689
 ] 

Hadoop QA commented on YARN-1250:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668720/YARN-1250.4.patch
  against trunk revision fc741b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4956//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4956//console

This message is automatically generated.

 Generic history service should support application-acls
 ---

 Key: YARN-1250
 URL: https://issues.apache.org/jira/browse/YARN-1250
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: GenericHistoryACLs.pdf, YARN-1250.1.patch, 
 YARN-1250.2.patch, YARN-1250.3.patch, YARN-1250.4.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-1972:
---
Attachment: YARN-1972.delta.5.patch

Fix the LCE user vs. runAsUser in startLocalizer

 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user, as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the following effect:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of winutils task instead of 'create'.
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The WCE design came out of a practical, trial-and-error approach. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched 
 as a separate container, the same issue had to be resolved, and I used the same 
 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set 
 `yarn.nodemanager.container-executor.class` to 
 `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
 and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
 Windows security group that the nodemanager service principal is a 
 member of (the equivalent of the LCE 
 `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
 does not require any configuration outside of Hadoop's own yarn-site.xml.
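 As a hedged illustration of the two settings above (programmatic form shown 
 for brevity; in a real deployment these properties live in yarn-site.xml, and 
 the group name below is a placeholder):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;

 // Sketch only: programmatic equivalent of the yarn-site.xml entries.
 Configuration conf = new YarnConfiguration();
 conf.set("yarn.nodemanager.container-executor.class",
     "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
 // "hadoopsvcs" is a placeholder for the site-specific Windows security group.
 conf.set("yarn.nodemanager.windows-secure-container-executor.group", "hadoopsvcs");
 {code}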
 For the WCE to work, the nodemanager must run as a service principal that is a 
 member of the local Administrators group, or as LocalSystem. This is derived from 
 the need to invoke the LoadUserProfile API, whose specification mentions these 
 requirements. This is in addition to the SE_TCB privilege mentioned in 
 YARN-1063, but this requirement automatically implies that the SE_TCB 
 privilege is held by the nodemanager. For the Linux speakers in the audience, 
 the requirement is basically to run the NM as root.
 h2. Dedicated high privilege Service
 Due to the high privilege required by the WCE, we had discussed the need to 
 isolate the high-privilege operations into a separate process, an 'executor' 
 service that is solely responsible for starting the containers (including the 
 localizer). The NM would have to authenticate, authorize and communicate with 
 this service via an IPC mechanism and use it to launch the 
 containers. I still believe we'll end up deploying such a service, but the 
 effort to onboard such a new platform-specific service on the project is 
 not trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-1972:
---
Attachment: YARN-1972.trunk.5.patch

trunk diff corresponding to .delta.5

 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, 
 YARN-1972.trunk.5.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user, as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the following effect:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of winutils task instead of 'create'.
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The WCE design came out of a practical, trial-and-error approach. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched 
 as a separate container, the same issue had to be resolved, and I used the same 
 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set 
 `yarn.nodemanager.container-executor.class` to 
 `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
 and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
 Windows security group that the nodemanager service principal is a 
 member of (the equivalent of the LCE 
 `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
 does not require any configuration outside of Hadoop's own yarn-site.xml.
 For the WCE to work, the nodemanager must run as a service principal that is a 
 member of the local Administrators group, or as LocalSystem. This is derived from 
 the need to invoke the LoadUserProfile API, whose specification mentions these 
 requirements. This is in addition to the SE_TCB privilege mentioned in 
 YARN-1063, but this requirement automatically implies that the SE_TCB 
 privilege is held by the nodemanager. For the Linux speakers in the audience, 
 the requirement is basically to run the NM as root.
 h2. Dedicated high privilege Service
 Due to the high privilege required by the WCE, we had discussed the need to 
 isolate the high-privilege operations into a separate process, an 'executor' 
 service that is solely responsible for starting the containers (including the 
 localizer). The NM would have to authenticate, authorize and communicate with 
 this service via an IPC mechanism and use it to launch the 
 containers. I still believe we'll end up deploying such a service, but the 
 effort to onboard such a new platform-specific service on the project is 
 not trivial.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2458) Add file handling features to the Windows Secure Container Executor LRPC service

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu resolved YARN-2458.

Resolution: Implemented

The patch for YARN-2458 is included in YARN-2198 going forward.

 Add file handling features to the Windows Secure Container Executor LRPC 
 service
 

 Key: YARN-2458
 URL: https://issues.apache.org/jira/browse/YARN-2458
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2458.1.patch, YARN-2458.2.patch


 In the WSCE design the nodemanager needs to do certain privileged operations 
 like change file ownership to arbitrary users or delete files owned by the 
 task container user after completion of the task. As we want to remove the 
 Administrator privilege  requirement from the nodemanager service, we have to 
 move these operations into the privileged LRPC helper service. 
 Extend the RPC interface to contain methods to change file ownership and 
 manipulate files, add the JNI client side and implement the server side. This 
 will piggyback on the existing LRPC service, so there is not much infrastructure to 
 add (running as a service, RPC init, authentication and authorization are already 
 solved). It just needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.delta.5.patch

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.4.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates into running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows 
 platform-specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.
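 A purely hypothetical sketch of what the JNI surface on the NM side could look 
 like; the class and method names below are illustrative, not the actual 
 libwinutils API:
 {code}
 // Hypothetical JNI stub: the native side would live in libwinutils and carry
 // out the LPC exchange (NtConnectPort/NtRequestWaitReplyPort) with the
 // privileged helper service described above.
 public final class PrivilegedHelperClient {
   static {
     System.loadLibrary("winutils"); // assumption: LPC client code hosted here
   }

   // Asks the elevated service to launch a container as the given user and
   // returns an opaque handle/identifier for the created process.
   public static native long createTaskAsUser(
       String user, String jobName, String pidFile, String cmdLine);
 }
 {code}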



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2198:
---
Attachment: YARN-2198.trunk.5.patch

Trunk.5 includes YARN-1972's trunk.5 fix for the LCE.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates into running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows 
 platform-specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2514) The elevated WSCE LRPC should grant access to the job to the nodemanager

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu resolved YARN-2514.

Resolution: Implemented

Fix is contained in YARN-2198 4.patch and forward. The job is granted full 
control to the NM, LocalSystem and the container user.

 The elevated WSCE LRPC should grant access to the job to the nodemanager
 -

 Key: YARN-2514
 URL: https://issues.apache.org/jira/browse/YARN-2514
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows

 The job created by winutils task createAsUser must be 
 accessible/controllable/killable by the nodemanager, or winutils task list/kill will 
 fail later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133764#comment-14133764
 ] 

Hadoop QA commented on YARN-2198:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12668745/YARN-2198.trunk.5.patch
  against trunk revision fc741b5.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4958//console

This message is automatically generated.

 Remove the need to run NodeManager as privileged account for Windows Secure 
 Container Executor
 --

 Key: YARN-2198
 URL: https://issues.apache.org/jira/browse/YARN-2198
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, 
 YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.separation.patch, 
 YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch


 YARN-1972 introduces a Secure Windows Container Executor. However, this 
 executor requires the process launching the container to be LocalSystem or 
 a member of the local Administrators group. Since the process in question 
 is the NodeManager, the requirement translates into running the entire NM as a 
 privileged account, a very large surface area to review and protect.
 This proposal is to move the privileged operations into a dedicated NT 
 service. The NM can run as a low privilege account and communicate with the 
 privileged NT service when it needs to launch a container. This would reduce 
 the surface exposed to the high privileges. 
 There has to exist a secure, authenticated and authorized channel of 
 communication between the NM and the privileged NT service. Possible 
 alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, would 
 be to use Windows LPC (Local Procedure Calls), which is a Windows 
 platform-specific inter-process communication channel that satisfies all requirements 
 and is easy to deploy. The privileged NT service would register and listen on 
 an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
 with libwinutils which would host the LPC client code. The client would 
 connect to the LPC port (NtConnectPort) and send a message requesting a 
 container launch (NtRequestWaitReplyPort). LPC provides authentication and 
 the privileged NT service can use authorization API (AuthZ) to validate the 
 caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133788#comment-14133788
 ] 

Hadoop QA commented on YARN-1972:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12668739/YARN-1972.trunk.5.patch
  against trunk revision fc741b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4957//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4957//console

This message is automatically generated.

 Implement secure Windows Container Executor
 ---

 Key: YARN-1972
 URL: https://issues.apache.org/jira/browse/YARN-1972
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows
 Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, 
 YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, 
 YARN-1972.trunk.5.patch


 h1. Windows Secure Container Executor (WCE)
 YARN-1063 adds the necessary infrastructure to launch a process as a domain 
 user, as a solution for the problem of having a security boundary between 
 processes executed in YARN containers and the Hadoop services. The WCE is a 
 container executor that leverages the winutils capabilities introduced in 
 YARN-1063 and launches containers as an OS process running as the job 
 submitter user. A description of the S4U infrastructure used by YARN-1063 and 
 the alternatives considered can be read on that JIRA.
 The WCE is based on the DefaultContainerExecutor. It relies on the DCE to 
 drive the flow of execution, but it overrides some methods to the following effect:
 * changes the DCE-created user cache directories to be owned by the job user 
 and by the nodemanager group.
 * changes the actual container run command to use the 'createAsUser' command 
 of winutils task instead of 'create'.
 * runs the localization as a standalone process instead of an in-process Java 
 method call. This in turn relies on the winutils createAsUser feature to run 
 the localization as the job user.
  
 When compared to the LinuxContainerExecutor (LCE), the WCE has some minor 
 differences:
 * it does not delegate the creation of the user cache directories to the 
 native implementation.
 * it does not require special handling to be able to delete user files.
 The WCE design came out of a practical, trial-and-error approach. I had 
 to iron out some issues around the Windows script shell limitations (command 
 line length) to get it to work, the biggest issue being the huge CLASSPATH 
 that is commonplace in Hadoop container executions. The job 
 container itself already deals with this via a so-called 'classpath 
 jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer, launched 
 as a separate container, the same issue had to be resolved, and I used the same 
 'classpath jar' approach.
 h2. Deployment Requirements
 To use the WCE one needs to set 
 `yarn.nodemanager.container-executor.class` to 
 `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` 
 and set `yarn.nodemanager.windows-secure-container-executor.group` to a 
 Windows security group that the nodemanager service principal is a 
 member of (the equivalent of the LCE 
 `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE 
 does not require any configuration outside of Hadoop's own yarn-site.xml.
 For the WCE to work, the nodemanager must run as a service principal that is a 
 member of the local Administrators group, or as LocalSystem. This is derived from 
 the need to invoke the LoadUserProfile API, which mentions these requirements in 

[jira] [Commented] (YARN-2546) REST API for application creation/submission is using strings for numeric and boolean values

2014-09-15 Thread Doug Haigh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133812#comment-14133812
 ] 

Doug Haigh commented on YARN-2546:
--

The definition of the fields does not match the JSON returned. This is a 
problem for non-Java parsers. Not an insurmountable problem, but a problem.

 REST API for application creation/submission is using strings for numeric and 
 boolean values
 --

 Key: YARN-2546
 URL: https://issues.apache.org/jira/browse/YARN-2546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.5.1
Reporter: Doug Haigh

 When YARN responds with or accepts JSON, numbers and booleans are being 
 represented as strings, which can cause parsing problems.
 Resource values look like 
 {
   "application-id":"application_1404198295326_0001",
   "maximum-resource-capability":
   {
     "memory":"8192",
     "vCores":"32"
   }
 }
 Instead of
 {
   "application-id":"application_1404198295326_0001",
   "maximum-resource-capability":
   {
     "memory":8192,
     "vCores":32
   }
 }
 When I POST to start a job, numeric values are represented as numbers:
   "local-resources":
   {
     "entry":
     [
       {
         "key":"AppMaster.jar",
         "value":
         {
           "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
           "type":"FILE",
           "visibility":"APPLICATION",
           "size": 43004,
           "timestamp": 1405452071209
         }
       }
     ]
   },
 Instead of
   "local-resources":
   {
     "entry":
     [
       {
         "key":"AppMaster.jar",
         "value":
         {
           "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
           "type":"FILE",
           "visibility":"APPLICATION",
           "size": "43004",
           "timestamp": "1405452071209"
         }
       }
     ]
   },
 Similarly, Boolean values are also represented as strings:
 "keep-containers-across-application-attempts":"false"
 Instead of 
 "keep-containers-across-application-attempts":false
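 For illustration (not part of the original report), a small Jackson sketch of 
 why the distinction matters to strictly typed clients:
 {code}
 import com.fasterxml.jackson.databind.JsonNode;
 import com.fasterxml.jackson.databind.ObjectMapper;

 // Sketch: a typed client sees different node types for the two forms and has
 // to special-case the string form before it can do arithmetic on the value.
 static void compareForms() throws Exception {
   ObjectMapper mapper = new ObjectMapper();
   JsonNode asStrings = mapper.readTree("{\"memory\":\"8192\",\"vCores\":\"32\"}");
   JsonNode asNumbers = mapper.readTree("{\"memory\":8192,\"vCores\":32}");
   System.out.println(asStrings.get("memory").isNumber()); // false
   System.out.println(asNumbers.get("memory").isNumber()); // true
 }
 {code}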



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2551) Windows Secure Container Executor: Add checks to validate that the wsce-site.xml is write-restricted to Administrators only

2014-09-15 Thread Remus Rusanu (JIRA)
Remus Rusanu created YARN-2551:
--

 Summary: Windows Secure Container Executor: Add checks to validate 
that the wsce-site.xml is write-restricted to Administrators only
 Key: YARN-2551
 URL: https://issues.apache.org/jira/browse/YARN-2551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu


The wsce-site.xml contains the impersonate.allowed and impersonate.denied keys 
that restrict/control the users that can be impersonated by the WSCE 
containers. The impersonation framework in winutils should validate that only 
Administrators have write control on this file. 

This is similar to how the LCE validates that only root has write permissions 
on the container-executor.cfg file on secure Linux clusters.
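A rough sketch of the kind of check intended, in Java for illustration only (the 
real validation would live in the winutils native code, and matching principals 
by display name is a simplification of a SID-based check):
{code}
import java.nio.file.*;
import java.nio.file.attribute.*;

// Fails if any ACL entry other than Administrators/SYSTEM grants write access
// to wsce-site.xml.
static void checkWriteRestricted(Path wsceSite) throws Exception {
  AclFileAttributeView view =
      Files.getFileAttributeView(wsceSite, AclFileAttributeView.class);
  for (AclEntry entry : view.getAcl()) {
    boolean canWrite = entry.type() == AclEntryType.ALLOW
        && (entry.permissions().contains(AclEntryPermission.WRITE_DATA)
            || entry.permissions().contains(AclEntryPermission.APPEND_DATA));
    String who = entry.principal().getName();
    if (canWrite && !who.endsWith("Administrators") && !who.endsWith("SYSTEM")) {
      throw new SecurityException("wsce-site.xml is writable by " + who);
    }
  }
}
{code}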



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2552) Windows Secure Container Executor: the privileged file operations of hadoopwinutilsvc should be constrained to localdirs only

2014-09-15 Thread Remus Rusanu (JIRA)
Remus Rusanu created YARN-2552:
--

 Summary: Windows Secure Container Executor: the privileged file 
operations of hadoopwinutilsvc should be constrained to localdirs only
 Key: YARN-2552
 URL: https://issues.apache.org/jira/browse/YARN-2552
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu


YARN-2458 added file manipulation operations executed in an elevated context by 
hadoopwinutilsvc. Without any constraint, the NM (or a hijacker that takes over the 
NM) can manipulate arbitrary OS files under the highest possible privileges, an 
easy elevation attack vector. The service should only allow operations on 
files/directories that are under the configured NM localdirs. It should read 
this value from wsce-site.xml, as yarn-site.xml cannot be trusted, being 
writable by Hadoop admins (YARN-2551 ensures wsce-site.xml is only writable by 
system Administrators, not Hadoop admins).
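A minimal sketch of the containment check described above, in Java for 
illustration (the real enforcement would live in the native hadoopwinutilsvc 
service):
{code}
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Returns true only if the requested path normalizes to a location under one
// of the configured NM local dirs (read from wsce-site.xml, not yarn-site.xml).
// A hardened version would also resolve symlinks (e.g. via toRealPath()).
static boolean isUnderLocalDirs(String requested, List<String> localDirs) {
  Path candidate = Paths.get(requested).toAbsolutePath().normalize();
  for (String dir : localDirs) {
    Path root = Paths.get(dir).toAbsolutePath().normalize();
    if (candidate.startsWith(root)) {
      return true;
    }
  }
  return false; // reject anything outside the local dirs
}
{code}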



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2485) Fix WSCE folder/file/classpathJar permission/order when running as non-admin

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu resolved YARN-2485.

Resolution: Duplicate

This is fixed by the YARN-2458 implementation of an 'elevated' file system for 
the WSCE.

 Fix WSCE folder/file/classpathJar permission/order when running as non-admin
 

 Key: YARN-2485
 URL: https://issues.apache.org/jira/browse/YARN-2485
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
  Labels: security, windows

 The WSCE creates the local, usercache, filecache and appcache dirs in the normal 
 DefaultContainerExecutor way, and then assigns ownership to the user process. 
 The WSCE-configured group is added, but the permission masks used (710) do not 
 give write permissions on the appcache/filecache/usercache folders to the NM 
 itself.
 The creation of these folders, as well as the creation of the temporary 
 classpath jar files, must succeed even after the file/dir ownership is 
 relinquished to the task user and the NM does not run as a local 
 Administrator. 
 The LCE handles all these dirs inside the container-executor app (root). The 
 classpathJar issue does not exist on Linux.
 The dirs can be handled by simply delaying the transfer (create all dirs and 
 temp files, then assign ownership in bulk), but the task classpathJar is 
 'special' and needs some refactoring of the NM launch sequence.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2553) Windows Secure Container Executor: assign PROCESS_TERMINATE privilege to NM on created containers

2014-09-15 Thread Remus Rusanu (JIRA)
Remus Rusanu created YARN-2553:
--

 Summary: Windows Secure Container Executor: assign 
PROCESS_TERMINATE privilege to NM on created containers
 Key: YARN-2553
 URL: https://issues.apache.org/jira/browse/YARN-2553
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu


In order to open a job handle with JOB_OBJECT_TERMINATE access, the caller must 
have PROCESS_TERMINATE access on the handle of each process in the job (MSDN: 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686709(v=vs.85).aspx).

The hadoopwinutilsvc process should explicitly grant PROCESS_TERMINATE access to the NM 
account on the newly started container process. I hope this gets inherited...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2474) document the wsce-site.xml keys in hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm

2014-09-15 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated YARN-2474:
---
Attachment: YARN-2474.1.patch

 document the wsce-site.xml keys in 
 hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
 ---

 Key: YARN-2474
 URL: https://issues.apache.org/jira/browse/YARN-2474
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Critical
  Labels: security, windows
 Attachments: YARN-2474.1.patch


 document the keys used to configure WSCE 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2436) yarn application help doesn't work

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2436:
---
Release Note: test

 yarn application help doesn't work
 --

 Key: YARN-2436
 URL: https://issues.apache.org/jira/browse/YARN-2436
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
  Labels: newbie
 Fix For: 3.0.0

 Attachments: YARN-2436.patch


 The previous version of the yarn command plays games with the command stack 
 for some commands.  The new code needs to duplicate this wackiness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Deleted] (YARN-2535) Test JIRA, ignore.

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer deleted YARN-2535:
---


 Test JIRA, ignore.
 --

 Key: YARN-2535
 URL: https://issues.apache.org/jira/browse/YARN-2535
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2537) relnotes.py prints description instead of release note for YARN issues

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2537.

Resolution: Fixed

Fixed by INFRA-8338.

Manual test shows that relnotes.py is working properly for YARN now.

 relnotes.py prints description instead of release note for YARN issues
 --

 Key: YARN-2537
 URL: https://issues.apache.org/jira/browse/YARN-2537
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer

 Currently, the release notes for YARN always print the description JIRA field 
 instead of the release note.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2537) relnotes.py prints description instead of release note for YARN issues

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reassigned YARN-2537:
--

Assignee: Allen Wittenauer

 relnotes.py prints description instead of release note for YARN issues
 --

 Key: YARN-2537
 URL: https://issues.apache.org/jira/browse/YARN-2537
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer

 Currently, the release notes for YARN always print the description JIRA field 
 instead of the release note.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy

2014-09-15 Thread Jonathan Maron (JIRA)
Jonathan Maron created YARN-2554:


 Summary: Slider AM Web UI is inaccessible if HTTPS/SSL is enabled 
as the HTTP policy
 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron


If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized 
with SSL listeners.  The RM has a web app proxy servlet that acts as a proxy 
for incoming AM requests.  In order to forward the requests to the AM the proxy 
servlet makes use of HttpClient.  However, the HttpClient utilized is not 
initialized correctly with the necessary certs to allow for successful one way 
SSL invocations to the other nodes in the cluster (it is not configured to 
access/load the client truststore specified in ssl-client.xml).   I imagine 
SSLFactory.createSSLSocketFactory() could be utilized to create an instance 
that can be assigned to the HttpClient.
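
A minimal sketch of that suggestion, assuming HttpClient 4.4+ is available to 
the proxy (illustrative only, not the actual fix):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

// Load the truststore settings from ssl-client.xml via Hadoop's SSLFactory,
// then hand the resulting socket factory to the HttpClient used by the proxy.
static CloseableHttpClient createProxyClient(Configuration conf) throws Exception {
  SSLFactory clientSslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
  clientSslFactory.init();
  return HttpClients.custom()
      .setSSLSocketFactory(new SSLConnectionSocketFactory(
          clientSslFactory.createSSLSocketFactory(),
          clientSslFactory.getHostnameVerifier()))
      .build();
}
{code}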

The symptoms of this issue are:

AM: Displays unknown_certificate exception
RM:  Displays an exception such as javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy

2014-09-15 Thread Jonathan Maron (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Maron updated YARN-2554:
-
Summary: Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the 
HTTP policy  (was: Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as 
the HTTP policy)

 Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
 -

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron

 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized correctly with the necessary certs to allow for 
 successful one way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).   I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays unknown_certificate exception
 RM:  Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy

2014-09-15 Thread Jonathan Maron (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133934#comment-14133934
 ] 

Jonathan Maron commented on YARN-2554:
--

A workaround (though not necessarily a production-recommended one) is to add 
the client trust store certs to the JDK's cacerts file (export the trust 
store certs, then import them into JDK/jre/lib/security/cacerts).

 Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy
 ---

 Key: YARN-2554
 URL: https://issues.apache.org/jira/browse/YARN-2554
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.6.0
Reporter: Jonathan Maron

 If the HTTP policy to enable HTTPS is specified, the RM and AM are 
 initialized with SSL listeners.  The RM has a web app proxy servlet that acts 
 as a proxy for incoming AM requests.  In order to forward the requests to the 
 AM the proxy servlet makes use of HttpClient.  However, the HttpClient 
 utilized is not initialized correctly with the necessary certs to allow for 
 successful one way SSL invocations to the other nodes in the cluster (it is 
 not configured to access/load the client truststore specified in 
 ssl-client.xml).   I imagine SSLFactory.createSSLSocketFactory() could be 
 utilized to create an instance that can be assigned to the HttpClient.
 The symptoms of this issue are:
 AM: Displays unknown_certificate exception
 RM:  Displays an exception such as javax.net.ssl.SSLHandshakeException: 
 sun.security.validator.ValidatorException: PKIX path building failed: 
 sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
 valid certification path to requested target



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits

2014-09-15 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134020#comment-14134020
 ] 

Varun Vasudev commented on YARN-2531:
-

Similar but not the same. YARN-810 allows apps to choose to limit themselves. 
This allows admins to enforce limits irrespective of the app.

 CGroups - Admins should be allowed to enforce strict cpu limits
 ---

 Key: YARN-2531
 URL: https://issues.apache.org/jira/browse/YARN-2531
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2531.0.patch


 From YARN-2440 -
 {quote} 
 The other dimension to this is determinism w.r.t performance. Limiting to 
 allocated cores overall (as well as per container later) helps orgs run 
 workloads and reason about them deterministically. One of the examples is 
 benchmarking apps, but deterministic execution is a desired option beyond 
 benchmarks too.
 {quote}
 It would be nice to have an option to let admins to enforce strict cpu limits 
 for apps for things like benchmarking, etc. By default this flag should be 
 off so that containers can use available cpu but admin can turn the flag on 
 to determine worst case performance, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-09-15 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134024#comment-14134024
 ] 

Maysam Yabandeh commented on YARN-1530:
---

bq. YARN apps already depend on ZK/RM/HDFS being up. Every new service 
dependency we add will only increase the chances of YARN apps failing or 
slowing down. That's true even if the ATS service's uptime is as good as ZK or 
RM.
bq. Realistically, getting the ATS service's uptime to the same level as ZK or 
HDFS is a long and winding road. Especially when most discussions here assume 
HBase as the backing store. HBase's uptime is lower than HDFS/ZK/RM because 
it's more complex to operate. If HBase going down means ATS service going down, 
then we certainly should guard against this failure scenario.

+1

bq. And if we have a choice to decouple the write path from the ATS service, 
why not?
bq. If we have an alternate code path to persist events first before they hit 
the final backing store, why not do that all the time?

I would call that a reasonable approach. One alternative is also to use HDFS as 
the backup plan, i.e., use it when HBase is down. Anyway, with ATS being pluggable, 
I guess all approaches can grow independently.


 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to do store, and serve per-framework 
 data all by itself as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner with plugin points for frameworks 
 to do their own thing w.r.t interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2468) Log handling for LRS

2014-09-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2468:

Attachment: YARN-2468.3.rebase.patch

create the patch based on the latest trunk

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.patch


 Currently, when an application is finished, the NM will start to do the log 
 aggregation. But for long-running service (LRS) applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2529) Generic history service RPC interface doesn't work when service authorization is enabled

2014-09-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134178#comment-14134178
 ] 

Jian He commented on YARN-2529:
---

+1 

 Generic history service RPC interface doesn't work when service authorization 
 is enabled
 

 Key: YARN-2529
 URL: https://issues.apache.org/jira/browse/YARN-2529
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2529.1.patch, YARN-2529.2.patch


 Here's the problem shown in the log:
 {code}
 14/09/10 10:42:44 INFO ipc.Server: Connection from 10.22.2.109:55439 for 
 protocol org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB is 
 unauthorized for user zshen (auth:SIMPLE)
 14/09/10 10:42:44 INFO ipc.Server: Socket Reader #1 for port 10200: 
 readAndProcess from client 10.22.2.109 threw exception 
 [org.apache.hadoop.security.authorize.AuthorizationException: Protocol 
 interface org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB is not 
 known.]
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2516) Deprecate yarn.policy.file

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2516.

Resolution: Duplicate

Looks like HADOOP-9902 wiped out most of yarn.policy.file already, so the only 
remaining bit is in yarn-env.sh.  That is easier to clean up as part of 
YARN-2438.

 Deprecate yarn.policy.file
 --

 Key: YARN-2516
 URL: https://issues.apache.org/jira/browse/YARN-2516
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scripts
Reporter: Allen Wittenauer
  Labels: newbie

 It doesn't appear that yarn.policy.file is actually used anywhere, there 
 isn't an example yarn-policy.xml file, etc, etc.  So let's remove it from the 
 shell code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI

2014-09-15 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2540:
-
Attachment: YARN-2540-v2.txt

There is a case in which the filter can return wrong results:
say apps are running on root.a.b and root.a.b1.
Clicking on root.a.b would return apps running in both b and b1, instead of 
only b.

The v2 patch corrects this.
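
For illustration, the kind of matching rule that avoids the root.a.b vs. 
root.a.b1 collision (a sketch, not the actual patch code):
{code}
// A plain prefix test matches "root.a.b1" when filtering on "root.a.b".
// Requiring either an exact match or a match up to a following '.' does not.
static boolean queueMatches(String appQueue, String filterQueue) {
  return appQueue.equals(filterQueue)
      || appQueue.startsWith(filterQueue + ".");
}
{code}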

 Fair Scheduler : queue filters not working on scheduler page in RM UI
 -

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2438) yarn-env.sh cleanup

2014-09-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134258#comment-14134258
 ] 

Allen Wittenauer commented on YARN-2438:


Ofc, HADOOP-10950 would make heap management much more obvious.

 yarn-env.sh cleanup
 ---

 Key: YARN-2438
 URL: https://issues.apache.org/jira/browse/YARN-2438
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
  Labels: newbie

 a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 
 b) Defaults should get moved to yarn-config.sh instead of being specifically 
 set



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2468) Log handling for LRS

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134257#comment-14134257
 ] 

Hadoop QA commented on YARN-2468:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12668794/YARN-2468.3.rebase.patch
  against trunk revision 24d920b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4959//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4959//console

This message is automatically generated.

 Log handling for LRS
 

 Key: YARN-2468
 URL: https://issues.apache.org/jira/browse/YARN-2468
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
 YARN-2468.3.rebase.patch


 Currently, when an application is finished, the NM will start to do the log 
 aggregation. But for long-running service (LRS) applications, this is not ideal. 
 The problems we have are:
 1) LRS applications are expected to run for a long time (weeks, months).
 2) Currently, all the container logs (from one NM) will be written into a 
 single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2438) yarn-env.sh cleanup

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2438:
---
Description: 
a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 

b) Defaults should get moved to yarn-config.sh instead of being specifically set

c) Remove references to things that are covered elsewhere, deprecated, etc.

  was:
a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 

b) Defaults should get moved to yarn-config.sh instead of being specifically set



 yarn-env.sh cleanup
 ---

 Key: YARN-2438
 URL: https://issues.apache.org/jira/browse/YARN-2438
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
  Labels: newbie

 a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 
 b) Defaults should get moved to yarn-config.sh instead of being specifically 
 set
 c) Remove references to things that are covered elsewhere, deprecated, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2438) yarn-env.sh cleanup

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2438:
---
Attachment: YARN-2438.patch

 yarn-env.sh cleanup
 ---

 Key: YARN-2438
 URL: https://issues.apache.org/jira/browse/YARN-2438
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
  Labels: newbie
 Attachments: YARN-2438.patch


 a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 
 b) Defaults should get moved to yarn-config.sh instead of being specifically 
 set
 c) Remove references to things that are covered elsewhere, deprecated, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2438) yarn-env.sh cleanup

2014-09-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134295#comment-14134295
 ] 

Allen Wittenauer commented on YARN-2438:


If HADOOP-10950 goes in first, this patch needs to be updated for it.

 yarn-env.sh cleanup
 ---

 Key: YARN-2438
 URL: https://issues.apache.org/jira/browse/YARN-2438
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
  Labels: newbie
 Attachments: YARN-2438.patch


 a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 
 b) Defaults should get moved to yarn-config.sh instead of being specifically 
 set
 c) Remove references to things that are covered elsewhere, deprecated, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.

2014-09-15 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134303#comment-14134303
 ] 

Arpit Agarwal commented on YARN-2549:
-

+1 for the patch.

 TestContainerLaunch fails due to classpath problem with hamcrest classes.
 -

 Key: YARN-2549
 URL: https://issues.apache.org/jira/browse/YARN-2549
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager, test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: YARN-2549.1.patch


 The mockito jar bundles its own copy of the hamcrest classes, and it's ahead 
 of our hamcrest dependency jar on the test classpath for 
 hadoop-yarn-server-nodemanager.  Unfortunately, the version bundled in 
 mockito doesn't match the version we need, so it's missing the 
 {{CoreMatchers#containsString}} method.  This causes the tests to fail with 
 {{NoSuchMethodError}} on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2437) start-yarn.sh/stop-yarn needs to give info

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2437:
---
Issue Type: Improvement  (was: Bug)

 start-yarn.sh/stop-yarn needs to give info
 --

 Key: YARN-2437
 URL: https://issues.apache.org/jira/browse/YARN-2437
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scripts
Reporter: Allen Wittenauer
Assignee: Hao Gao
  Labels: newbie

 With the merger and cleanup of the daemon launch code, yarn-daemons.sh no 
 longer prints Starting information.  This should be made more of an analog 
 of start-dfs.sh/stop-dfs.sh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2437) start-yarn.sh/stop-yarn should give info

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2437:
---
Summary: start-yarn.sh/stop-yarn should give info  (was: 
start-yarn.sh/stop-yarn needs to give info)

 start-yarn.sh/stop-yarn should give info
 

 Key: YARN-2437
 URL: https://issues.apache.org/jira/browse/YARN-2437
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scripts
Reporter: Allen Wittenauer
Assignee: Hao Gao
  Labels: newbie

 With the merger and cleanup of the daemon launch code, yarn-daemons.sh no 
 longer prints Starting information.  This should be made more of an analog 
 of start-dfs.sh/stop-dfs.sh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2438) yarn-env.sh cleanup

2014-09-15 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2438:
---
Issue Type: Improvement  (was: Bug)

 yarn-env.sh cleanup
 ---

 Key: YARN-2438
 URL: https://issues.apache.org/jira/browse/YARN-2438
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
  Labels: newbie
 Attachments: YARN-2438.patch


 a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 
 b) Defaults should get moved to yarn-config.sh instead of being specifically 
 set
 c) Remove references to things that are covered elsewhere, deprecated, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2102) More generalized timeline ACLs

2014-09-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134325#comment-14134325
 ] 

Li Lu commented on YARN-2102:
-

Hi [~zjshen], just a quick thing to check, shall we use the lock map in the 
existing leveldbstore here? Seems like some operations need to acquire locks? 
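
(For illustration only: a minimal sketch of the kind of per-id lock map being 
referred to. The class and method names below are hypothetical, not the actual 
LeveldbTimelineStore code.)

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a per-id lock map that serializes read-modify-write
// operations on the same key without locking the whole store.
public class IdLockMap {
  private final ConcurrentMap<String, ReentrantLock> locks =
      new ConcurrentHashMap<String, ReentrantLock>();

  private ReentrantLock lockFor(String id) {
    ReentrantLock lock = locks.get(id);
    if (lock == null) {
      ReentrantLock newLock = new ReentrantLock();
      ReentrantLock existing = locks.putIfAbsent(id, newLock);
      lock = (existing == null) ? newLock : existing;
    }
    return lock;
  }

  public void withLock(String id, Runnable op) {
    ReentrantLock lock = lockFor(id);
    lock.lock();
    try {
      op.run();  // e.g. a get-check-put sequence against leveldb
    } finally {
      lock.unlock();
    }
  }
}
{code}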

 More generalized timeline ACLs
 --

 Key: YARN-2102
 URL: https://issues.apache.org/jira/browse/YARN-2102
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, 
 YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch


 We need to differentiate the access controls of reading and writing 
 operations, and we need to think about cross-entity access control. For 
 example, if we are executing a workflow of MR jobs, which writing the 
 timeline data of this workflow, we don't want other user to pollute the 
 timeline data of the workflow by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.

2014-09-15 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated YARN-2549:

Hadoop Flags: Reviewed

Thank you, Arpit.  I committed this to trunk and branch-2.

 TestContainerLaunch fails due to classpath problem with hamcrest classes.
 -

 Key: YARN-2549
 URL: https://issues.apache.org/jira/browse/YARN-2549
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager, test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: YARN-2549.1.patch


 The mockito jar bundles its own copy of the hamcrest classes, and it's ahead 
 of our hamcrest dependency jar on the test classpath for 
 hadoop-yarn-server-nodemanager.  Unfortunately, the version bundled in 
 mockito doesn't match the version we need, so it's missing the 
 {{CoreMatchers#containsString}} method.  This causes the tests to fail with 
 {{NoSuchMethodError}} on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.

2014-09-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2549:
--
Fix Version/s: 2.6.0

 TestContainerLaunch fails due to classpath problem with hamcrest classes.
 -

 Key: YARN-2549
 URL: https://issues.apache.org/jira/browse/YARN-2549
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager, test
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 2.6.0

 Attachments: YARN-2549.1.patch


 The mockito jar bundles its own copy of the hamcrest classes, and it's ahead 
 of our hamcrest dependency jar on the test classpath for 
 hadoop-yarn-server-nodemanager.  Unfortunately, the version bundled in 
 mockito doesn't match the version we need, so it's missing the 
 {{CoreMatchers#containsString}} method.  This causes the tests to fail with 
 {{NoSuchMethodError}} on Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2438) yarn-env.sh cleanup

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134349#comment-14134349
 ] 

Hadoop QA commented on YARN-2438:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668821/YARN-2438.patch
  against trunk revision 43b0303.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4961//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4961//console

This message is automatically generated.

 yarn-env.sh cleanup
 ---

 Key: YARN-2438
 URL: https://issues.apache.org/jira/browse/YARN-2438
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
  Labels: newbie
 Attachments: YARN-2438.patch


 a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented 
 b) Defaults should get moved to yarn-config.sh instead of being specifically 
 set
 c) Remove references to things that are covered elsewhere, deprecated, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134367#comment-14134367
 ] 

Hadoop QA commented on YARN-2540:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668815/YARN-2540-v2.txt
  against trunk revision 43b0303.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4960//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4960//console

This message is automatically generated.

 Fair Scheduler : queue filters not working on scheduler page in RM UI
 -

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2555) Effective max-allocation-* should consider biggest node

2014-09-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2555:
--

 Summary: Effective max-allocation-* should consider biggest node
 Key: YARN-2555
 URL: https://issues.apache.org/jira/browse/YARN-2555
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla


The effective max-allocation-mb should be 
min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject 
container requests for resources larger than any node. Today, these requests 
wait forever. 

We should do this for all resources and update the effective value on node 
updates. 
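
(For illustration only, a minimal sketch of the proposed min() computation; the 
class and values below are made up, not actual scheduler code, and assume the 
scheduler tracks the capability of the largest registered node.)

{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative sketch: effective maximum allocation is the component-wise
// minimum of the configured maximum and the biggest registered node.
public class EffectiveMaxAllocation {
  public static Resource compute(Resource configuredMax, Resource biggestNode) {
    return Resource.newInstance(
        Math.min(configuredMax.getMemory(), biggestNode.getMemory()),
        Math.min(configuredMax.getVirtualCores(), biggestNode.getVirtualCores()));
  }

  public static void main(String[] args) {
    Resource configuredMax = Resource.newInstance(8192, 32); // admin-configured max
    Resource biggestNode = Resource.newInstance(6144, 16);   // largest NM heartbeating
    // prints the effective max, e.g. <memory:6144, vCores:16>
    System.out.println(compute(configuredMax, biggestNode));
  }
}
{code}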



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2555) Effective max-allocation-* should consider biggest node

2014-09-15 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan reassigned YARN-2555:
-

Assignee: Wei Yan

 Effective max-allocation-* should consider biggest node
 ---

 Key: YARN-2555
 URL: https://issues.apache.org/jira/browse/YARN-2555
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Wei Yan

 The effective max-allocation-mb should be 
 min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject 
 container requests for resources larger than any node. Today, these requests 
 wait forever. 
 We should do this for all resources and update the effective value on node 
 updates. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2556) Tool to measure the performance of the timeline server

2014-09-15 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2556:
-

 Summary: Tool to measure the performance of the timeline server
 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles


We need to be able to understand the capacity model for the timeline server to 
give users the tools they need to deploy a timeline server with the correct 
capacity.

I propose we create a mapreduce job that can measure timeline server write and 
read performance. Transactions per second, I/O for both read and write would be 
a good start.

This could be done as an example or test job that could be tied into gridmix.
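
(As a rough illustration only, a single-threaded write-throughput loop against 
TimelineClient could look like the sketch below; the entity type, id scheme, and 
workload size are made up, and the proposed tool would instead be a MapReduce 
job with many parallel writers and readers.)

{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative single-threaded write benchmark against the timeline server.
public class TimelineWriteBench {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      int numEntities = 10000;                      // made-up workload size
      long start = System.currentTimeMillis();
      for (int i = 0; i < numEntities; i++) {
        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("BENCH_ENTITY");       // hypothetical entity type
        entity.setEntityId("entity_" + i);
        entity.setStartTime(System.currentTimeMillis());
        client.putEntities(entity);                 // one PUT per entity
      }
      long elapsedMs = System.currentTimeMillis() - start;
      System.out.println("writes/sec = " + (numEntities * 1000.0 / elapsedMs));
    } finally {
      client.stop();
    }
  }
}
{code}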



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1779) Handle AMRMTokens across RM failover

2014-09-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-1779:
-

Assignee: Jian He

 Handle AMRMTokens across RM failover
 

 Key: YARN-1779
 URL: https://issues.apache.org/jira/browse/YARN-1779
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Jian He
Priority: Blocker
  Labels: ha

 Verify if AMRMTokens continue to work against RM failover. If not, we will 
 have to do something along the lines of YARN-986. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-218) Distiguish between failed and killed app attempts

2014-09-15 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White resolved YARN-218.

Resolution: Duplicate

Fixed in YARN-614.

 Distiguish between failed and killed app attempts
 -

 Key: YARN-218
 URL: https://issues.apache.org/jira/browse/YARN-218
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Tom White
Assignee: Tom White

 A failed app attempt is one that failed due to an error in the user 
 program, as opposed to one that was killed by the system. Like in MapReduce 
 task attempts, we should distinguish the two so that killed attempts do not 
 count against the number of retries (yarn.resourcemanager.am.max-retries).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2014-09-15 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134542#comment-14134542
 ] 

Jonathan Eagles commented on YARN-2556:
---

This jira is to give users realistic performance numbers for running the timeline 
server on their own setup and hardware (HBase, leveldb, etc.).

FYI, LevelDB publishes its own performance statistics:
https://code.google.com/p/leveldb/

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles

 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-09-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2312:
--
Target Version/s: 2.6.0

We SHOULD try to get this in 2.6, marking so..

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA

 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 container id: the sequence number without the epoch. We should 
 mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.
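
(Hedged illustration of the caller migration this implies, assuming 
{{ContainerId#getContainerId}} returns the full long id including the epoch, as 
introduced by YARN-2229; the surrounding setup values are made up.)

{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Illustrative caller migration: prefer the long-valued id that carries the
// epoch over the int-valued sequence number proposed for deprecation.
public class ContainerIdExample {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1404198295326L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 1);
    ContainerId containerId = ContainerId.newInstance(attemptId, 7);

    int seqOnly = containerId.getId();           // to be deprecated: no epoch
    long fullId = containerId.getContainerId();  // full id: epoch + sequence
    System.out.println(seqOnly + " vs " + fullId);
  }
}
{code}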



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-09-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134598#comment-14134598
 ] 

Vinod Kumar Vavilapalli commented on YARN-2080:
---

ClientRMService: checkReservationSytem() logs for every new reservation 
request if reservations are not enabled. That is too much logging.

AbstractReservationSystem: Actual start of the thread pool shouldn't be in 
serviceInit()

Missed these:
 - reservation.plan.follower -> reservation-system.plan-follower
 - reservation.planfollower.time-step -> 
reservation-system.plan-follower.time-step
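
(Hedged illustration of the serviceInit()/serviceStart() point above, using the 
standard Hadoop AbstractService lifecycle; the class and field names below are 
made up, not the actual AbstractReservationSystem code.)

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustrative only: create resources in serviceInit(), but only start
// background threads in serviceStart(), so init() has no runtime side effects.
public class ExamplePlanFollowerService extends AbstractService {
  private ScheduledExecutorService scheduler;

  public ExamplePlanFollowerService() {
    super(ExamplePlanFollowerService.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    scheduler = Executors.newScheduledThreadPool(1); // created, not yet running work
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // schedule the periodic plan-follower work here, not in serviceInit(), e.g.
    // scheduler.scheduleAtFixedRate(task, 0, stepMs, TimeUnit.MILLISECONDS);
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
    super.serviceStop();
  }
}
{code}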

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
 YARN-2080.patch, YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134693#comment-14134693
 ] 

Carlo Curino commented on YARN-1711:


[~chris.douglas] thanks for the prompt and precise feedback. 

I addressed them in the updated patch (v4) as follows:
 * added comments and implemented nits as requested
 * got rid of excludeList altogether, as it is not necessary anymore given the 
restructuring done in the ReservationSystem (i.e., the exclusion list is now 
handled outside the scope of the policies, which is cleaner/simpler)
 * improved the tests by using exception subclasses wherever possible, and 
scoping them so that it is clearer that a failure is the one we expected.
 * subclasses of PlanningException give callers some indication of why the call 
did not succeed (e.g., enough for the tests), future smarter agents might 
require more detailed explanation (e.g., the JSON payload you mention)
 * used the (expected = SomeException.class) notation for tests. 

Regarding annotations, I am using @Public @Unstable for the exceptions, as they 
can bubble up all the way to users, and @LimitedPrivate("yarn") @Unstable for 
the other classes. If anyone has better suggestions for the annotations, please 
advise.
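
(Hedged illustration of the test style described above; the exception and method 
names below are placeholders, not necessarily the ones in the actual patch.)

{code}
import org.junit.Test;

// Placeholder exception hierarchy: a specific subclass lets a test assert the
// reason a reservation was rejected, not just that some PlanningException flew.
class PlanningException extends Exception {
  PlanningException(String msg) { super(msg); }
}

class PlanningQuotaException extends PlanningException {
  PlanningQuotaException(String msg) { super(msg); }
}

public class TestOverQuotaRejection {

  // The (expected = ...) notation: the test passes only if exactly this
  // exception type (or a subclass of it) is thrown.
  @Test(expected = PlanningQuotaException.class)
  public void testReservationOverQuotaIsRejected() throws PlanningException {
    validate(110, 100);  // ask for more than the allowed quota
  }

  private void validate(int asked, int quota) throws PlanningException {
    if (asked > quota) {
      throw new PlanningQuotaException("request " + asked
          + " exceeds per-user quota " + quota);
    }
  }
}
{code}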


 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
 YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-15 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-1711:
---
Attachment: YARN-1711.4.patch

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
 YARN-1711.4.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134714#comment-14134714
 ] 

Hadoop QA commented on YARN-1711:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668909/YARN-1711.4.patch
  against trunk revision 8008f0e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4962//console

This message is automatically generated.

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
 YARN-1711.4.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user

2014-09-15 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134715#comment-14134715
 ] 

Li Lu commented on YARN-2446:
-

Hi [~zjshen], I applied the patch on top of a YARN-2102 branch, and ran all 
system tests changed in your patch. They all passed. On the code side, in 
general it looks good to me. Here are some comments:

{code}
// create a default namespace, which allows everybody to access and
// modify the entities in it.
namespace = new TimelineNamespace();
namespace.setId(DEFAULT_NAMESPACE_ID);
namespace.setDescription("System Default Namespace");
namespace.setOwner(
    UserGroupInformation.getCurrentUser().getShortUserName());
namespace.setReaders("*");
namespace.setWriters("*");
{code}

I would like to confirm that it is fine to set the owner of the default namespace 
to the current user. Since this is a lazy initialization, the owner of the default 
namespace is not deterministic. Will this cause any trouble in the future?

{code}
throw new YarnException("The namespace of the timeline entity "
    + entityID + " is not allowed to be changed.");
{code}

Could you please verify whether this exception only represents the case where the 
user tries to change the namespace of the entity? Is it possible to have a 
scenario where the user is not changing the namespace, but just sets it wrong? 
If this scenario is possible, maybe we want to change the exception message, 
since it may be a little bit confusing. 

{code}
public void invalidate(TimelineNamespace namespace) {
  if (aclExts.containsKey(namespace.getId())) {
    putNamespaceIntoCache(namespace);
  }
}
{code}

When this function is called, it would be reasonable for the caller to expect that 
the cached item is removed from the cache. However, here we're actually updating 
it. Maybe we want to rename this function? 
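
(For illustration only, with hypothetical class and field names, the distinction 
being raised is roughly the following.)

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch contrasting the two behaviours discussed above.
public class NamespaceAclCache {
  private final Map<String, String> cachedAcls =
      new ConcurrentHashMap<String, String>();

  // True invalidation: the next lookup misses and reloads from the store.
  public void invalidate(String namespaceId) {
    cachedAcls.remove(namespaceId);
  }

  // What the method under discussion actually does: refresh the cached value
  // in place, so a name like refresh()/update() would describe it better.
  public void refresh(String namespaceId, String newAcls) {
    if (cachedAcls.containsKey(namespaceId)) {
      cachedAcls.put(namespaceId, newAcls);
    }
  }
}
{code}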

Thanks! 

 Using TimelineNamespace to shield the entities of a user
 

 Key: YARN-2446
 URL: https://issues.apache.org/jira/browse/YARN-2446
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2446.1.patch


 Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the 
 entities, preventing them from being accessed or affected by other users' 
 operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-09-15 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134718#comment-14134718
 ] 

Chris Douglas commented on YARN-1711:
-

+1 Thanks for addressing the feedback on the patch

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
 YARN-1711.4.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI

2014-09-15 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated YARN-2540:
-
Attachment: YARN-2540-v3.txt

Updated the patch to take care of same prefix problem at parent queue.


 Fair Scheduler : queue filters not working on scheduler page in RM UI
 -

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-09-15 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134762#comment-14134762
 ] 

Anubhav Dhoot commented on YARN-1372:
-

A finishedContainer that was sent to the previous AM will have to be sent again to 
the new AM in order to get the ack. So we need to transfer the 
finishedContainersSentToAM from the previous attempt to the justFinishedContainers 
of the new attempt (if we decide to transfer those). Then why not also transfer 
the justFinishedContainers as well? If we are not going to consider whether 
work-preserving AM restart is enabled for this, we should be consistent about 
whether we transfer justFinishedContainers and finishedContainersSentToAM (either 
both or none). Agree?
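
(A minimal sketch of the transfer being discussed; the list types and names are 
simplified and illustrative, since the real fields in RMAppAttemptImpl may be 
structured and synchronized differently.)

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Simplified sketch: on a new attempt, carry over both the containers already
// sent to the previous AM (but never acked) and the ones not yet sent, so the
// new AM sees every completed container at least once.
public class FinishedContainerTransfer {
  public static List<ContainerStatus> transfer(
      List<ContainerStatus> prevJustFinished,
      List<ContainerStatus> prevSentToAM) {
    List<ContainerStatus> newJustFinished = new ArrayList<ContainerStatus>();
    newJustFinished.addAll(prevSentToAM);     // sent but never acked by old AM
    newJustFinished.addAll(prevJustFinished); // never sent at all
    return newJustFinished;                   // becomes the new attempt's list
  }
}
{code}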

 Ensure all completed containers are reported to the AMs across RM restart
 -

 Key: YARN-1372
 URL: https://issues.apache.org/jira/browse/YARN-1372
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1372.001.patch, YARN-1372.001.patch, 
 YARN-1372.002_NMHandlesCompletedApp.patch, 
 YARN-1372.002_RMHandlesCompletedApp.patch, 
 YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, 
 YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, 
 YARN-1372.prelim.patch, YARN-1372.prelim2.patch


 Currently the NM informs the RM about completed containers and then removes 
 those containers from the RM notification list. The RM passes on that 
 completed container information to the AM and the AM pulls this data. If the 
 RM dies before the AM pulls this data then the AM may not be able to get this 
 information again. To fix this, NM should maintain a separate list of such 
 completed container notifications sent to the RM. After the AM has pulled the 
 containers from the RM then the RM will inform the NM about it and the NM can 
 remove the completed container from the new list. Upon re-register with the 
 RM (after RM restart) the NM should send the entire list of completed 
 containers to the RM along with any other containers that completed while the 
 RM was dead. This ensures that the RM can inform the AM's about all completed 
 containers. Some container completions may be reported more than once since 
 the AM may have pulled the container but the RM may die before notifying the 
 NM about the pull.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134791#comment-14134791
 ] 

Hadoop QA commented on YARN-2540:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668912/YARN-2540-v3.txt
  against trunk revision 0ac760a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4963//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4963//console

This message is automatically generated.

 Fair Scheduler : queue filters not working on scheduler page in RM UI
 -

 Key: YARN-2540
 URL: https://issues.apache.org/jira/browse/YARN-2540
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.5.0, 2.5.1
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt


 Steps to reproduce :
 1. Run an app in default queue.
 2. While the app is running, go to the scheduler page on RM UI.
 3. You would see the app in the apptable at the bottom.
 4. Now click on default queue to filter the apptable on root.default.
 5. App disappears from apptable although it is running on default queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2102) More generalized timeline ACLs

2014-09-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2102:
--
Attachment: YARN-2102.6.patch

 More generalized timeline ACLs
 --

 Key: YARN-2102
 URL: https://issues.apache.org/jira/browse/YARN-2102
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, 
 YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch


 We need to differentiate the access controls of reading and writing 
 operations, and we need to think about cross-entity access control. For 
 example, if we are executing a workflow of MR jobs, which writing the 
 timeline data of this workflow, we don't want other user to pollute the 
 timeline data of the workflow by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2102) More generalized timeline ACLs

2014-09-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134820#comment-14134820
 ] 

Zhijie Shen commented on YARN-2102:
---

bq. From your design doc, I think what we're proposing here is something to 
partition the domain of entities, but not enhancing identifications. Maybe we 
want to consider an alternative name like domain or partition here?

Talked to [~gtCarrera] offline. It seems to be a good suggestion, and we don't 
plan on nested namespaces actually. Changed it to domain in the new patch.

bq. This is significantly different to any other fields. Are there any specific 
considerations behind this?

I followed the way that we store the start time and the insert time of an entity. 
It doesn't make much difference to split them and associate them with different 
keys, and they are usually retrieved together. I added more code comments to 
describe it.

bq. I think this is left out for some reasons, and maybe in YARN-2446 you're 
addressing this?

Yes, the use of the domain ACLs is in YARN-2446.

bq. Shall we add a default branch here to track any potential problems? 

Added an else block here.

bq. just a quick thing to check, shall we use the lock map in the existing 
leveldbstore here? Seems like some operations need to acquire locks?

According to the offline discussion, the current locking has some obvious 
issues. Let's fix it in a separate JIRA.

 More generalized timeline ACLs
 --

 Key: YARN-2102
 URL: https://issues.apache.org/jira/browse/YARN-2102
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, 
 YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch


 We need to differentiate the access controls of reading and writing 
 operations, and we need to think about cross-entity access control. For 
 example, if we are executing a workflow of MR jobs, which writing the 
 timeline data of this workflow, we don't want other user to pollute the 
 timeline data of the workflow by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2102) More generalized timeline ACLs

2014-09-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134843#comment-14134843
 ] 

Hadoop QA commented on YARN-2102:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668939/YARN-2102.6.patch
  against trunk revision 932ae03.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4964//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4964//console

This message is automatically generated.

 More generalized timeline ACLs
 --

 Key: YARN-2102
 URL: https://issues.apache.org/jira/browse/YARN-2102
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, 
 YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch


 We need to differentiate the access controls of reading and writing 
 operations, and we need to think about cross-entity access control. For 
 example, if we are executing a workflow of MR jobs, which writing the 
 timeline data of this workflow, we don't want other user to pollute the 
 timeline data of the workflow by putting something under it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node

2014-09-15 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134854#comment-14134854
 ] 

Varun Vasudev commented on YARN-2555:
-

Duplicate of YARN-2422?

 Effective max-allocation-* should consider biggest node
 ---

 Key: YARN-2555
 URL: https://issues.apache.org/jira/browse/YARN-2555
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Wei Yan

 The effective max-allocation-mb should be 
 min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject 
 container requests for resources larger than any node. Today, these requests 
 wait forever. 
 We should do this for all resources and update the effective value on node 
 updates. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node

2014-09-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134868#comment-14134868
 ] 

Wangda Tan commented on YARN-2555:
--

I think they're different: the proposal of YARN-2422 is making max-allocation 
flexible, while this JIRA focuses on rejecting a ResourceRequest when it is larger 
than the biggest node in the cluster.

IMHO, we don't need to do both of them; it is very possible that nodes connect to 
the RM after an application is submitted, especially in a virtual cluster 
environment. And as [~sandyr] commented, it's weird to have an NM variable affect 
RM configuration. Having a fixed max-allocation is useful to make sure a user 
doesn't get more resources than needed.

 Effective max-allocation-* should consider biggest node
 ---

 Key: YARN-2555
 URL: https://issues.apache.org/jira/browse/YARN-2555
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Wei Yan

 The effective max-allocation-mb should be 
 min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject 
 container requests for resources larger than any node. Today, these requests 
 wait forever. 
 We should do this for all resources and update the effective value on node 
 updates. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2546) REST API for application creation/submission is using strings for numeric & boolean values

2014-09-15 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-2546:
---

Assignee: Varun Vasudev

 REST API for application creation/submission is using strings for numeric & 
 boolean values
 --

 Key: YARN-2546
 URL: https://issues.apache.org/jira/browse/YARN-2546
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.5.1
Reporter: Doug Haigh
Assignee: Varun Vasudev

 When YARN responds with or accepts JSON, numbers & booleans are being 
 represented as strings which can cause parsing problems.
 Resource values look like 
 {
   "application-id":"application_1404198295326_0001",
   "maximum-resource-capability":
   {
     "memory":"8192",
     "vCores":"32"
   }
 }
 Instead of
 {
   "application-id":"application_1404198295326_0001",
   "maximum-resource-capability":
   {
     "memory":8192,
     "vCores":32
   }
 }
 When I POST to start a job, numeric values are represented as numbers:
   "local-resources":
   {
     "entry":
     [
       {
         "key":"AppMaster.jar",
         "value":
         {
           "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
           "type":"FILE",
           "visibility":"APPLICATION",
           "size": 43004,
           "timestamp": 1405452071209
         }
       }
     ]
   },
 Instead of
   "local-resources":
   {
     "entry":
     [
       {
         "key":"AppMaster.jar",
         "value":
         {
           "resource":"hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar",
           "type":"FILE",
           "visibility":"APPLICATION",
           "size": "43004",
           "timestamp": "1405452071209"
         }
       }
     ]
   },
 Similarly, Boolean values are also represented as strings:
 "keep-containers-across-application-attempts":"false"
 Instead of 
 "keep-containers-across-application-attempts":false



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2555) Effective max-allocation-* should consider biggest node

2014-09-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-2555.
---
Resolution: Duplicate
  Assignee: (was: Wei Yan)

Duplicate of YARN-56.

YARN-394 is related.

 Effective max-allocation-* should consider biggest node
 ---

 Key: YARN-2555
 URL: https://issues.apache.org/jira/browse/YARN-2555
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla

 The effective max-allocation-mb should be 
 min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject 
 container requests for resources larger than any node. Today, these requests 
 wait forever. 
 We should do this for all resources and update the effective value on node 
 updates. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2422) yarn.scheduler.maximum-allocation-mb should not be hard-coded in yarn-default.xml

2014-09-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134896#comment-14134896
 ] 

Vinod Kumar Vavilapalli commented on YARN-2422:
---

It's not just weird, but it's broken on heterogeneous clusters. The right fix 
is a dup of YARN-56.

 yarn.scheduler.maximum-allocation-mb should not be hard-coded in 
 yarn-default.xml
 -

 Key: YARN-2422
 URL: https://issues.apache.org/jira/browse/YARN-2422
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.6.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: YARN-2422.1.patch


 Cluster with 40Gb NM refuses to run containers > 8Gb.
 It was finally tracked down to yarn-default.xml hard-coding it to 8Gb.
 In the absence of a better override, it should default to 
 ${yarn.nodemanager.resource.memory-mb} instead of a hard-coded 8Gb.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell

2014-09-15 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2557:
---

 Summary: Add a parameter attempt_Failures_Validity_Interval in 
DistributedShell 
 Key: YARN-2557
 URL: https://issues.apache.org/jira/browse/YARN-2557
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong


Change Distributed shell to enable attemptFailuresValidityInterval
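
(For context, a hedged sketch of the client-side API such a parameter would 
presumably feed into; the setup code below is illustrative, not the actual 
DistributedShell client.)

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: how a client passes the validity interval when submitting
// an application. A failure older than this interval (in ms) no longer counts
// toward the AM max-attempts limit.
public class SubmitWithValidityInterval {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    ApplicationSubmissionContext context =
        yarnClient.createApplication().getApplicationSubmissionContext();
    context.setAttemptFailuresValidityInterval(60000L); // e.g. 60 seconds
    // ... set AM container spec, resource, and queue, then:
    // yarnClient.submitApplication(context);
    yarnClient.stop();
  }
}
{code}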



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell

2014-09-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2557:

Component/s: applications/distributed-shell

 Add a parameter attempt_Failures_Validity_Interval in DistributedShell 
 -

 Key: YARN-2557
 URL: https://issues.apache.org/jira/browse/YARN-2557
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Xuan Gong
Assignee: Xuan Gong

 Change Distributed shell to enable attemptFailuresValidityInterval



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell

2014-09-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2557:

Attachment: YARN-2557.1.patch

 Add a parameter attempt_Failures_Validity_Interval in DistributedShell 
 -

 Key: YARN-2557
 URL: https://issues.apache.org/jira/browse/YARN-2557
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2557.1.patch


 Change Distributed shell to enable attemptFailuresValidityInterval



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node

2014-09-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135013#comment-14135013
 ] 

Sandy Ryza commented on YARN-2555:
--

[~gp.leftnoteasy], this isn't the same as having an NM variable affect the RM 
conf.  Considering the effective max allocation as the biggest node means 
rejecting requests that won't fit on any node, which I believe is the correct 
behavior.  The issue I had with YARN-2422 was handling this at the 
configuration level, rather than properly handling this for heterogeneous 
clusters.

Thanks for pointing that out [~agentvindo.dev] - agreed that this duplicates 
YARN-56.  I think something like the approach outlined here probably makes the 
most sense for that JIRA.

 Effective max-allocation-* should consider biggest node
 ---

 Key: YARN-2555
 URL: https://issues.apache.org/jira/browse/YARN-2555
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Karthik Kambatla

 The effective max-allocation-mb should be 
 min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject 
 container requests for resources larger than any node. Today, these requests 
 wait forever. 
 We should do this for all resources and update the effective value on node 
 updates. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)