[jira] [Updated] (YARN-1250) Generic history service should support application-acls
[ https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1250: -- Attachment: YARN-1250.4.patch Uploaded a new patch to fix the problem of publishing the app's ACL information. Generic history service should support application-acls --- Key: YARN-1250 URL: https://issues.apache.org/jira/browse/YARN-1250 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: GenericHistoryACLs.pdf, YARN-1250.1.patch, YARN-1250.2.patch, YARN-1250.3.patch, YARN-1250.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1250) Generic history service should support application-acls
[ https://issues.apache.org/jira/browse/YARN-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133689#comment-14133689 ] Hadoop QA commented on YARN-1250: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668720/YARN-1250.4.patch against trunk revision fc741b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4956//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4956//console This message is automatically generated. Generic history service should support application-acls --- Key: YARN-1250 URL: https://issues.apache.org/jira/browse/YARN-1250 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: GenericHistoryACLs.pdf, YARN-1250.1.patch, YARN-1250.2.patch, YARN-1250.3.patch, YARN-1250.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Attachment: YARN-1972.delta.5.patch Fix the LCE user vs. runAsUser in startLocalizer Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The WCE design came from a practical trial-and-error approach. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group or LocalSystem. This is derived from the need to invoke the LoadUserProfile API, whose specification mentions these requirements. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but that requirement automatically implies that the SE_TCB privilege is held by the nodemanager.
For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service onto the project is not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
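For readers who want the deployment settings in one place, here is a minimal sketch of the two properties described above, expressed programmatically; in a real cluster they belong in yarn-site.xml, and the group name below is a placeholder, not a value from this JIRA:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WceConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Select the Windows Secure Container Executor as the NM's executor.
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor");
    // Windows security group the NM service principal belongs to
    // ("HadoopNMGroup" is a placeholder).
    conf.set("yarn.nodemanager.windows-secure-container-executor.group",
        "HadoopNMGroup");
    System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
  }
}
{code}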
[jira] [Updated] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-1972: --- Attachment: YARN-1972.trunk.5.patch Trunk diff corresponding to .delta.5 Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The WCE design came from a practical trial-and-error approach. I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group or LocalSystem. This is derived from the need to invoke the LoadUserProfile API, whose specification mentions these requirements. This is in addition to the SE_TCB privilege mentioned in YARN-1063, but that requirement automatically implies that the SE_TCB privilege is held by the nodemanager.
For the Linux speakers in the audience, the requirement is basically to run the NM as root. h2. Dedicated high privilege Service Due to the high privilege required by the WCE, we had discussed the need to isolate the high privilege operations into a separate process, an 'executor' service that is solely responsible for starting the containers (including the localizer). The NM would have to authenticate, authorize and communicate with this service via an IPC mechanism and use this service to launch the containers. I still believe we'll end up deploying such a service, but the effort to onboard such a new platform-specific service onto the project is not trivial. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2458) Add file handling features to the Windows Secure Container Executor LRPC service
[ https://issues.apache.org/jira/browse/YARN-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu resolved YARN-2458. Resolution: Implemented The patch for YARN-2458 is included in YARN-2198 going forward. Add file handling features to the Windows Secure Container Executor LRPC service Key: YARN-2458 URL: https://issues.apache.org/jira/browse/YARN-2458 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2458.1.patch, YARN-2458.2.patch In the WSCE design the nodemanager needs to do certain privileged operations, like changing file ownership to arbitrary users or deleting files owned by the task container user after completion of the task. As we want to remove the Administrator privilege requirement from the nodemanager service, we have to move these operations into the privileged LRPC helper service. Extend the RPC interface with methods to change file ownership and manipulate files, add the JNI client side, and implement the server side. This will piggyback on the existing LRPC service, so there is not much infrastructure to add (run as service, RPC init, authentication and authorization are already solved). It just needs to be implemented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
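To make the scope of the extension concrete, here is a purely hypothetical Java sketch of the added file-handling surface; none of these names are from the actual patch, they only illustrate the kind of operations being moved behind the privileged service:

{code}
// Hypothetical shape of the elevated file operations piggybacked onto the
// existing LRPC service; the interface and method names are illustrative only.
public interface ElevatedFileOperations {
  // Change ownership of a localized file to an arbitrary user/group.
  void chown(String path, String user, String group);
  // Delete files owned by the task container user after task completion.
  void delete(String path, boolean recursive);
  // Create directories on behalf of the NM with a given owner and mode.
  void mkdirs(String path, String owner, short permissions);
}
{code}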
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.delta.5.patch Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into the entire NM running as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
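To make the JNI interop concrete, a purely illustrative sketch of an NM-side binding follows; the class, method name, and signature are hypothetical, not the interface from the patch:

{code}
// Hypothetical JNI binding into the LPC client hosted by libwinutils.
public final class PrivilegedServiceClient {
  static {
    // The native LPC client code would live in the winutils library.
    System.loadLibrary("winutils");
  }

  // Connects to the service's LPC port (NtConnectPort) and sends a
  // container-launch request (NtRequestWaitReplyPort); returns the exit
  // status of the request.
  public static native int startContainerAsUser(
      String user, String command, String workDir);
}
{code}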
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.trunk.5.patch Trunk.5 includes YARN-1972's trunk.5 fix for the LCE Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into the entire NM running as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2514) The elevated WSCE LRPC should grant access to the job to the nodemanager
[ https://issues.apache.org/jira/browse/YARN-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu resolved YARN-2514. Resolution: Implemented The fix is contained in YARN-2198 4.patch and forward. The job is granted full control to the NM, LocalSystem and the container user. The elevated WSCE LRPC should grant access to the job to the nodemanager - Key: YARN-2514 URL: https://issues.apache.org/jira/browse/YARN-2514 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows The job created by winutils task createAsUser must be accessible/controllable/killable by the nodemanager, or winutils task list/kill will fail later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133764#comment-14133764 ] Hadoop QA commented on YARN-2198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668745/YARN-2198.trunk.5.patch against trunk revision fc741b5. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4958//console This message is automatically generated. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into the entire NM running as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1972) Implement secure Windows Container Executor
[ https://issues.apache.org/jira/browse/YARN-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133788#comment-14133788 ] Hadoop QA commented on YARN-1972: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668739/YARN-1972.trunk.5.patch against trunk revision fc741b5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4957//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4957//console This message is automatically generated. Implement secure Windows Container Executor --- Key: YARN-1972 URL: https://issues.apache.org/jira/browse/YARN-1972 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-1972.1.patch, YARN-1972.2.patch, YARN-1972.3.patch, YARN-1972.delta.4.patch, YARN-1972.delta.5.patch, YARN-1972.trunk.4.patch, YARN-1972.trunk.5.patch h1. Windows Secure Container Executor (WCE) YARN-1063 adds the necessary infrastructure to launch a process as a domain user, as a solution for the problem of having a security boundary between processes executed in YARN containers and the Hadoop services. The WCE is a container executor that leverages the winutils capabilities introduced in YARN-1063 and launches containers as an OS process running as the job submitter user. A description of the S4U infrastructure used by YARN-1063 and the alternatives considered can be read on that JIRA. The WCE is based on the DefaultContainerExecutor. It relies on the DCE to drive the flow of execution, but it overrides some methods to the effect of: * changes the DCE-created user cache directories to be owned by the job user and by the nodemanager group. * changes the actual container run command to use the 'createAsUser' command of winutils task instead of 'create' * runs the localization as a standalone process instead of an in-process Java method call. This in turn relies on the winutils createAsUser feature to run the localization as the job user. When compared to the LinuxContainerExecutor (LCE), the WCE has some minor differences: * it does not delegate the creation of the user cache directories to the native implementation. * it does not require special handling to be able to delete user files The WCE design came from a practical trial-and-error approach.
I had to iron out some issues around the Windows script shell limitations (command line length) to get it to work, the biggest issue being the huge CLASSPATH that is commonplace in Hadoop container executions. The job container itself already deals with this via a so-called 'classpath jar'; see HADOOP-8899 and YARN-316 for details. For the WCE localizer launch as a separate container, the same issue had to be resolved, and I used the same 'classpath jar' approach. h2. Deployment Requirements To use the WCE one needs to set `yarn.nodemanager.container-executor.class` to `org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor` and set `yarn.nodemanager.windows-secure-container-executor.group` to a Windows security group name that the nodemanager service principal is a member of (the equivalent of the LCE `yarn.nodemanager.linux-container-executor.group`). Unlike the LCE, the WCE does not require any configuration outside of Hadoop's own yarn-site.xml. For the WCE to work, the nodemanager must run as a service principal that is a member of the local Administrators group or LocalSystem. This is derived from the need to invoke the LoadUserProfile API, whose specification mentions these requirements.
[jira] [Commented] (YARN-2546) REST API for application creation/submission is using strings for numeric & boolean values
[ https://issues.apache.org/jira/browse/YARN-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133812#comment-14133812 ] Doug Haigh commented on YARN-2546: -- The definition of the fields does not match the JSON returned. This is a problem for non-Java parsers. Not an insurmountable problem, but a problem. REST API for application creation/submission is using strings for numeric & boolean values -- Key: YARN-2546 URL: https://issues.apache.org/jira/browse/YARN-2546 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.5.1 Reporter: Doug Haigh When YARN responds with or accepts JSON, numbers & booleans are being represented as strings, which can cause parsing problems. Resource values look like { "application-id": "application_1404198295326_0001", "maximum-resource-capability": { "memory": "8192", "vCores": "32" } } Instead of { "application-id": "application_1404198295326_0001", "maximum-resource-capability": { "memory": 8192, "vCores": 32 } } When I POST to start a job, numeric values are likewise represented as strings: "local-resources": { "entry": [ { "key": "AppMaster.jar", "value": { "resource": "hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar", "type": "FILE", "visibility": "APPLICATION", "size": "43004", "timestamp": "1405452071209" } } ] }, Instead of "local-resources": { "entry": [ { "key": "AppMaster.jar", "value": { "resource": "hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar", "type": "FILE", "visibility": "APPLICATION", "size": 43004, "timestamp": 1405452071209 } } ] }, Similarly, Boolean values are also represented as strings: "keep-containers-across-application-attempts": "false" Instead of "keep-containers-across-application-attempts": false -- This message was sent by Atlassian JIRA (v6.3.4#6332)
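The parsing problem is easy to reproduce with a strict JSON library. A minimal sketch, assuming Jackson 2.x on the classpath: a value serialized as "8192" is a text node, not a number, so typed accessors only work via implicit coercion:

{code}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class StringNumberDemo {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // What the RM returns: memory quoted as a string.
    JsonNode asString = mapper.readTree("{\"memory\":\"8192\"}");
    // What the field definition advertises: memory as a number.
    JsonNode asNumber = mapper.readTree("{\"memory\":8192}");

    System.out.println(asString.get("memory").isNumber()); // false
    System.out.println(asNumber.get("memory").isNumber()); // true
    // asInt() falls back to string-to-int coercion here; a strict,
    // schema-driven parser would reject the value instead.
    System.out.println(asString.get("memory").asInt());    // 8192
  }
}
{code}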
[jira] [Created] (YARN-2551) Windows Secure Container Executor: Add checks to validate that the wsce-site.xml is write-restricted to Administrators only
Remus Rusanu created YARN-2551: -- Summary: Windows Secure Container Executor: Add checks to validate that the wsce-site.xml is write-restricted to Administrators only Key: YARN-2551 URL: https://issues.apache.org/jira/browse/YARN-2551 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu The wsce-site.xml contains the impersonate.allowed and impersonate.denied keys that restrict/control the users that can be impersonated by the WSCE containers. The impersonation framework in winutils should validate that only Administrators have write control on this file. This is similar to how the LCE validates that only root has write permissions on the container-executor.cfg file on secure Linux clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
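The real validation is proposed for the native winutils code; as a sketch of the same idea in Java NIO terms (the file path and principal name below are illustrative assumptions):

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.AclEntry;
import java.nio.file.attribute.AclEntryPermission;
import java.nio.file.attribute.AclEntryType;
import java.nio.file.attribute.AclFileAttributeView;

public class WsceConfigCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder path to wsce-site.xml.
    Path cfg = Paths.get("C:\\hadoop\\etc\\hadoop\\wsce-site.xml");
    AclFileAttributeView view =
        Files.getFileAttributeView(cfg, AclFileAttributeView.class);
    for (AclEntry entry : view.getAcl()) {
      boolean grantsWrite = entry.type() == AclEntryType.ALLOW
          && entry.permissions().contains(AclEntryPermission.WRITE_DATA);
      // Reject the config if anyone besides Administrators can write it.
      if (grantsWrite
          && !entry.principal().getName().equals("BUILTIN\\Administrators")) {
        throw new SecurityException(
            "wsce-site.xml is writable by " + entry.principal().getName());
      }
    }
  }
}
{code}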
[jira] [Created] (YARN-2552) Windows Secure Container Executor: the privileged file operations of hadoopwinutilsvc should be constrained to localdirs only
Remus Rusanu created YARN-2552: -- Summary: Windows Secure Container Executor: the privileged file operations of hadoopwinutilsvc should be constrained to localdirs only Key: YARN-2552 URL: https://issues.apache.org/jira/browse/YARN-2552 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu YARN-2458 added file manipulation operations executed in an elevated context by hadoopwinutilsvc. Without any constraint, the NM (or a hijacker that takes over the NM) can manipulate arbitrary OS files under the highest possible privileges, an easy elevation attack vector. The service should only allow operations on files/directories that are under the configured NM localdirs. It should read this value from wsce-site.xml, as yarn-site.xml cannot be trusted, being writable by Hadoop admins (YARN-2551 ensures wsce-site.xml is only writable by system Administrators, not Hadoop admins). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
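A minimal sketch of such a containment check, assuming the allowed roots are read from wsce-site.xml; normalizing both sides guards against '..' escapes:

{code}
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class LocalDirGuard {
  private final List<Path> allowedRoots = new ArrayList<Path>();

  public LocalDirGuard(List<String> localDirs) {
    for (String dir : localDirs) {
      allowedRoots.add(Paths.get(dir).toAbsolutePath().normalize());
    }
  }

  // Allow a privileged file operation only if its target resolves under
  // one of the configured NM local dirs.
  public void check(String requested) throws IOException {
    Path target = Paths.get(requested).toAbsolutePath().normalize();
    for (Path root : allowedRoots) {
      if (target.startsWith(root)) {
        return;
      }
    }
    throw new IOException("Path outside configured NM local dirs: " + target);
  }
}
{code}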
[jira] [Resolved] (YARN-2485) Fix WSCE folder/file/classpathJar permission/order when running as non-admin
[ https://issues.apache.org/jira/browse/YARN-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu resolved YARN-2485. Resolution: Duplicate This is fixed by the YARN-2458 implementation of an 'elevated' file system for WSCE. Fix WSCE folder/file/classpathJar permission/order when running as non-admin Key: YARN-2485 URL: https://issues.apache.org/jira/browse/YARN-2485 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows The WSCE creates the local, usercache, filecache, and appcache dirs in the normal DefaultContainerExecutor way, and then assigns ownership to the user process. The WSCE-configured group is added, but the permission masks used (710) do not give write permissions on the appcache/filecache/usercache folders to the NM itself. The creation of these folders, as well as the creation of the temporary classPath jar files, must succeed even after the file/dir ownership is relinquished to the task user and the NM does not run as a local Administrator. The LCE handles all these dirs inside the container-executor app (root). The classpathJar issue does not exist on Linux. The dirs can be handled by simply delaying the transfer (create all dirs and temp files, then assign ownership in bulk), but the task classpathJar is 'special' and needs some refactoring of the NM launch sequence. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2553) Windows Secure Container Executor: assign PROCESS_TERMINATE privilege to NM on created containers
Remus Rusanu created YARN-2553: -- Summary: Windows Secure Container Executor: assign PROCESS_TERMINATE privilege to NM on created containers Key: YARN-2553 URL: https://issues.apache.org/jira/browse/YARN-2553 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu In order to open a job handle with JOB_OBJECT_TERMINATE access, the caller must have PROCESS_TERMINATE access on the handle of each process in the job (MSDN: http://msdn.microsoft.com/en-us/library/windows/desktop/ms686709(v=vs.85).aspx). The hadoopwinutilsvc process should explicitly grant PROCESS_TERMINATE access to the NM account on the newly started container process. I hope this gets inherited... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2474) document the wsce-site.xml keys in hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm
[ https://issues.apache.org/jira/browse/YARN-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2474: --- Attachment: YARN-2474.1.patch document the wsce-site.xml keys in hadoop-yarn-site/src/site/apt/SecureContainer.apt.vm --- Key: YARN-2474 URL: https://issues.apache.org/jira/browse/YARN-2474 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Labels: security, windows Attachments: YARN-2474.1.patch document the keys used to configure WSCE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2436) yarn application help doesn't work
[ https://issues.apache.org/jira/browse/YARN-2436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2436: --- Release Note: test yarn application help doesn't work -- Key: YARN-2436 URL: https://issues.apache.org/jira/browse/YARN-2436 Project: Hadoop YARN Issue Type: Bug Components: scripts Reporter: Allen Wittenauer Assignee: Allen Wittenauer Labels: newbie Fix For: 3.0.0 Attachments: YARN-2436.patch The previous version of the yarn command plays games with the command stack for some commands. The new code needs to duplicate this wackiness. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Deleted] (YARN-2535) Test JIRA, ignore.
[ https://issues.apache.org/jira/browse/YARN-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer deleted YARN-2535: --- Test JIRA, ignore. -- Key: YARN-2535 URL: https://issues.apache.org/jira/browse/YARN-2535 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Allen Wittenauer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2537) relnotes.py prints description instead of release note for YARN issues
[ https://issues.apache.org/jira/browse/YARN-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2537. Resolution: Fixed Fixed by INFRA-8338. Manual test shows that relnotes.py is working properly for YARN now. relnotes.py prints description instead of release note for YARN issues -- Key: YARN-2537 URL: https://issues.apache.org/jira/browse/YARN-2537 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Currently, the release notes for YARN always print the JIRA description field instead of the release note. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2537) relnotes.py prints description instead of release note for YARN issues
[ https://issues.apache.org/jira/browse/YARN-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned YARN-2537: -- Assignee: Allen Wittenauer relnotes.py prints description instead of release note for YARN issues -- Key: YARN-2537 URL: https://issues.apache.org/jira/browse/YARN-2537 Project: Hadoop YARN Issue Type: Bug Reporter: Allen Wittenauer Assignee: Allen Wittenauer Currently, the release notes for YARN always print the JIRA description field instead of the release note. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy
Jonathan Maron created YARN-2554: Summary: Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the necessary certs to allow for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: AM: Displays an unknown_certificate exception. RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
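A minimal sketch of the suggested direction, assuming ssl-client.xml is configured: build a client-mode SSLFactory and use its socket factory for the outbound connection. Wiring it into the proxy's actual HttpClient is omitted; a plain HttpsURLConnection stands in, and the AM URL is a placeholder:

{code}
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLSocketFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

public class ProxySslSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client mode reads the truststore location from ssl-client.xml.
    SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
    sslFactory.init();
    try {
      SSLSocketFactory socketFactory = sslFactory.createSSLSocketFactory();
      HttpsURLConnection conn = (HttpsURLConnection)
          new URL("https://am-host:8042/").openConnection(); // placeholder URL
      conn.setSSLSocketFactory(socketFactory);
      conn.setHostnameVerifier(sslFactory.getHostnameVerifier());
      System.out.println(conn.getResponseCode());
    } finally {
      sslFactory.destroy();
    }
  }
}
{code}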
[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-2554: - Summary: Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy (was: Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the necessary certs to allow for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: AM: Displays an unknown_certificate exception. RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133934#comment-14133934 ] Jonathan Maron commented on YARN-2554: -- A workaround (though not necessarily a production-recommended one) is to add the client trust store certs to the JDK's cacerts file (export the trust store certs, import them to JDK/jre/lib/security/cacerts) Slider AM Web UI is inaccessible if HTTPS/SSL is enabled as the HTTP policy --- Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the necessary certs to allow for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: AM: Displays an unknown_certificate exception. RM: Displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2531) CGroups - Admins should be allowed to enforce strict cpu limits
[ https://issues.apache.org/jira/browse/YARN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134020#comment-14134020 ] Varun Vasudev commented on YARN-2531: - Similar but not the same. YARN-810 allows apps to choose to limit themselves. This allows admins to enforce limits irrespective of the app. CGroups - Admins should be allowed to enforce strict cpu limits --- Key: YARN-2531 URL: https://issues.apache.org/jira/browse/YARN-2531 Project: Hadoop YARN Issue Type: Improvement Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2531.0.patch From YARN-2440 - {quote} The other dimension to this is determinism w.r.t performance. Limiting to allocated cores overall (as well as per container later) helps orgs run workloads and reason about them deterministically. One of the examples is benchmarking apps, but deterministic execution is a desired option beyond benchmarks too. {quote} It would be nice to have an option that lets admins enforce strict cpu limits for apps for things like benchmarking, etc. By default this flag should be off so that containers can use available cpu, but an admin can turn the flag on to determine worst-case performance, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
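For context, strict limits under cgroups typically map to CFS bandwidth control: capping a container at N vcores means writing a quota of N times the period into its cpu subtree, so the container cannot exceed its allocation even on an idle machine. A hedged sketch (the cgroup mount point and hierarchy are assumptions, not YARN's actual layout):

{code}
import java.io.FileWriter;
import java.io.IOException;

public class StrictCpuSketch {
  // Cap a container's CPU usage at the given number of vcores via the
  // CFS bandwidth controls; paths assume a typical cgroup mount.
  static void capContainer(String containerId, int vcores) throws IOException {
    String dir = "/sys/fs/cgroup/cpu/hadoop-yarn/" + containerId;
    int periodUs = 100000; // default CFS period: 100ms
    write(dir + "/cpu.cfs_period_us", Integer.toString(periodUs));
    write(dir + "/cpu.cfs_quota_us", Integer.toString(vcores * periodUs));
  }

  private static void write(String path, String value) throws IOException {
    FileWriter w = new FileWriter(path);
    try {
      w.write(value);
    } finally {
      w.close();
    }
  }
}
{code}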
[jira] [Commented] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134024#comment-14134024 ] Maysam Yabandeh commented on YARN-1530: --- bq. YARN apps already depend on ZK/RM/HDFS being up. Every new service dependency we add will only increase the chances of YARN apps failing or slowing down. That's true even if the ATS service's uptime is as good as ZK or RM. bq. Realistically, getting the ATS service's uptime to the same level as ZK or HDFS is a long and winding road. Especially when most discussions here assume HBase as the backing store. HBase's uptime is lower than HDFS/ZK/RM because it's more complex to operate. If HBase going down means ATS service going down, then we certainly should guard against this failure scenario. +1 bq. And if we have a choice to decouple the write path from the ATS service, why not? bq. If we have an alternate code path to persist events first before they hit the final backing store, why not do that all the time? I would call that a reasonable approach. One alternative is to use HDFS as the backup plan, i.e., use it when HBase is down. Anyway, with ATS being pluggable, I guess all approaches can grow independently. [Umbrella] Store, manage and serve per-framework application-timeline data -- Key: YARN-1530 URL: https://issues.apache.org/jira/browse/YARN-1530 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf This is a sibling JIRA for YARN-321. Today, each application/framework has to store, manage, and serve per-framework data all by itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner with plugin points for frameworks to do their own thing w.r.t interpretation and serving. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2468: Attachment: YARN-2468.3.rebase.patch Created the patch based on the latest trunk. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.patch Currently, when an application is finished, the NM will start to do the log aggregation. But for long-running service (LRS) applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2529) Generic history service RPC interface doesn't work when service authorization is enabled
[ https://issues.apache.org/jira/browse/YARN-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134178#comment-14134178 ] Jian He commented on YARN-2529: --- +1 Generic history service RPC interface doesn't work when service authorization is enabled Key: YARN-2529 URL: https://issues.apache.org/jira/browse/YARN-2529 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2529.1.patch, YARN-2529.2.patch Here's the problem shown in the log: {code} 14/09/10 10:42:44 INFO ipc.Server: Connection from 10.22.2.109:55439 for protocol org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB is unauthorized for user zshen (auth:SIMPLE) 14/09/10 10:42:44 INFO ipc.Server: Socket Reader #1 for port 10200: readAndProcess from client 10.22.2.109 threw exception [org.apache.hadoop.security.authorize.AuthorizationException: Protocol interface org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB is not known.] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
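For background, the "protocol ... is not known" failure is the usual symptom of a protocol missing from the RPC server's PolicyProvider, so service-level authorization cannot map it to an ACL. A sketch of the general fix pattern follows; the ACL key name is an assumption, not necessarily the one the committed patch uses:

{code}
import org.apache.hadoop.security.authorize.PolicyProvider;
import org.apache.hadoop.security.authorize.Service;
import org.apache.hadoop.yarn.api.ApplicationHistoryProtocolPB;

// Register the history protocol so the authorization manager knows it.
public class TimelinePolicyProvider extends PolicyProvider {
  @Override
  public Service[] getServices() {
    return new Service[] {
        new Service("security.applicationhistory.protocol.acl", // assumed key
            ApplicationHistoryProtocolPB.class)
    };
  }
}
{code}

The RPC server would then be pointed at this provider (e.g. via its refreshServiceAcl(Configuration, PolicyProvider) hook) so the protocol is recognized when authorization is enabled.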
[jira] [Resolved] (YARN-2516) Deprecate yarn.policy.file
[ https://issues.apache.org/jira/browse/YARN-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2516. Resolution: Duplicate Looks like HADOOP-9902 wiped out most of yarn.policy.file already, so the only remaining bit is in yarn-env.sh. That is easier to clean up as part of YARN-2438. Deprecate yarn.policy.file -- Key: YARN-2516 URL: https://issues.apache.org/jira/browse/YARN-2516 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Labels: newbie It doesn't appear that yarn.policy.file is actually used anywhere; there isn't an example yarn-policy.xml file, etc. So let's remove it from the shell code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2540: - Attachment: YARN-2540-v2.txt There is a case in which the filter can return wrong results: say apps are running on root.a.b and root.a.b1. Clicking on root.a.b would return apps running in both b and b1, instead of only b. The v2 patch corrects this. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt Steps to reproduce: 1. Run an app in the default queue. 2. While the app is running, go to the scheduler page on the RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on the default queue to filter the apptable on root.default. 5. The app disappears from the apptable although it is running in the default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
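The corrected matching rule is the standard hierarchical-prefix idiom: a queue matches the filter only if it equals the filter or is a descendant of it, i.e. starts with the filter plus a trailing dot. A minimal sketch:

{code}
public class QueueFilter {
  // Plain startsWith("root.a.b") wrongly matches "root.a.b1"; requiring an
  // exact match or a "root.a.b." prefix does not.
  static boolean matches(String queueName, String filter) {
    return queueName.equals(filter) || queueName.startsWith(filter + ".");
  }

  public static void main(String[] args) {
    System.out.println(matches("root.a.b", "root.a.b"));   // true
    System.out.println(matches("root.a.b.c", "root.a.b")); // true
    System.out.println(matches("root.a.b1", "root.a.b"));  // false
  }
}
{code}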
[jira] [Commented] (YARN-2438) yarn-env.sh cleanup
[ https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134258#comment-14134258 ] Allen Wittenauer commented on YARN-2438: Of course, HADOOP-10950 would make heap management much more obvious. yarn-env.sh cleanup --- Key: YARN-2438 URL: https://issues.apache.org/jira/browse/YARN-2438 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Labels: newbie a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2468) Log handling for LRS
[ https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134257#comment-14134257 ] Hadoop QA commented on YARN-2468: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668794/YARN-2468.3.rebase.patch against trunk revision 24d920b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4959//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4959//console This message is automatically generated. Log handling for LRS Key: YARN-2468 URL: https://issues.apache.org/jira/browse/YARN-2468 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, YARN-2468.3.rebase.patch Currently, when an application is finished, the NM will start to do the log aggregation. But for long-running service (LRS) applications, this is not ideal. The problems we have are: 1) LRS applications are expected to run for a long time (weeks, months). 2) Currently, all the container logs (from one NM) will be written into a single file. The files could become larger and larger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2438) yarn-env.sh cleanup
[ https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2438: --- Description: a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set c) Remove references to things that are covered elsewhere, deprecated, etc. was: a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set yarn-env.sh cleanup --- Key: YARN-2438 URL: https://issues.apache.org/jira/browse/YARN-2438 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Labels: newbie a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set c) Remove references to things that are covered elsewhere, deprecated, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2438) yarn-env.sh cleanup
[ https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2438: --- Attachment: YARN-2438.patch yarn-env.sh cleanup --- Key: YARN-2438 URL: https://issues.apache.org/jira/browse/YARN-2438 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Labels: newbie Attachments: YARN-2438.patch a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set c) Remove references to things that are covered elsewhere, deprecated, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2438) yarn-env.sh cleanup
[ https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134295#comment-14134295 ] Allen Wittenauer commented on YARN-2438: If HADOOP-10950 goes in first, this patch needs to be updated for it. yarn-env.sh cleanup --- Key: YARN-2438 URL: https://issues.apache.org/jira/browse/YARN-2438 Project: Hadoop YARN Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Labels: newbie Attachments: YARN-2438.patch a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set c) Remove references to things that are covered elsewhere, deprecated, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.
[ https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134303#comment-14134303 ] Arpit Agarwal commented on YARN-2549: - +1 for the patch. TestContainerLaunch fails due to classpath problem with hamcrest classes. - Key: YARN-2549 URL: https://issues.apache.org/jira/browse/YARN-2549 Project: Hadoop YARN Issue Type: Test Components: nodemanager, test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: YARN-2549.1.patch The mockito jar bundles its own copy of the hamcrest classes, and it's ahead of our hamcrest dependency jar on the test classpath for hadoop-yarn-server-nodemanager. Unfortunately, the version bundled in mockito doesn't match the version we need, so it's missing the {{CoreMatchers#containsString}} method. This causes the tests to fail with {{NoSuchMethodError}} on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2437) start-yarn.sh/stop-yarn needs to give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2437: --- Issue Type: Improvement (was: Bug) start-yarn.sh/stop-yarn needs to give info -- Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Hao Gao Labels: newbie With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2437: --- Summary: start-yarn.sh/stop-yarn should give info (was: start-yarn.sh/stop-yarn needs to give info) start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Hao Gao Labels: newbie With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2438) yarn-env.sh cleanup
[ https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated YARN-2438: --- Issue Type: Improvement (was: Bug) yarn-env.sh cleanup --- Key: YARN-2438 URL: https://issues.apache.org/jira/browse/YARN-2438 Project: Hadoop YARN Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Labels: newbie Attachments: YARN-2438.patch a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set c) Remove references to things that are covered elsewhere, deprecated, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134325#comment-14134325 ] Li Lu commented on YARN-2102: - Hi [~zjshen], just a quick thing to check, shall we use the lock map in the existing leveldbstore here? Seems like some operations need to acquire locks? More generalized timeline ACLs -- Key: YARN-2102 URL: https://issues.apache.org/jira/browse/YARN-2102 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch We need to differentiate the access controls of reading and writing operations, and we need to think about cross-entity access control. For example, if we are executing a workflow of MR jobs, which writes the timeline data of this workflow, we don't want other users to pollute the timeline data of the workflow by putting something under it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.
[ https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-2549: Hadoop Flags: Reviewed Thank you, Arpit. I committed this to trunk and branch-2. TestContainerLaunch fails due to classpath problem with hamcrest classes. - Key: YARN-2549 URL: https://issues.apache.org/jira/browse/YARN-2549 Project: Hadoop YARN Issue Type: Test Components: nodemanager, test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Attachments: YARN-2549.1.patch The mockito jar bundles its own copy of the hamcrest classes, and it's ahead of our hamcrest dependency jar on the test classpath for hadoop-yarn-server-nodemanager. Unfortunately, the version bundled in mockito doesn't match the version we need, so it's missing the {{CoreMatchers#containsString}} method. This causes the tests to fail with {{NoSuchMethodError}} on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2549) TestContainerLaunch fails due to classpath problem with hamcrest classes.
[ https://issues.apache.org/jira/browse/YARN-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2549: -- Fix Version/s: 2.6.0 TestContainerLaunch fails due to classpath problem with hamcrest classes. - Key: YARN-2549 URL: https://issues.apache.org/jira/browse/YARN-2549 Project: Hadoop YARN Issue Type: Test Components: nodemanager, test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 2.6.0 Attachments: YARN-2549.1.patch The mockito jar bundles its own copy of the hamcrest classes, and it's ahead of our hamcrest dependency jar on the test classpath for hadoop-yarn-server-nodemanager. Unfortunately, the version bundled in mockito doesn't match the version we need, so it's missing the {{CoreMatchers#containsString}} method. This causes the tests to fail with {{NoSuchMethodError}} on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2438) yarn-env.sh cleanup
[ https://issues.apache.org/jira/browse/YARN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134349#comment-14134349 ] Hadoop QA commented on YARN-2438: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668821/YARN-2438.patch against trunk revision 43b0303. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4961//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4961//console This message is automatically generated. yarn-env.sh cleanup --- Key: YARN-2438 URL: https://issues.apache.org/jira/browse/YARN-2438 Project: Hadoop YARN Issue Type: Improvement Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Labels: newbie Attachments: YARN-2438.patch a) YARN_PROXYSERVER_OPTS and YARN_PROXYSERVER_HEAP are not documented b) Defaults should get moved to yarn-config.sh instead of being specifically set c) Remove references to things that are covered elsewhere, deprecated, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134367#comment-14134367 ] Hadoop QA commented on YARN-2540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668815/YARN-2540-v2.txt against trunk revision 43b0303. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4960//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4960//console This message is automatically generated. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2555) Effective max-allocation-* should consider biggest node
Karthik Kambatla created YARN-2555: -- Summary: Effective max-allocation-* should consider biggest node Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
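A minimal sketch of the proposed computation, assuming a hypothetical helper that the scheduler would re-invoke on node add/remove/update (class and method names are illustrative, not from any patch):
{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class EffectiveMaxAllocation {
  // min(admin-configured maximum, biggest registered node), per resource.
  static Resource compute(Resource configuredMax, Resource biggestNode) {
    return Resource.newInstance(
        Math.min(configuredMax.getMemory(), biggestNode.getMemory()),
        Math.min(configuredMax.getVirtualCores(), biggestNode.getVirtualCores()));
  }
}
{code}
Requests above the returned value could then be rejected at submission time instead of waiting forever.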
[jira] [Assigned] (YARN-2555) Effective max-allocation-* should consider biggest node
[ https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-2555: - Assignee: Wei Yan Effective max-allocation-* should consider biggest node --- Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Wei Yan The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2556) Tool to measure the performance of the timeline server
Jonathan Eagles created YARN-2556: - Summary: Tool to measure the performance of the timeline server Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1779) Handle AMRMTokens across RM failover
[ https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-1779: - Assignee: Jian He Handle AMRMTokens across RM failover Key: YARN-1779 URL: https://issues.apache.org/jira/browse/YARN-1779 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Jian He Priority: Blocker Labels: ha Verify if AMRMTokens continue to work against RM failover. If not, we will have to do something along the lines of YARN-986. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-218) Distinguish between failed and killed app attempts
[ https://issues.apache.org/jira/browse/YARN-218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White resolved YARN-218. Resolution: Duplicate Fixed in YARN-614. Distinguish between failed and killed app attempts - Key: YARN-218 URL: https://issues.apache.org/jira/browse/YARN-218 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Tom White Assignee: Tom White A failed app attempt is one that failed due to an error in the user program, as opposed to one that was killed by the system. Like in MapReduce task attempts, we should distinguish the two so that killed attempts do not count against the number of retries (yarn.resourcemanager.am.max-retries). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134542#comment-14134542 ] Jonathan Eagles commented on YARN-2556: --- This jira is to give users realistic performance numbers for running the timeline server on their own setup and hardware (HBase, leveldb, etc). FYI, LevelDB publishes its own performance statistics: https://code.google.com/p/leveldb/ Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
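For a rough sense of what such a measurement could look like, a single-threaded write probe using {{TimelineClient}}; the entity type, iteration count, and class name are illustrative, and this is a sketch rather than the proposed MR/gridmix job:
{code}
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineWriteProbe {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    int n = 1000;
    long start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("PERF_TEST");
      entity.setEntityId("entity_" + i);
      entity.setStartTime(System.currentTimeMillis());
      client.putEntities(entity);  // synchronous put; errors surface here
    }
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("%d puts in %.2fs = %.1f puts/sec%n", n, secs, n / secs);
    client.stop();
  }
}
{code}
A real tool would also need concurrent writers and a read path to approximate the transactions-per-second and I/O numbers proposed above.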
[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2312: -- Target Version/s: 2.6.0 We SHOULD try to get this in 2.6, marking so. Marking ContainerId#getId as deprecated --- Key: YARN-2312 URL: https://issues.apache.org/jira/browse/YARN-2312 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA After YARN-2229, {{ContainerId#getId}} will only return a partial value of the container id: the sequence number without the epoch. We should mark {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
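A small sketch of the intended migration (the wrapper class and method are illustrative):
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

public class ContainerIdMigration {
  static void example(ContainerId containerId) {
    // Deprecated after YARN-2229: returns only the sequence number (int),
    // dropping the epoch encoded in the upper bits.
    int partial = containerId.getId();
    // Preferred: the full id (long), epoch plus sequence number.
    long full = containerId.getContainerId();
  }
}
{code}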
[jira] [Commented] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134598#comment-14134598 ] Vinod Kumar Vavilapalli commented on YARN-2080: --- ClientRMService: checkReservationSytem() logs for every new reservation request if reservations are not enabled. That is too much logging. AbstractReservationSystem: The actual start of the thread pool shouldn't be in serviceInit(). Missed these - reservation.plan.follower - reservation-system.plan-follower - reservation.planfollower.time-step - reservation-system.plan-follower.time-step Admission Control: Integrate Reservation subsystem with ResourceManager --- Key: YARN-2080 URL: https://issues.apache.org/jira/browse/YARN-2080 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch This JIRA tracks the integration of Reservation subsystem data structures introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
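On the serviceInit() point: the usual YARN service convention is to allocate in {{serviceInit()}} and only start threads in {{serviceStart()}}. A skeleton under that convention (class name and timings are hypothetical, not from the patch):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class PlanFollowerService extends AbstractService {
  private ScheduledExecutorService scheduler;

  public PlanFollowerService() {
    super(PlanFollowerService.class.getName());
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    scheduler = Executors.newSingleThreadScheduledExecutor(); // allocate only
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        // synchronize plans with scheduler queues
      }
    }, 0, 1, TimeUnit.SECONDS);
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    scheduler.shutdownNow();
    super.serviceStop();
  }
}
{code}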
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134693#comment-14134693 ] Carlo Curino commented on YARN-1711: [~chris.douglas] thanks for the prompt and precise feedback. I addressed it in the updated patch (v4) as follows: * added comments and implemented nits as requested * got rid of excludeList altogether, as it is not necessary anymore given the restructuring done in the ReservationSystem (i.e., the exclusion list is now handled outside the scope of the policies, which is cleaner/simpler) * improved tests by using subclasses of exceptions wherever possible, and scoping them so that it is clearer that a failure is the one we expected * subclasses of PlanningException give callers some indication of why the call did not succeed (e.g., enough for the tests); future smarter agents might require a more detailed explanation (e.g., the JSON payload you mention) * used the (expected = SomeException.class) notation for tests. Regarding annotations, I am using @Public @Unstable for the exceptions, as they can bubble up all the way to users, and @LimitedPrivate(yarn) @Unstable for the other classes. If anyone has better suggestions for the annotations, please advise. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
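For reference, the (expected = ...) notation mentioned above, shown with a stand-in exception since the actual PlanningException subclasses live in the patch:
{code}
import org.junit.Test;

public class TestPolicyRejection {
  // Passes only if exactly this exception type (or a subclass)
  // escapes the test body; any other outcome fails the test.
  @Test(expected = IllegalArgumentException.class)
  public void testOversizedReservationIsRejected() {
    throw new IllegalArgumentException("reservation exceeds per-user quota");
  }
}
{code}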
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-1711: --- Attachment: YARN-1711.4.patch CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.4.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134714#comment-14134714 ] Hadoop QA commented on YARN-1711: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668909/YARN-1711.4.patch against trunk revision 8008f0e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4962//console This message is automatically generated. CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.4.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2446) Using TimelineNamespace to shield the entities of a user
[ https://issues.apache.org/jira/browse/YARN-2446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134715#comment-14134715 ] Li Lu commented on YARN-2446: - Hi [~zjshen], I applied the patch on top of a YARN-2102 branch, and ran all system tests changed in your patch. They all passed. On the code side, in general it looks good to me. Here are some comments: {code} // create a default namespace, which allows everybody to access and // modify the entities in it. namespace = new TimelineNamespace(); namespace.setId(DEFAULT_NAMESPACE_ID); namespace.setDescription("System Default Namespace"); namespace.setOwner( UserGroupInformation.getCurrentUser().getShortUserName()); namespace.setReaders("*"); namespace.setWriters("*"); {code} I would like to confirm that it is fine to set the owner of the default namespace to the current user. Since this is a lazy initialization, the owner of the default namespace is not deterministic. Will this cause any trouble in the future? {code} throw new YarnException("The namespace of the timeline entity " + entityID + " is not allowed to be changed."); {code} Could you please verify whether this exception only represents the case when the user tries to change the namespace of the entity? Is it possible to have a scenario where the user is not changing the namespace, but just sets it wrong? If this scenario is possible, maybe we want to change the exception message, since it may be a little bit confusing. {code} public void invalidate(TimelineNamespace namespace) { if (aclExts.containsKey(namespace.getId())) { putNamespaceIntoCache(namespace); } } {code} When this function is called, it would be reasonable for the user to expect the cached item to be invalidated in the cache. However, here we're actually updating it. Maybe we want to change the name of this function? Thanks! Using TimelineNamespace to shield the entities of a user Key: YARN-2446 URL: https://issues.apache.org/jira/browse/YARN-2446 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2446.1.patch Given YARN-2102 adds TimelineNamespace, we can make use of it to shield the entities, preventing them from being accessed or affected by other users' operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134718#comment-14134718 ] Chris Douglas commented on YARN-1711: - +1 Thanks for addressing the feedback on the patch CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 -- Key: YARN-1711 URL: https://issues.apache.org/jira/browse/YARN-1711 Project: Hadoop YARN Issue Type: Sub-task Reporter: Carlo Curino Assignee: Carlo Curino Labels: reservations Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, YARN-1711.4.patch, YARN-1711.patch This JIRA tracks the development of a policy that enforces user quotas (a time-extension of the notion of capacity) in the inventory subsystem discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2540: - Attachment: YARN-2540-v3.txt Updated the patch to take care of the same-prefix problem at the parent queue. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
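The actual filter lives in the scheduler page's JavaScript data table, but the intended matching rule can be sketched as a predicate (hypothetical helper, not code from the patch): an app's queue matches only when it equals the filtered queue or sits strictly underneath it, so root.default no longer captures a sibling like root.default2.
{code}
public class QueueFilter {
  // Hypothetical helper illustrating the same-prefix fix.
  static boolean inQueueSubtree(String appQueue, String filterQueue) {
    return appQueue.equals(filterQueue)
        || appQueue.startsWith(filterQueue + ".");
  }
}
{code}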
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134762#comment-14134762 ] Anubhav Dhoot commented on YARN-1372: - A finishedContainer that was sent to the previous AM will have to be sent again to the new AM in order to get the ack. So we need to transfer the finishedContainersSentToAM from the previous attempt to the justFinishedContainers of the new attempt (if we decide to transfer those). Then why not transfer the justFinishedContainers as well? If we are not going to consider whether work-preserving AM restart is enabled for this, we should be consistent about whether we transfer justFinishedContainers and finishedContainersSentToAM (either both or none). Agree? Ensure all completed containers are reported to the AMs across RM restart - Key: YARN-1372 URL: https://issues.apache.org/jira/browse/YARN-1372 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1372.001.patch, YARN-1372.001.patch, YARN-1372.002_NMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.002_RMHandlesCompletedApp.patch, YARN-1372.003.patch, YARN-1372.004.patch, YARN-1372.005.patch, YARN-1372.005.patch, YARN-1372.prelim.patch, YARN-1372.prelim2.patch Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes on that completed container information to the AM and the AM pulls this data. If the RM dies before the AM pulls this data then the AM may not be able to get this information again. To fix this, NM should maintain a separate list of such completed container notifications sent to the RM. After the AM has pulled the containers from the RM then the RM will inform the NM about it and the NM can remove the completed container from the new list. Upon re-registering with the RM (after RM restart) the NM should send the entire list of completed containers to the RM along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AMs about all completed containers. Some container completions may be reported more than once since the AM may have pulled the container but the RM may die before notifying the NM about the pull. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
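To make the either-both-or-none point concrete, a hypothetical sketch of the attempt-transfer step (field names follow the comment above; this is not actual patch code):
{code}
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerStatus;

public class AttemptTransfer {
  // On a new attempt, carry over both lists so completed containers the
  // previous AM never acked are re-reported to the new AM.
  static void transferFinishedContainers(
      List<ContainerStatus> prevFinishedContainersSentToAM,
      List<ContainerStatus> prevJustFinishedContainers,
      List<ContainerStatus> newJustFinishedContainers) {
    newJustFinishedContainers.addAll(prevFinishedContainersSentToAM); // sent, never acked
    newJustFinishedContainers.addAll(prevJustFinishedContainers);     // never sent
  }
}
{code}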
[jira] [Commented] (YARN-2540) Fair Scheduler : queue filters not working on scheduler page in RM UI
[ https://issues.apache.org/jira/browse/YARN-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134791#comment-14134791 ] Hadoop QA commented on YARN-2540: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668912/YARN-2540-v3.txt against trunk revision 0ac760a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4963//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4963//console This message is automatically generated. Fair Scheduler : queue filters not working on scheduler page in RM UI - Key: YARN-2540 URL: https://issues.apache.org/jira/browse/YARN-2540 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.5.0, 2.5.1 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: YARN-2540-v1.txt, YARN-2540-v2.txt, YARN-2540-v3.txt Steps to reproduce : 1. Run an app in default queue. 2. While the app is running, go to the scheduler page on RM UI. 3. You would see the app in the apptable at the bottom. 4. Now click on default queue to filter the apptable on root.default. 5. App disappears from apptable although it is running on default queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2102: -- Attachment: YARN-2102.6.patch More generalized timeline ACLs -- Key: YARN-2102 URL: https://issues.apache.org/jira/browse/YARN-2102 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch We need to differentiate the access controls of reading and writing operations, and we need to think about cross-entity access control. For example, if we are executing a workflow of MR jobs, which writes the timeline data of this workflow, we don't want other users to pollute the timeline data of the workflow by putting something under it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134820#comment-14134820 ] Zhijie Shen commented on YARN-2102: --- bq. From your design doc, I think what we're proposing here is something to partition the domain of entities, but not enhancing identifications. Maybe we want to consider an alternative name like domain or partition here? Talked to [~gtCarrera] offline. It seems to be a good suggestion, and we don't actually plan on nested namespaces. Changed it to domain in the new patch. bq. This is significantly different to any other fields. Are there any specific considerations behind this? I followed the way we put the start time and the insert time of an entity. It doesn't make much difference to split them and associate them with different keys, and they are usually retrieved together. I added more code comments to describe it. bq. I think this is left out for some reasons, and maybe in YARN-2446 you're addressing this? Yes, the use of the domain ACLs is in YARN-2446. bq. Shall we add a default branch here to track any potential problems? Added an else block here. bq. just a quick thing to check, shall we use the lock map in the existing leveldbstore here? Seems like some operations need to acquire locks? According to the offline discussion, the current lock has some obvious issues. Let's fix it in a separate Jira. More generalized timeline ACLs -- Key: YARN-2102 URL: https://issues.apache.org/jira/browse/YARN-2102 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch We need to differentiate the access controls of reading and writing operations, and we need to think about cross-entity access control. For example, if we are executing a workflow of MR jobs, which writes the timeline data of this workflow, we don't want other users to pollute the timeline data of the workflow by putting something under it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
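To illustrate the single-key point above (purely a sketch, not the patch's actual key schema): a domain's co-accessed fields can live under one key, the way an entity's start time and insert time do, because they are always read and written together.
{code}
import java.nio.charset.Charset;

public class DomainRecordSketch {
  static final Charset UTF8 = Charset.forName("UTF-8");

  // Hypothetical leveldb key for a domain.
  static byte[] key(String domainId) {
    return ("domain!" + domainId).getBytes(UTF8);
  }

  // One value holding all co-retrieved fields, NUL-separated.
  static byte[] value(String description, String readers, String writers) {
    return (description + '\0' + readers + '\0' + writers).getBytes(UTF8);
  }
}
{code}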
[jira] [Commented] (YARN-2102) More generalized timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134843#comment-14134843 ] Hadoop QA commented on YARN-2102: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668939/YARN-2102.6.patch against trunk revision 932ae03. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4964//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4964//console This message is automatically generated. More generalized timeline ACLs -- Key: YARN-2102 URL: https://issues.apache.org/jira/browse/YARN-2102 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: GeneralizedTimelineACLs.pdf, YARN-2102.1.patch, YARN-2102.2.patch, YARN-2102.3.patch, YARN-2102.5.patch, YARN-2102.6.patch We need to differentiate the access controls of reading and writing operations, and we need to think about cross-entity access control. For example, if we are executing a workflow of MR jobs, which writes the timeline data of this workflow, we don't want other users to pollute the timeline data of the workflow by putting something under it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node
[ https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134854#comment-14134854 ] Varun Vasudev commented on YARN-2555: - Duplicate of YARN-2422? Effective max-allocation-* should consider biggest node --- Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Wei Yan The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node
[ https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134868#comment-14134868 ] Wangda Tan commented on YARN-2555: -- I think they're different: the proposal of YARN-2422 is to make max-allocation flexible, while this JIRA focuses on rejecting a ResourceRequest when it is larger than the biggest node in the cluster. IMHO, we don't need both of them; it is very possible that nodes connect to the RM after an application is submitted, especially in a virtual cluster environment. And as [~sandyr] commented, it's weird to have an NM variable affect RM configuration. Having a fixed max-allocation is useful to make sure a user doesn't get more resources than needed. Effective max-allocation-* should consider biggest node --- Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Wei Yan The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2546) REST API for application creation/submission is using strings for numeric boolean values
[ https://issues.apache.org/jira/browse/YARN-2546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev reassigned YARN-2546: --- Assignee: Varun Vasudev REST API for application creation/submission is using strings for numeric boolean values -- Key: YARN-2546 URL: https://issues.apache.org/jira/browse/YARN-2546 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.5.1 Reporter: Doug Haigh Assignee: Varun Vasudev When YARN responds with or accepts JSON, numbers and booleans are being represented as strings, which can cause parsing problems. Resource values look like {"application-id":"application_1404198295326_0001","maximum-resource-capability":{"memory":"8192","vCores":"32"}} Instead of {"application-id":"application_1404198295326_0001","maximum-resource-capability":{"memory":8192,"vCores":32}} When I POST to start a job, numeric values are represented as numbers: "local-resources": { "entry": [ { "key": "AppMaster.jar", "value": { "resource": "hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar", "type": "FILE", "visibility": "APPLICATION", "size": 43004, "timestamp": 1405452071209 } } ] }, Instead of "local-resources": { "entry": [ { "key": "AppMaster.jar", "value": { "resource": "hdfs://hdfs-namenode:9000/user/testuser/DistributedShell/demo-app/AppMaster.jar", "type": "FILE", "visibility": "APPLICATION", "size": "43004", "timestamp": "1405452071209" } } ] }, Similarly, boolean values are also represented as strings: "keep-containers-across-application-attempts": "false" Instead of "keep-containers-across-application-attempts": false -- This message was sent by Atlassian JIRA (v6.3.4#6332)
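A quick client-side check of the mismatch, assuming Jackson is available (field names follow the response above; the helper class is illustrative):
{code}
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ResponseTypeCheck {
  public static void main(String[] args) throws Exception {
    String json =
        "{\"maximum-resource-capability\":{\"memory\":\"8192\",\"vCores\":\"32\"}}";
    JsonNode cap = new ObjectMapper().readTree(json)
        .get("maximum-resource-capability");
    // isTextual() is true for the quoted "8192"; a conforming response
    // would make isNumber() true instead.
    System.out.println("memory is a string: " + cap.get("memory").isTextual());
  }
}
{code}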
[jira] [Resolved] (YARN-2555) Effective max-allocation-* should consider biggest node
[ https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2555. --- Resolution: Duplicate Assignee: (was: Wei Yan) Duplicate of YARN-56. YARN-394 is related. Effective max-allocation-* should consider biggest node --- Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2422) yarn.scheduler.maximum-allocation-mb should not be hard-coded in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14134896#comment-14134896 ] Vinod Kumar Vavilapalli commented on YARN-2422: --- It's not just weird, but it's broken on heterogeneous clusters. The right fix is a dup of YARN-56. yarn.scheduler.maximum-allocation-mb should not be hard-coded in yarn-default.xml - Key: YARN-2422 URL: https://issues.apache.org/jira/browse/YARN-2422 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.6.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: YARN-2422.1.patch Cluster with 40Gb NM refuses to run containers > 8Gb. It was finally tracked down to yarn-default.xml hard-coding it to 8Gb. In the absence of a better override, it should default to ${yarn.nodemanager.resource.memory-mb} instead of a hard-coded 8Gb. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
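Where the hard-coded value enters: a sketch of reading the effective setting through {{YarnConfiguration}} (the scheduler reads the same key; the shipped yarn-default.xml pins it to 8192, so that is what wins unless yarn-site.xml overrides it):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MaxAllocCheck {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // The getInt() fallback only applies when the key is absent entirely,
    // which it never is while yarn-default.xml ships a value.
    int maxAllocMb = conf.getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    System.out.println("effective maximum-allocation-mb = " + maxAllocMb);
  }
}
{code}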
[jira] [Created] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell
Xuan Gong created YARN-2557: --- Summary: Add a parameter attempt_Failures_Validity_Interval in DistributedShell Key: YARN-2557 URL: https://issues.apache.org/jira/browse/YARN-2557 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Change Distributed shell to enable attemptFailuresValidityInterval -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell
[ https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2557: Component/s: applications/distributed-shell Add a parameter attempt_Failures_Validity_Interval in DistributedShell - Key: YARN-2557 URL: https://issues.apache.org/jira/browse/YARN-2557 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Xuan Gong Assignee: Xuan Gong Change Distributed shell to enable attemptFailuresValidityInterval -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2557) Add a parameter attempt_Failures_Validity_Interval in DistributedShell
[ https://issues.apache.org/jira/browse/YARN-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2557: Attachment: YARN-2557.1.patch Add a parameter attempt_Failures_Validity_Interval in DistributedShell - Key: YARN-2557 URL: https://issues.apache.org/jira/browse/YARN-2557 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2557.1.patch Change Distributed shell to enable attemptFailuresValidityInterval -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2555) Effective max-allocation-* should consider biggest node
[ https://issues.apache.org/jira/browse/YARN-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14135013#comment-14135013 ] Sandy Ryza commented on YARN-2555: -- [~gp.leftnoteasy], this isn't the same as having an NM variable affect the RM conf. Considering the effective max allocation as the biggest node means rejecting requests that won't fit on any node, which I believe is the correct behavior. The issue I had with YARN-2422 was handling this at the configuration level, rather than properly handling it for heterogeneous clusters. Thanks for pointing that out [~agentvindo.dev] - agreed that this duplicates YARN-56. I think something like the approach outlined here probably makes the most sense for that JIRA. Effective max-allocation-* should consider biggest node --- Key: YARN-2555 URL: https://issues.apache.org/jira/browse/YARN-2555 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Karthik Kambatla The effective max-allocation-mb should be min(admin-configured-max-allocation-mb, max-mb-on-one-node), so we can reject container requests for resources larger than any node. Today, these requests wait forever. We should do this for all resources and update the effective value on node updates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)