[jira] [Commented] (YARN-668) TokenIdentifier serialization should consider Unknown fields
[ https://issues.apache.org/jira/browse/YARN-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141791#comment-14141791 ] Hadoop QA commented on YARN-668:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670165/YARN-668.patch against trunk revision f85cc14.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 22 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 2.0.3) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warning.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.nodemanager.containermanager.application.TestApplication org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5058//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5058//artifact/PreCommit-HADOOP-Build-patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5058//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5058//console This message is automatically generated.
TokenIdentifier serialization should consider Unknown fields Key: YARN-668 URL: https://issues.apache.org/jira/browse/YARN-668 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Junping Du Priority: Blocker Attachments: YARN-668-demo.patch, YARN-668.patch
This would allow changing the TokenIdentifier between versions. The current serialization is Writable. A simple way to achieve this would be to have a Proto object as the payload for TokenIdentifiers, instead of individual fields. TokenIdentifier continues to implement Writable to work with the RPC layer - but the payload itself is serialized using PB.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
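As a rough illustration of the proposal (not the attached patch), a TokenIdentifier can keep its Writable framing for the RPC layer while delegating the payload to a protobuf message; {{MyTokenIdentifierProto}} and its {{owner}} field are hypothetical stand-ins for a generated PB class:
{code:java}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.TokenIdentifier;

public class MyTokenIdentifier extends TokenIdentifier {
  public static final Text KIND = new Text("MY_TOKEN"); // illustrative kind

  // Hypothetical generated PB message with an 'owner' field; protobuf
  // preserves unknown fields across parse/serialize, so fields added by a
  // newer version survive a round trip through an older reader.
  private MyTokenIdentifierProto proto;

  @Override
  public void write(DataOutput out) throws IOException {
    byte[] payload = proto.toByteArray(); // payload serialized by PB
    out.writeInt(payload.length);         // Writable framing for the RPC layer
    out.write(payload);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    byte[] payload = new byte[in.readInt()];
    in.readFully(payload);
    proto = MyTokenIdentifierProto.parseFrom(payload);
  }

  @Override
  public Text getKind() {
    return KIND;
  }

  @Override
  public UserGroupInformation getUser() {
    return UserGroupInformation.createRemoteUser(proto.getOwner());
  }
}
{code}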
[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141931#comment-14141931 ] Hudson commented on YARN-2565: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #686 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/686/]) YARN-2565. Fixed RM to not use FileSystemApplicationHistoryStore unless explicitly set. Contributed by Zhijie Shen (jianhe: rev 444acf8ea795e4bc782f1ce3b5ef7a1a47d1d27d)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore --- Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that the RM can send application history to the Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch
Observed that the RM fails to start in secure mode when the Generic History Service is enabled and the ResourceManager is set to use the Timeline Store.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141930#comment-14141930 ] Hudson commented on YARN-2460: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #686 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/686/]) YARN-2460. Remove obsolete entries from yarn-default.xml (Ray Chiang via aw) (aw: rev aa1052c34b78b5b8b6a1498c8c842d21b07fceca)
* hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml
Remove obsolete entries from yarn-default.xml - Key: YARN-2460 URL: https://issues.apache.org/jira/browse/YARN-2460 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: newbie Fix For: 2.6.0 Attachments: YARN-2460-01.patch, YARN-2460-02.patch
The following properties are defined in yarn-default.xml, but do not exist in YarnConfiguration.
mapreduce.job.hdfs-servers
mapreduce.job.jar
yarn.ipc.exception.factory.class
yarn.ipc.serializer.type
yarn.nodemanager.aux-services.mapreduce_shuffle.class
yarn.nodemanager.hostname
yarn.nodemanager.resourcemanager.connect.retry_interval.secs
yarn.nodemanager.resourcemanager.connect.wait.secs
yarn.resourcemanager.amliveliness-monitor.interval-ms
yarn.resourcemanager.application-tokens.master-key-rolling-interval-secs
yarn.resourcemanager.container.liveness-monitor.interval-ms
yarn.resourcemanager.nm.liveness-monitor.interval-ms
yarn.timeline-service.hostname
yarn.timeline-service.http-authentication.simple.anonymous.allowed
yarn.timeline-service.http-authentication.type
Presumably, the mapreduce.* properties are okay. Similarly, the yarn.timeline-service.* properties are for the future TimelineService. However, the rest are likely fully deprecated. Submitting bug for comment/feedback about which other properties should be kept in yarn-default.xml.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2561) MR job client cannot reconnect to AM after NM restart.
[ https://issues.apache.org/jira/browse/YARN-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2561: - Issue Type: Sub-task (was: Bug) Parent: YARN-666
MR job client cannot reconnect to AM after NM restart. -- Key: YARN-2561 URL: https://issues.apache.org/jira/browse/YARN-2561 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.6.0 Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2561-v2.patch, YARN-2561-v3.patch, YARN-2561-v4.patch, YARN-2561-v5.patch, YARN-2561.patch
Work-preserving NM restart is disabled. Submit a job, then restart the only NM; the job hangs with connect retries.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2289) ApplicationHistoryStore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned YARN-2289: Assignee: Junping Du ApplicationHistoryStore should be versioned --- Key: YARN-2289 URL: https://issues.apache.org/jira/browse/YARN-2289 Project: Hadoop YARN Issue Type: Sub-task Components: applications Reporter: Junping Du Assignee: Junping Du -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects
[ https://issues.apache.org/jira/browse/YARN-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2464: - Issue Type: Sub-task (was: Improvement) Parent: YARN-666
Provide Hadoop as a local resource (on HDFS) which can be used by other projects Key: YARN-2464 URL: https://issues.apache.org/jira/browse/YARN-2464 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Junping Du
DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN projects to set up their AM / task classpaths if they have a dependency on Hadoop libraries. It would be useful to provide similar access to a Hadoop tarball (Hadoop libs, native libraries, etc.), which could be used instead - for applications that do not want to rely upon Hadoop versions from a cluster node. This would also require functionality to update the classpath/env for the apps based on the structure of the tar. As an example, MR has support for a full tar (for rolling upgrades). Similarly, Tez ships Hadoop libraries along with its build. I'm not sure about the Spark / Storm / HBase model for this - but using a common copy instead of everyone localizing Hadoop libraries would be useful.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
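A minimal sketch, using the public YARN client API, of how an app could localize such a tarball today; the HDFS path and link name are assumptions, not a proposed standard location:
{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class HadoopTarballSketch {
  // Asks the NM to localize and unpack a Hadoop tarball from HDFS; a PUBLIC
  // archive is localized once per node and shared across users and apps.
  static Map<String, LocalResource> hadoopTarball(Configuration conf)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path tarball = new Path("/apps/hadoop/hadoop-tarball.tar.gz"); // assumed path
    FileStatus stat = fs.getFileStatus(tarball);
    LocalResource libs = LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(tarball),
        LocalResourceType.ARCHIVE,      // NM unpacks the tar under the link name
        LocalResourceVisibility.PUBLIC,
        stat.getLen(), stat.getModificationTime());
    Map<String, LocalResource> resources = new HashMap<String, LocalResource>();
    resources.put("hadoop", libs); // link name the app's classpath env refers to
    return resources;
  }
}
{code}
The app would then prepend {{./hadoop/...}} entries to its CLASSPATH env, which is exactly the "update the classpath/env based on the structure of the tar" functionality the description calls out.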
[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141973#comment-14141973 ] Hudson commented on YARN-2565: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1877 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1877/]) YARN-2565. Fixed RM to not use FileSystemApplicationHistoryStore unless explicitly set. Contributed by Zhijie Shen (jianhe: rev 444acf8ea795e4bc782f1ce3b5ef7a1a47d1d27d)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore --- Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that the RM can send application history to the Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141972#comment-14141972 ] Hudson commented on YARN-2460: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1877 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1877/]) YARN-2460. Remove obsolete entries from yarn-default.xml (Ray Chiang via aw) (aw: rev aa1052c34b78b5b8b6a1498c8c842d21b07fceca)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml
* hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json
Remove obsolete entries from yarn-default.xml - Key: YARN-2460 URL: https://issues.apache.org/jira/browse/YARN-2460 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: newbie Fix For: 2.6.0 Attachments: YARN-2460-01.patch, YARN-2460-02.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.trunk.8.patch
.trunk.8.patch is rebased to the new repo's current trunk and has the vcxproj/sln hunks manually fixed to CRLF
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.crlf.6.patch
YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into the entire NM running as a privileged account - a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC, etc. My proposal, though, is to use Windows LPC (Local Procedure Calls), a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
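A hypothetical sketch of what the NM-side JNI surface for that LPC client might look like; every name below is illustrative and none of it comes from the attached patches:
{code:java}
import java.io.IOException;

// Hypothetical NM-side stub; the native methods would be implemented in
// libwinutils, which hosts the actual LPC client code.
public final class WinutilsLpcClient {
  static {
    System.loadLibrary("winutils"); // native library hosting the LPC client
  }

  private WinutilsLpcClient() {}

  // Connects to the privileged service's LPC port (NtConnectPort under the
  // covers) and returns an opaque connection handle.
  public static native long connectToService(String lpcPortName)
      throws IOException;

  // Sends a container-launch request and waits for the reply
  // (NtRequestWaitReplyPort); returns a handle to the launched process.
  public static native long launchContainer(long lpcHandle, String user,
      String commandLine, String workDir) throws IOException;
}
{code}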
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141987#comment-14141987 ] Hadoop QA commented on YARN-2198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670215/YARN-2198.trunk.8.patch against trunk revision f85cc14.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5059//console This message is automatically generated.
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch, YARN-2198.trunk.crlf.6.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142001#comment-14142001 ] Wangda Tan commented on YARN-2496: -- Craig, still about #2: what you suggested makes sense to me - the AM can get a more precise headroom to plan its subsequent resource usage - but I think:
1) It may not be enough, given what you said:
bq. For this reason, headroom should reflect the labels in the last resource request from the application, not the queue's labels.
An AM may send resource requests with different label expressions, so which headroom should we send back to the AM? Maybe we need a new field in AllocateRequest to request separate headrooms under different label expressions.
2) Even with 1), I cannot think of a good way to compute headroom for arbitrary label expressions quickly, with acceptable time complexity; thousands of distinct label expressions may exist in a big cluster at the same time. Our current implementation ensures the per-label resources of a queue stay up to date whenever a resource change happens.
Given 1) and 2), I suggest we treat this as a pending task and deal with it in the future.
About
bq. -re 5, I thought * could be in requests, if no, then should not be an issue.
Right - we don't support specifying * in requests, because it may cause resource wastage; an AM should know precisely what resources it needs. Thanks, Wangda
[YARN-796] Changes for capacity scheduler to support allocate resource respect labels - Key: YARN-2496 URL: https://issues.apache.org/jira/browse/YARN-2496 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch
This JIRA includes:
- Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other queue options like capacity/maximum-capacity, etc.
- Include a default-label-expression option in the queue config; if an app doesn't specify a label-expression, the queue's default-label-expression is used.
- Check that labels can be accessed by the queue when submitting an app with a label-expression to the queue or updating a ResourceRequest with a label-expression
- Check labels on the NM when trying to allocate a ResourceRequest on the NM with a label-expression
- Respect labels when calculating headroom/user-limit
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
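To illustrate point 2 above, a rough sketch (names are illustrative, not the scheduler's internals) of why arbitrary expressions force a per-request scan rather than a lookup of pre-aggregated per-label resources:
{code:java}
import java.util.Collection;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class LabelHeadroomSketch {
  // Headroom for an arbitrary label expression needs an aggregate over
  // exactly the nodes matching that expression, which cannot be precomputed
  // for every possible expression in advance. With N nodes and E distinct
  // expressions, keeping this fresh is O(N * E) work.
  static Resource headroomFor(String labelExpression,
      Collection<SchedulerNode> nodes, ResourceCalculator rc,
      Resource clusterResource, Resource queueLimit, Resource queueUsed) {
    Resource matching = Resource.newInstance(0, 0);
    for (SchedulerNode node : nodes) {
      if (expressionMatches(node, labelExpression)) { // hypothetical matcher
        Resources.addTo(matching, node.getAvailableResource());
      }
    }
    // Cap by what remains of the queue/user limit as well.
    return Resources.min(rc, clusterResource, matching,
        Resources.subtract(queueLimit, queueUsed));
  }

  static boolean expressionMatches(SchedulerNode node, String expr) {
    return true; // placeholder; real code would evaluate the expression
  }
}
{code}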
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: (was: YARN-2198.trunk.8.patch)
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.crlf.6.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: (was: YARN-2198.trunk.crlf.6.patch)
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.trunk.8.patch
Fix -Project^M
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2565) RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142067#comment-14142067 ] Hudson commented on YARN-2565: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1902 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1902/]) YARN-2565. Fixed RM to not use FileSystemApplicationHistoryStore unless explicitly set. Contributed by Zhijie Shen (jianhe: rev 444acf8ea795e4bc782f1ce3b5ef7a1a47d1d27d)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/RMApplicationHistoryWriter.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/ahs/TestRMApplicationHistoryWriter.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryServer.java
RM shouldn't use the old RMApplicationHistoryWriter unless explicitly setting FileSystemApplicationHistoryStore --- Key: YARN-2565 URL: https://issues.apache.org/jira/browse/YARN-2565 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.6.0 Environment: Secure cluster with ATS (timeline server enabled) and yarn.resourcemanager.system-metrics-publisher.enabled=true so that the RM can send application history to the Timeline Store Reporter: Karam Singh Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2565.1.patch, YARN-2565.2.patch, YARN-2565.3.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2460) Remove obsolete entries from yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142066#comment-14142066 ] Hudson commented on YARN-2460: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1902 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1902/]) YARN-2460. Remove obsolete entries from yarn-default.xml (Ray Chiang via aw) (aw: rev aa1052c34b78b5b8b6a1498c8c842d21b07fceca)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1329348432655_0001_conf.xml
* hadoop-yarn-project/CHANGES.txt
* hadoop-tools/hadoop-sls/src/main/data/2jobs2min-rumen-jh.json
Remove obsolete entries from yarn-default.xml - Key: YARN-2460 URL: https://issues.apache.org/jira/browse/YARN-2460 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: newbie Fix For: 2.6.0 Attachments: YARN-2460-01.patch, YARN-2460-02.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142086#comment-14142086 ] Steve Loughran commented on YARN-913: -
bq. I have some concern around 'naked' zookeeper.* config option
This is something that I do think needs changing in ZK; being driven by JVM properties can work for standalone JVM servers, but not for clients. The client here sets the properties just before they are needed (e.g. the SASL auth details), and I was thinking of making the set-connect operation class-synchronized. But... Curator does some session restarting, and if those JVM-wide settings are changed, there may be problems. Summary: we need to fix the ZK client and then have Curator configure it, so the rest of us don't have to care.
bq. if a user kills the ZK used for app registry through some action, what happens to the RM and other user's bits that are running
# The RM isn't depending on the ZK cluster for information; it just sets up the paths for a user, and purges the container/app-lifespan parts on their completion. I've made both the setup and teardown operations async; the {{RMRegistryOperationsService}} class gets the RM event and schedules the work on its executor. If ZK is offline then these will block until the quorum is back, but it should not delay RM operations. It could block the clients and the AM starting up.
# Curator supports different {{EnsembleProvider}}s: classes which provide the data needed for the client to reconnect to ZK. The code is currently only hooked up to one, the {{FixedEnsembleProvider}}, which uses a classic static ZK quorum. There's an alternative, the {{ExhibitorProvider}}, which hooks up to [Netflix Exhibitor|https://github.com/Netflix/exhibitor/wiki] and can do things like a [Rolling Ensemble Change|https://github.com/Netflix/exhibitor/wiki/Rolling-Ensemble-Change]. This is designed for cloud deployments where a ZK server failure results in a new host coming up with a new hostname/address... Exhibitor handles the details of rebinding. I haven't added explicit support for that (straightforward) or got a test setup (harder). If you want to play with it, though...
bq. Why doesn't the hostname component allow for FQDNs?
Do you mean in the endpoint fields? It should... let me clarify that in the example.
bq. Are we prepared for more backlash when another component requires working DNS?
The reason the initial patches here weren't building is that a helper method which builds an endpoint address from an {{InetSocketAddress}} called {{getHostString()}} to get the host/FQDN without doing any DNS work. I had to switch to {{getHostName()}}, which can try to do rDNS, and so rely on DNS working.
bq. Is ZK the right thing to use here?
# ZK gives us availability; I do plan to add a REST API later on, one that works long-haul. It's why there is deliberately no support for ephemeral nodes... the {{RegistryOperations}} interface is designed to be implementable by a REST client, for which there won't be any sessions to tie ephemeral nodes to.
# By deliberately publishing nothing but endpoints to services, we're trying to keep the content in the store down, with the bulk data being served up by other means. In Slider, we are publishing dynamically generated config files from the AM REST API; all the registry entry does is list the API + URL for that service.
# I do like your idea about just sticking stuff into HDFS, S3, etc.; that's a way to share content too, including config data.
It'll fit into the general category of URL-formatted endpoints; maybe I should add it as an explicit address type, filesystem?
Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla
In a YARN cluster you can't predict where services will come up - or on what ports. The services need to work those things out as they come up and then publish them somewhere. Applications need to be able to find the service instance they are to bond to - and not any others in the cluster. Some kind of service registry - in the RM, in ZK - could do this. If the RM held the write access to the ZK nodes, it would be more secure than having apps register with ZK themselves.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
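For the fixed-quorum case discussed above, a minimal Curator sketch (assuming the Curator libraries on the classpath; the retry settings are illustrative) shows how the {{EnsembleProvider}} seam keeps the choice of static quorum vs. Exhibitor out of client code:
{code:java}
import org.apache.curator.ensemble.fixed.FixedEnsembleProvider;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class RegistryCuratorSketch {
  // Builds a client against a classic static quorum; swapping in an
  // Exhibitor-backed EnsembleProvider here would enable rolling ensemble
  // changes without touching the rest of the registry code.
  public static CuratorFramework create(String quorum) {
    CuratorFramework curator = CuratorFrameworkFactory.builder()
        .ensembleProvider(new FixedEnsembleProvider(quorum))
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
    curator.start();
    return curator;
  }
}
{code}
Usage would be e.g. {{create("zk1:2181,zk2:2181,zk3:2181")}}.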
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142090#comment-14142090 ] Steve Loughran commented on YARN-913: - Oh, one more thing: that {{MicroZookeeperService}} which is used in tests? It's a YARN-service-wrapped ZK microservice (based on Twill's test one) which can publish its ensemble information to registry clients running in-VM. This would make it straightforward to deploy *inside* the RM... in a small 1-2 node cluster it wouldn't be a load problem, and as the lifespan of the ZK == the lifespan of the RM, there's no worry about a single ZK quorum outage impacting the RM. I've not put the service under the RM; someone is free to do so at some point in the future.
Add a way to register long-lived services in a YARN cluster --- Key: YARN-913 URL: https://issues.apache.org/jira/browse/YARN-913 Project: Hadoop YARN Issue Type: New Feature Components: api, resourcemanager Affects Versions: 2.5.0, 2.4.1 Reporter: Steve Loughran Assignee: Steve Loughran Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, YARN-913-007.patch, YARN-913-008.patch, yarnregistry.pdf, yarnregistry.tla
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
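Not the {{MicroZookeeperService}} code itself, but Curator's test server illustrates the same in-JVM pattern, with the ZK lifespan tied to the hosting process:
{code:java}
import org.apache.curator.test.TestingServer;

public class InJvmZkSketch {
  public static void main(String[] args) throws Exception {
    // TestingServer starts an in-process ZK on a free port; closing it
    // stops the server, so its lifespan matches the enclosing scope.
    try (TestingServer zk = new TestingServer()) {
      String quorum = zk.getConnectString(); // hand to in-VM registry clients
      System.out.println("in-JVM ZK at " + quorum);
    }
  }
}
{code}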
[jira] [Updated] (YARN-1492) truly shared cache for jars (jobjar/libjar)
[ https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1492: -- Priority: Critical (was: Major) Target Version/s: 2.6.0 Tx for the notes [~ctrezzo]! I am marking this as critical for 2.6, given how long it's been out in the open. Started reviewing the patches. truly shared cache for jars (jobjar/libjar) --- Key: YARN-1492 URL: https://issues.apache.org/jira/browse/YARN-1492 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.0.4-alpha Reporter: Sangjin Lee Assignee: Chris Trezzo Priority: Critical Attachments: YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, shared_cache_design.pdf, shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, shared_cache_design_v5.pdf, shared_cache_design_v6.pdf Currently there is the distributed cache that enables you to cache jars and files so that attempts from the same job can reuse them. However, sharing is limited with the distributed cache because it is normally on a per-job basis. On a large cluster, sometimes copying of jobjars and libjars becomes so prevalent that it consumes a large portion of the network bandwidth, not to speak of defeating the purpose of bringing compute to where data is. This is wasteful because in most cases code doesn't change much across many jobs. I'd like to propose and discuss feasibility of introducing a truly shared cache so that multiple jobs from multiple users can share and cache jars. This JIRA is to open the discussion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142139#comment-14142139 ] Vinod Kumar Vavilapalli commented on YARN-2168: --- A few comments on the APIs:
- Let's mark all the APIs as evolving, or maybe even unstable.
- The setters on responses and objects that are supposed to be created only by the server should be marked Private - we don't expect users to use them, e.g. UseSharedCacheResourceResponse.setPath().
- Let's move SCMAdminProtocol and all related records to the org.apache.hadoop.yarn.server.api and org.apache.hadoop.yarn.server.api.protocolrecords packages.
- We are using checksum, key, and resource-key to refer to the same entity. Shall we standardize on resource-key?
SCM/Client/NM/Admin protocols - Key: YARN-2168 URL: https://issues.apache.org/jira/browse/YARN-2168 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch
This JIRA is meant to be used to review the main shared cache APIs. They are as follows: * ClientSCMProtocol - The protocol between the yarn client and the cache manager. This protocol controls how resources in the cache are claimed and released. ** UseSharedCacheResourceRequest ** UseSharedCacheResourceResponse ** ReleaseSharedCacheResourceRequest ** ReleaseSharedCacheResourceResponse * SCMAdminProtocol - This is an administrative protocol for the cache manager. It allows administrators to manually trigger cleaner runs. ** RunSharedCacheCleanerTaskRequest ** RunSharedCacheCleanerTaskResponse * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the cache manager. This allows the NodeManager to coordinate with the cache manager when uploading new resources to the shared cache. ** NotifySCMRequest ** NotifySCMResponse
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
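To make the review concrete, a condensed sketch of the client-facing protocol with the suggested annotations applied; the record names come from the list above, but the method names are illustrative and the record imports are omitted (their target package is itself under review):
{code:java}
import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience.Public;
import org.apache.hadoop.classification.InterfaceStability.Unstable;
import org.apache.hadoop.yarn.exceptions.YarnException;

@Public
@Unstable // per the review: evolving, or maybe even unstable
public interface ClientSCMProtocol {
  // Claims a resource in the shared cache. The response's setters
  // (e.g. setPath) would be @Private: only the server constructs it.
  UseSharedCacheResourceResponse use(UseSharedCacheResourceRequest request)
      throws YarnException, IOException;

  // Releases a previously claimed resource.
  ReleaseSharedCacheResourceResponse release(
      ReleaseSharedCacheResourceRequest request)
      throws YarnException, IOException;
}
{code}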
[jira] [Commented] (YARN-2179) Initial cache manager structure and context
[ https://issues.apache.org/jira/browse/YARN-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142157#comment-14142157 ] Vinod Kumar Vavilapalli commented on YARN-2179: --- Some comments on the patch:
- Rename config yarn.sharedcache.root to root-path or root-dir?
- I cannot see why sharedcachemanager depends on the resourcemanager module, given I haven't seen the entire feature-related code yet. Ideally -- sharedcachemanager simply uses yarn-client -- sharedcachemanager is its own module -- ResourceManager can embed shared-cache-manager by making it a run-time dependency (and thus not depend on it at compile time)
- AppChecker.appIsActive() - isApplicationActive() and getAllActiveApps() - getActiveApplications()? (I tend to favor controlled verbosity :) )
- RemoteAppChecker won't work when RM failover is enabled. You are better off simply using YarnClient instead of building all of that functionality from scratch again. Similarly, for getAllActiveApps(), we can just use {{List<ApplicationReport> getApplications(EnumSet<YarnApplicationState> applicationStates)}} from YarnClient.
bq. Would it make more sense to leverage getFinalApplicationStatus() instead of getYarnApplicationState()? That way we can just say if the FinalApplicationStatus is undefined don't clean it up, otherwise we are safe to delete the appId.
FinalApplicationStatus is filled in by user APIs and some applications may choose to leave it as UNDEFINED, so we cannot depend on it. I propose that we keep the usage of ApplicationState and add an API in YarnClient/RM to detect active states in a follow-up.
Initial cache manager structure and context --- Key: YARN-2179 URL: https://issues.apache.org/jira/browse/YARN-2179 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2179-trunk-v1.patch, YARN-2179-trunk-v2.patch, YARN-2179-trunk-v3.patch, YARN-2179-trunk-v4.patch, YARN-2179-trunk-v5.patch
Implement the initial shared cache manager structure and context. The SCMContext will be used by a number of manager services (i.e. the backing store and the cleaner service). The AppChecker is used to gather the currently running applications on SCM startup (necessary for an SCM that is backed by an in-memory store).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
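A minimal sketch of the suggested simplification, using only public YarnClient calls ({{createYarnClient}}, {{getApplications(EnumSet<YarnApplicationState>)}}); the class name is illustrative, not the patch's:
{code:java}
import java.io.IOException;
import java.util.EnumSet;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class ActiveAppsSketch {
  // YarnClient handles RM failover transparently, unlike a hand-built proxy.
  static List<ApplicationReport> getActiveApplications(Configuration conf)
      throws YarnException, IOException {
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      // "Active" = everything that has not reached a terminal state.
      EnumSet<YarnApplicationState> active = EnumSet.complementOf(EnumSet.of(
          YarnApplicationState.FINISHED, YarnApplicationState.FAILED,
          YarnApplicationState.KILLED));
      return client.getApplications(active);
    } finally {
      client.stop();
    }
  }
}
{code}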
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142167#comment-14142167 ] Vinod Kumar Vavilapalli commented on YARN-2180: --- The patch looks fine overall; some comments:
- yarn.sharedcache.manager.store.impl - yarn.sharedcache.store or store-class? YarnConfiguration.SCM_STORE_IMPL - SCM_STORE/SCM_STORE_CLASS?
- We already have Resource and LocalResource. To avoid confusion, shall we use SharedCacheResource, and hence ResourceReference - SharedCacheResourceReference, and so on everywhere?
- InMemoryStore.bootstrap() can be done as part of serviceInit().
- Synchronization is missing from the InMemoryStore operations? You are using a ConcurrentHashMap, but to insert correctly (multiple apps adding a cache entry for the same path) you'll need to use {{putIfAbsent}}. Surprised the test is presumably passing.
In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch
Implement an in-memory backing store for the cache manager.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
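A small sketch of the race in question and the {{putIfAbsent}} fix; {{SharedCacheResource}} here is a stand-in for the patch's record type, not its actual API:
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InMemoryStoreSketch {
  private final ConcurrentMap<String, SharedCacheResource> entries =
      new ConcurrentHashMap<String, SharedCacheResource>();

  // Two apps racing to reference the same key must end up sharing one
  // record; putIfAbsent picks the winner atomically, and the loser reuses
  // the winner's record instead of clobbering it with a plain put().
  public void addResourceReference(String key, String appId) {
    SharedCacheResource fresh = new SharedCacheResource();
    SharedCacheResource prior = entries.putIfAbsent(key, fresh);
    SharedCacheResource resource = (prior == null) ? fresh : prior;
    resource.addReference(appId);
  }

  // Stand-in for the patch's resource record; must itself be thread-safe.
  static class SharedCacheResource {
    private final Set<String> refs =
        Collections.synchronizedSet(new HashSet<String>());

    void addReference(String appId) {
      refs.add(appId);
    }
  }
}
{code}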
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142173#comment-14142173 ] Remus Rusanu commented on YARN-2198: The build error is:
[exec] /home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c:1444:12: error: 'INVALID_HANDLE_VALUE' undeclared (first use in this function)
[exec] return INVALID_HANDLE_VALUE;
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: YARN-2198.trunk.8.patch
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2198: --- Attachment: (was: YARN-2198.trunk.8.patch)
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2252) Intermittent failure for testcase TestFairScheduler.testContinuousScheduling
[ https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142198#comment-14142198 ] Wei Yan commented on YARN-2252: --- +1 for the proposal, [~kasha].
Intermittent failure for testcase TestFairScheduler.testContinuousScheduling Key: YARN-2252 URL: https://issues.apache.org/jira/browse/YARN-2252 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: trunk-win Reporter: Ratandeep Ratti Labels: hadoop2, scheduler, yarn Attachments: YARN-2252-1.patch
This test case fails sporadically on my machine, and I think I have a plausible explanation. When the scheduler is asked for resources, the resource requests being constructed express no preference for particular hosts (nodes). The two mock hosts constructed both have 8192 MB of memory, and the containers (resources) being requested each require 1024 MB, so a single node can satisfy both resource requests for the application. At the end of the test case it is asserted that the containers (resource requests) run on different nodes, but since we haven't specified any node preference when requesting the resources, the scheduler (at times) places both containers (requests) on the same node.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
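For reference, a hedged sketch of a node-pinned request using the public ResourceRequest API, which would make the two-node assertion deterministic; the host name is of course illustrative:
{code:java}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class NodeLocalRequestSketch {
  // Pins a 1024 MB / 1 vcore request to one host and forbids the scheduler
  // from relaxing it to rack-local or off-switch placement.
  static ResourceRequest nodeLocal(String host) {
    ResourceRequest req = ResourceRequest.newInstance(
        Priority.newInstance(1),
        host,                          // resource name = a specific node
        Resource.newInstance(1024, 1), // matches the test's container size
        1);
    req.setRelaxLocality(false);
    return req;
  }
}
{code}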
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142201#comment-14142201 ] Hadoop QA commented on YARN-2198: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670247/YARN-2198.trunk.8.patch against trunk revision db890ee. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5061//artifact/PreCommit-HADOOP-Build-patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warning. {color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5061//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5061//artifact/PreCommit-HADOOP-Build-patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5061//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5061//console This message is automatically generated. Remove the need to run NodeManager as privileged account for Windows Secure Container Executor -- Key: YARN-2198 URL: https://issues.apache.org/jira/browse/YARN-2198 Project: Hadoop YARN Issue Type: Improvement Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, YARN-2198.separation.patch, YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, YARN-2198.trunk.8.patch YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates to the entire NM running as a privileged account, a very large surface area to review and protect. This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to high privileges. There must exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC, etc.
My proposal, though, is to use Windows LPC (Local Procedure Calls), which is a Windows platform-specific inter-process communication channel that satisfies all requirements and is easy to deploy. The privileged NT service would register and listen on an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop with libwinutils, which would host the LPC client code. The client would connect to the LPC port (NtConnectPort) and send a message requesting a container launch (NtRequestWaitReplyPort). LPC provides authentication, and the privileged NT service can use the authorization API (AuthZ) to validate the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-2554: - Attachment: YARN-2554.3.patch Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
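As a rough illustration of the suggested direction — an SSLFactory initialized in CLIENT mode picks up the truststore named in ssl-client.xml — here is a hedged sketch. It uses HttpsURLConnection for brevity rather than the HttpClient the proxy servlet actually uses, and the AM URL is made up:

{code:java}
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.SSLFactory;

public class ProxySslSketch {
  public static void main(String[] args) throws Exception {
    // CLIENT mode loads the truststore configured in ssl-client.xml
    // (located via hadoop.ssl.client.conf on the classpath).
    SSLFactory clientSslFactory =
        new SSLFactory(SSLFactory.Mode.CLIENT, new Configuration());
    clientSslFactory.init();

    // Hand the resulting socket factory and hostname verifier to the
    // outbound HTTPS connection toward the AM.
    HttpsURLConnection conn = (HttpsURLConnection)
        new URL("https://am-host.example.com:8090/").openConnection();
    conn.setSSLSocketFactory(clientSslFactory.createSSLSocketFactory());
    conn.setHostnameVerifier(clientSslFactory.getHostnameVerifier());
    System.out.println("HTTP " + conn.getResponseCode());

    clientSslFactory.destroy();
  }
}
{code}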
[jira] [Updated] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Maron updated YARN-2554: - Attachment: YARN-2554.3.patch Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142212#comment-14142212 ] Hadoop QA commented on YARN-2554: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670250/YARN-2554.3.patch against trunk revision 84a0a62. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5062//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5062//console This message is automatically generated. Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142216#comment-14142216 ] Hadoop QA commented on YARN-2554: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670251/YARN-2554.3.patch against trunk revision 84a0a62. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5063//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5063//console This message is automatically generated. Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2180) In-memory backing store for cache manager
[ https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142228#comment-14142228 ] Sangjin Lee commented on YARN-2180: --- bq. Synchronization is missing from the InMemoryStore operations? You are using a ConcurrentHashMap but to insert correctly (multiple apps adding a cache-entry to the same path) you'll need to use putIfAbsent? Surprised the test is presumably passing. I can answer this question as I wrote the in-memory store. :) Actually, all operations (both access and mutation) on {{map}} are synchronized on the interned key. Since keys are unique, there can be no concurrent operations on the same key when every operation first synchronizes on that key. Therefore, {{putIfAbsent()}} is not necessary (it would be needed only if concurrent operations could race on the same key). There can be concurrent operations on the map on *different* keys, and that thread-safety is addressed by the {{ConcurrentHashMap}}. There are a couple of exceptions, and those are {{bootstrap()}} and {{clearCache()}}. The {{bootstrap()}} method is an exception because it operates on the map before it accepts any reads/writes. The {{clearCache()}} method is provided only for test purposes. Hope this helps. In-memory backing store for cache manager - Key: YARN-2180 URL: https://issues.apache.org/jira/browse/YARN-2180 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, YARN-2180-trunk-v3.patch Implement an in-memory backing store for the cache manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
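The locking scheme described above is compact enough to sketch. Below is a minimal, hypothetical rendering (a String-valued map and invented method names, not the actual YARN-2180 patch) of per-key mutual exclusion via interned keys over a {{ConcurrentHashMap}}:

{code:java}
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryStoreSketch {
  private final ConcurrentHashMap<String, String> map =
      new ConcurrentHashMap<>();

  /** Adds an entry unless one already exists; returns the winning value. */
  public String addResource(String key, String fileName) {
    String interned = key.intern(); // one canonical lock object per key
    synchronized (interned) {
      String existing = map.get(interned);
      if (existing != null) {
        return existing;
      }
      // A plain put is safe here: no other thread can touch this key
      // while we hold its interned-string lock, so putIfAbsent() would
      // be redundant. Operations on *different* keys still proceed
      // concurrently, which the ConcurrentHashMap handles.
      map.put(interned, fileName);
      return fileName;
    }
  }

  public void removeResource(String key) {
    String interned = key.intern();
    synchronized (interned) {
      map.remove(interned);
    }
  }
}
{code}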
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142253#comment-14142253 ] Vinod Kumar Vavilapalli commented on YARN-2554: --- Sorry for jumping in late. You could fix the webapp proxy in theory, but the setup required to make AM web UIs accept HTTPS is impractical. AMs can launch on any machine in a cluster. They can be run by different users. Enabling SSL through distribution of keys per application, per user, across the cluster is not a great solution. This is the reason why we chose not to fix it, and thus did not enable the same for MapReduce. The better solution is either - to keep the status quo (AM web UIs don't enable SSL), or - to get rid of AM UIs altogether and move to a client-side UI on top of the Timeline Server (YARN-1530) - it has its own limitations though. Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142256#comment-14142256 ] Jonathan Maron commented on YARN-2554: -- I'm not certain I understand your comment about the keys. The client truststore configured via ssl-client.xml generally contains the certificates for the cluster hosts; it is not specific to users or applications. In any usage scenario it would, by necessity, contain those certificates. Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
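For reference, the truststore being discussed is configured through standard Hadoop SSL properties; an illustrative ssl-client.xml fragment follows (the path and password below are made up):

{code:xml}
<configuration>
  <property>
    <name>ssl.client.truststore.location</name>
    <!-- hypothetical path; holds the cluster hosts' certificates -->
    <value>/etc/hadoop/conf/clusterTrustStore.jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>changeit</value>
  </property>
</configuration>
{code}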
[jira] [Updated] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-2056: - Attachment: YARN-2056.201409210049.txt Hi [~leftnoteasy]. Thank you for spending the time to look at this patch and provide helpful suggestions. {quote} IMHO, the right place to put reserving resource logic for un-preemptable queue is not {{resetCapacity}}, it should in {{computeFixpointAllocation}}. ... Does this make sense to you? {quote} Yes, that makes sense, and I think it is a simpler algorithm. I updated the patch, so please have a look. I have made a conscious decision to only allow disabling preemption at the leaf-queue level. This is because there may be a use case where you want to disable preemption at the parent level, have other queue hierarchies leave it alone, but then allow preemption between children of the disabled parent. So, rather than solve that problem with this fix, I only allow leaf queues to disable preemption. Even if a leaf queue could inherit its parent's disable-preemption value, there will likely be cases where part of the parent queue's over-capacity resources are untouchable and part of them are preemptable. So, I adjusted your suggested algorithm somewhat. - I collected untouchableExtra instead of preemptableExtra at the TempQueue level. - In {{computeFixpointAllocation}}, I looped through each queue, and if one has any untouchableExtra, then the queue's {{idealAssigned = guaranteed + untouchableExtra}}. - In {{TempQueue#offer}}, one of the calculations is {{current + pending - idealAssigned}}. I had to take into consideration that if the queue is over capacity, some of the overage may be untouchable and some may be preemptable. If some of it is preemptable, then {{current}} could be greater than {{idealAssigned}}, and {{TempQueue#offer}} would end up assigning more to that queue than it should. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, YARN-2056.201409181916.txt, YARN-2056.201409210049.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
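To make the quantities in those three bullets concrete, here is a hypothetical, memory-only simplification. The field and method names are invented for illustration; the real patch operates on Resource objects inside the capacity-scheduler preemption policy, and the actual adjustment differs in detail:

{code:java}
// Hedged sketch of the allocation quantities discussed above,
// simplified to memory (MB) only.
public class TempQueueSketch {
  long guaranteed;       // queue's guaranteed capacity
  long current;          // currently used
  long pending;          // outstanding demand
  long untouchableExtra; // over-capacity that must never be preempted
  long idealAssigned;

  /** Step 1: a queue keeps any non-preemptable overage outright. */
  void reserveUntouchable() {
    if (untouchableExtra > 0) {
      idealAssigned = guaranteed + untouchableExtra;
    }
  }

  /**
   * Step 2: accept a share of the remaining resources. The raw
   * current + pending - idealAssigned overstates remaining need when
   * part of the queue's overage is preemptable (current > idealAssigned)
   * and can go negative otherwise, so it is clamped at zero here.
   */
  long offer(long available) {
    long remainingNeed = Math.max(0, current + pending - idealAssigned);
    long accepted = Math.min(available, remainingNeed);
    idealAssigned += accepted;
    return accepted;
  }
}
{code}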
[jira] [Commented] (YARN-2554) Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy
[ https://issues.apache.org/jira/browse/YARN-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142268#comment-14142268 ] Vinod Kumar Vavilapalli commented on YARN-2554: --- I am talking about the server side, i.e. the AMs. To use SSL for AM webapps: - the key-store needs to be present on all machines in order to distribute certificates: AMs may come up anywhere. - the key-store used by Hadoop daemons *CANNOT* be shared with AMs: AMs run user code as the user. - the key-store cannot be shared across AMs of different users: assuming I am running three different Slider apps as three different users, you don't want a single key-store instance accessible by all Slider AMs. - And distributing/installing/managing it per user is complex. Slider AM Web UI is inaccessible if HTTPS/SSL is specified as the HTTP policy - Key: YARN-2554 URL: https://issues.apache.org/jira/browse/YARN-2554 Project: Hadoop YARN Issue Type: Bug Components: webapp Affects Versions: 2.6.0 Reporter: Jonathan Maron Attachments: YARN-2554.1.patch, YARN-2554.2.patch, YARN-2554.3.patch, YARN-2554.3.patch If the HTTP policy to enable HTTPS is specified, the RM and AM are initialized with SSL listeners. The RM has a web app proxy servlet that acts as a proxy for incoming AM requests. In order to forward the requests to the AM, the proxy servlet makes use of HttpClient. However, the HttpClient utilized is not initialized correctly with the certificates necessary for successful one-way SSL invocations to the other nodes in the cluster (it is not configured to access/load the client truststore specified in ssl-client.xml). I imagine SSLFactory.createSSLSocketFactory() could be utilized to create an instance that can be assigned to the HttpClient. The symptoms of this issue are: the AM displays an unknown_certificate exception, and the RM displays an exception such as javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142290#comment-14142290 ] Hadoop QA commented on YARN-2056: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670263/YARN-2056.201409210049.txt against trunk revision 84a0a62. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5064//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5064//console This message is automatically generated. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, YARN-2056.201409181916.txt, YARN-2056.201409210049.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2577) Clarify ACL delimiter and how to configure ACL groups only
Miklos Christine created YARN-2577: -- Summary: Clarify ACL delimiter and how to configure ACL groups only Key: YARN-2577 URL: https://issues.apache.org/jira/browse/YARN-2577 Project: Hadoop YARN Issue Type: Improvement Components: documentation, fairscheduler Affects Versions: 2.5.1 Reporter: Miklos Christine Priority: Trivial While reading through the Fair Scheduler documentation, I noticed it would be helpful to explicitly state that the delimiter for the Fair Scheduler ACLs is the space character. When specifying only ACL groups, users should begin the value with a space character. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
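Since the convention is easy to get wrong, an illustrative allocation-file fragment may help. The queue, user, and group names below are invented; {{aclSubmitApps}} and {{aclAdministerApps}} are the standard Fair Scheduler ACL elements:

{code:xml}
<!-- Users and groups are separated by a single space; a value that
     begins with a space grants access to groups only. -->
<allocations>
  <queue name="analytics">
    <!-- users alice,bob plus members of group etl may submit -->
    <aclSubmitApps>alice,bob etl</aclSubmitApps>
    <!-- leading space: no individual users, only the admins group -->
    <aclAdministerApps> admins</aclAdministerApps>
  </queue>
</allocations>
{code}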
[jira] [Updated] (YARN-2577) Clarify ACL delimiter and how to configure ACL groups only
[ https://issues.apache.org/jira/browse/YARN-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miklos Christine updated YARN-2577: --- Attachment: YARN-2577.patch Clarify ACL delimiter and how to configure ACL groups only -- Key: YARN-2577 URL: https://issues.apache.org/jira/browse/YARN-2577 Project: Hadoop YARN Issue Type: Improvement Components: documentation, fairscheduler Affects Versions: 2.5.1 Reporter: Miklos Christine Priority: Trivial Labels: newbie Attachments: YARN-2577.patch While reading through the Fair Scheduler documentation, I noticed it would be helpful to explicitly state that the delimiter for the Fair Scheduler ACLs is the space character. When specifying only ACL groups, users should begin the value with a space character. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2577) Clarify ACL delimiter and how to configure ACL groups only
[ https://issues.apache.org/jira/browse/YARN-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142311#comment-14142311 ] Hadoop QA commented on YARN-2577: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12670266/YARN-2577.patch against trunk revision 84a0a62. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5065//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5065//console This message is automatically generated. Clarify ACL delimiter and how to configure ACL groups only -- Key: YARN-2577 URL: https://issues.apache.org/jira/browse/YARN-2577 Project: Hadoop YARN Issue Type: Improvement Components: documentation, fairscheduler Affects Versions: 2.5.1 Reporter: Miklos Christine Priority: Trivial Labels: newbie Attachments: YARN-2577.patch While reading through the Fair Scheduler documentation, I noticed it would be helpful to explicitly state that the delimiter for the Fair Scheduler ACLs is the space character. When specifying only ACL groups, users should begin the value with a space character. -- This message was sent by Atlassian JIRA (v6.3.4#6332)