[jira] [Commented] (YARN-9385) YARN Services with simple authentication doesn't respect current UGI
[ https://issues.apache.org/jira/browse/YARN-9385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793902#comment-16793902 ]

Todd Lipcon commented on YARN-9385:
-----------------------------------

+1, lgtm, thanks Eric

> YARN Services with simple authentication doesn't respect current UGI
> ---------------------------------------------------------------------
>
>                 Key: YARN-9385
>                 URL: https://issues.apache.org/jira/browse/YARN-9385
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: security, yarn-native-services
>            Reporter: Todd Lipcon
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: YARN-9385.001.patch, YARN-9385.002.patch, YARN-9385.003.patch, YARN-9385.004.patch, YARN-9385.005.patch
>
> The ApiServiceClient implementation appends the current username to the request URL for "simple" authentication. However, that username is derived from the 'user.name' system property instead of the current UGI. That means that username spoofing via the 'HADOOP_USER_NAME' variable doesn't take effect for HTTP-based calls in the same manner that it does for RPC-based calls.
[jira] [Created] (YARN-9385) YARN Services with simple authentication doesn't respect current UGI
Todd Lipcon created YARN-9385:
----------------------------------

             Summary: YARN Services with simple authentication doesn't respect current UGI
                 Key: YARN-9385
                 URL: https://issues.apache.org/jira/browse/YARN-9385
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: security, yarn-native-services
            Reporter: Todd Lipcon

The ApiServiceClient implementation appends the current username to the request URL for "simple" authentication. However, that username is derived from the 'user.name' system property instead of the current UGI. That means that username spoofing via the 'HADOOP_USER_NAME' variable doesn't take effect for HTTP-based calls in the same manner that it does for RPC-based calls.
[jira] [Commented] (YARN-9385) YARN Services with simple authentication doesn't respect current UGI
[ https://issues.apache.org/jira/browse/YARN-9385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791981#comment-16791981 ]

Todd Lipcon commented on YARN-9385:
-----------------------------------

I noticed that the 'user.name' request parameter is set in two different ways in this file. The getRMWebAddress() function correctly uses UserGroupInformation to get the username, whereas appendUserNameIfRequired() uses the Java system property. It seems that replacing the use of the system property with the UGI-based short name in appendUserNameIfRequired() is the way to fix this.

However, I noticed one other inconsistency: appendUserNameIfRequired() bases its decision whether to append the username on the HTTP authentication configuration property, whereas getRMWebAddress() bases it on whether Kerberos is enabled. Which of the two is correct? The former (the HTTP-specific setting) seems more appropriate.

> YARN Services with simple authentication doesn't respect current UGI
> ---------------------------------------------------------------------
>
>                 Key: YARN-9385
>                 URL: https://issues.apache.org/jira/browse/YARN-9385
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: security, yarn-native-services
>            Reporter: Todd Lipcon
>            Priority: Major
>
> The ApiServiceClient implementation appends the current username to the request URL for "simple" authentication. However, that username is derived from the 'user.name' system property instead of the current UGI. That means that username spoofing via the 'HADOOP_USER_NAME' variable doesn't take effect for HTTP-based calls in the same manner that it does for RPC-based calls.
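[Editor's note] To make the fix discussed above concrete, here is a minimal sketch. Only UserGroupInformation.getCurrentUser().getShortUserName() is taken from the Hadoop API; the helper method shown is illustrative, not the actual ApiServiceClient code or the committed patch:

{code:java}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch only -- not the committed YARN-9385 patch.
public final class UserNameParamSketch {

  // Before: the username came from the JVM system property, so setting
  // HADOOP_USER_NAME (which only affects the UGI) was ignored here.
  static String fromSystemProperty() {
    return System.getProperty("user.name");
  }

  // After: derive the username from the current UGI, matching what
  // RPC-based calls use, so HADOOP_USER_NAME spoofing takes effect.
  static String fromUgi() throws IOException {
    return UserGroupInformation.getCurrentUser().getShortUserName();
  }

  // Hypothetical helper mirroring appendUserNameIfRequired(): append
  // the user.name query parameter for "simple" (non-Kerberos) auth.
  static String appendUserName(String url) throws IOException {
    return url + "?user.name=" + fromUgi();
  }
}
{code}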
[jira] [Created] (YARN-2490) Bad links to jobhistory server
Todd Lipcon created YARN-2490:
----------------------------------

             Summary: Bad links to jobhistory server
                 Key: YARN-2490
                 URL: https://issues.apache.org/jira/browse/YARN-2490
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Todd Lipcon

If you run an MR/YARN cluster without configuring the jobhistory URL, you get some really bad usability:
- your jobs still produce JobHistory links
- the job history link goes to whichever NM the AM happened to run on

Even if you run the job history server on the same host as the RM, your links will be incorrect unless you've explicitly configured its hostname. If JobHistory isn't running, we shouldn't produce URLs (or we should embed JobHistory inside the RM by default). If a real hostname is required, the JH server should refuse to start when configured with 0.0.0.0.
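[Editor's note] For reference, a sketch of the explicit configuration whose absence causes the bad links. The property keys are the standard JobHistory server settings; "jhs.example.com" is a placeholder hostname:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class JobHistoryConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Explicitly point MR jobs at the JobHistory server so completed-job
    // links resolve to it rather than to whichever NM ran the AM.
    // "jhs.example.com" is a placeholder -- binding to a real hostname
    // (not 0.0.0.0) is exactly what this report asks to enforce.
    conf.set("mapreduce.jobhistory.address", "jhs.example.com:10020");
    conf.set("mapreduce.jobhistory.webapp.address", "jhs.example.com:19888");
    System.out.println(conf.get("mapreduce.jobhistory.webapp.address"));
  }
}
{code}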
[jira] [Created] (YARN-2491) Speculative attempts should not run on the same node as their original attempt
Todd Lipcon created YARN-2491:
----------------------------------

             Summary: Speculative attempts should not run on the same node as their original attempt
                 Key: YARN-2491
                 URL: https://issues.apache.org/jira/browse/YARN-2491
             Project: Hadoop YARN
          Issue Type: Bug
          Components: scheduler
    Affects Versions: 3.0.0
            Reporter: Todd Lipcon

I'm seeing a behavior on trunk with the fair scheduler enabled where a speculative reduce attempt is run on the same node as its original attempt. This doesn't make sense -- the main reason for speculative execution is to deal with a slow node, so scheduling the second attempt on the same node would, if anything, just make the problem worse.
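[Editor's note] One way to express the desired placement check, sketched under invented names (no such helper exists in the scheduler; this only illustrates the behavior being requested): before assigning a speculative attempt to a node, compare against the nodes already hosting live attempts of the same task.

{code:java}
import java.util.Set;

// Hypothetical placement guard; class and method names are invented
// for this sketch and do not exist in YARN.
class SpeculativePlacementGuard {

  /**
   * Returns true if a speculative attempt may be placed on the given
   * node, i.e. the node is not already running an attempt of this task.
   *
   * @param candidateNode         host being considered by the scheduler
   * @param nodesWithLiveAttempts hosts running earlier attempts of the task
   */
  boolean mayPlaceSpeculativeAttempt(String candidateNode,
                                     Set<String> nodesWithLiveAttempts) {
    // Speculation exists to route around a slow node; co-locating the
    // backup attempt with the original defeats that purpose.
    return !nodesWithLiveAttempts.contains(candidateNode);
  }
}
{code}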
[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions
[ https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938828#comment-13938828 ]

Todd Lipcon commented on YARN-1796:
-----------------------------------

Patch looks good to me. +1 pending Jenkins.

> container-executor shouldn't require o-r permissions
> -----------------------------------------------------
>
>                 Key: YARN-1796
>                 URL: https://issues.apache.org/jira/browse/YARN-1796
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>            Priority: Minor
>         Attachments: YARN-1796.patch
>
> The container-executor currently checks that other users don't have read permissions. This is unnecessary and runs contrary to the Debian packaging policy manual. This is the analogue for YARN of the fix done for MR1 in MAPREDUCE-2103.
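[Editor's note] For illustration, a rough Java rendering of the check being removed; the real check is implemented in C inside the container-executor binary, and the path below is a placeholder:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

// Illustrative only: the actual check lives in container-executor.c.
public class OtherReadCheck {
  public static void main(String[] args) throws IOException {
    // Placeholder path; the install location varies by distribution.
    Path exe = Paths.get(args.length > 0 ? args[0]
        : "/usr/lib/hadoop-yarn/bin/container-executor");
    Set<PosixFilePermission> perms = Files.getPosixFilePermissions(exe);
    // Pre-YARN-1796 behavior: refuse to run if others can read the
    // binary. The patch drops this requirement, since o+r on a
    // setuid binary is harmless and Debian policy expects it.
    if (perms.contains(PosixFilePermission.OTHERS_READ)) {
      System.err.println("container-executor is world-readable");
    }
  }
}
{code}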
[jira] [Commented] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
[ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935937#comment-13935937 ]

Todd Lipcon commented on YARN-1795:
-----------------------------------

I'm seeing this on a real cluster, too, without running Oozie. Out of a job with 1000 tasks I typically see a few tasks early in the job's lifetime (first wave of task assignment) fail, all on the same host. E.g.:

{code}
14/03/14 19:15:38 INFO mapreduce.Job:  map 0% reduce 0%
14/03/14 19:15:42 INFO mapreduce.Job: Task Id : attempt_1394818402366_5229_m_000066_0, Status : FAILED
Container launch failed for container_1394818402366_5229_01_000074 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for d2208.halxg.cloudera.com:8041
	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196)
	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

14/03/14 19:15:42 INFO mapreduce.Job: Task Id : attempt_1394818402366_5229_m_000107_0, Status : FAILED
Container launch failed for container_1394818402366_5229_01_000118 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for d2208.halxg.cloudera.com:8041
	[same stack trace as above]

14/03/14 19:15:51 INFO mapreduce.Job: Task Id : attempt_1394818402366_5229_m_000066_1, Status : FAILED
Container launch failed for container_1394818402366_5229_01_000135 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for d2208.halxg.cloudera.com:8041
	[same stack trace as above]
{code}

> After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-1795
>                 URL: https://issues.apache.org/jira/browse/YARN-1795
>             Project: Hadoop YARN
>          Issue Type: Bug
[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM
[ https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848168#comment-13848168 ]

Todd Lipcon commented on YARN-1029:
-----------------------------------

I agree with Karthik here -- the main reasons to pursue a separate ZKFC in HDFS were:
- avoid failover in the case of GC (since ZKFC has a very low heap requirement) but still fail over fast on machine failure
- avoid adding any dependency on ZK within the NN
- allow the option to use other resource managers -- in practice no one has done this, and I think the extra complexity all of our pluggability introduces is not worth it

In the case of RM HA, as I understand it (apologies if I got anything wrong -- I've only tangentially followed this discussion):
- RM HA uses ZK itself for shared storage, so it already has a dependency on ZK.
- Given that the shared state is in ZK, we don't need fencing if the same ZK client does the election. The reason is that, if an RM loses its ZK lease, it will simultaneously trigger the failover _and_ be unable to make further changes in ZK. This is exactly the semantics we want.

Having a separate ZKFC actually complicates things, because we may have to reintroduce some kind of fencing. What does it mean if the ZKFC loses its ZK lease, but the RM itself continues to have access to ZK? It multiplies the 'state diagram' by two and doesn't seem to offer any particular advantages.

As for embedding ZKFC -- refactoring it so it can (a) not do health checks, (b) control the RM directly rather than via RPC, and (c) re-use the same ZK session -- that seems more complicated than it's worth. Given that we'd be throwing away all of the ZKFC features beyond the elector, why not just use the elector? I'm also not sure why we want to preserve the external ZKFC option -- per the above, it's a more complicated deployment scenario and seems to offer little tangible benefit.

> Allow embedding leader election into the RM
> --------------------------------------------
>
>                 Key: YARN-1029
>                 URL: https://issues.apache.org/jira/browse/YARN-1029
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, yarn-1029-0.patch, yarn-1029-approach.patch
>
> It should be possible to embed the common ActiveStandbyElector into the RM such that ZooKeeper-based leader election and notification is built in. In conjunction with a ZK state store, this configuration will be a simple deployment option.
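[Editor's note] To sketch what "just use the elector" could look like, below is an illustrative callback implementation. The ActiveStandbyElectorCallback interface and its methods exist in hadoop-common, but treat the exact signatures as approximate, and the transition logic is only stubbed in comments:

{code:java}
import org.apache.hadoop.ha.ActiveStandbyElector;
import org.apache.hadoop.ha.ServiceFailedException;

// Rough sketch: the RM embeds the elector directly and reacts to its
// callbacks, rather than being driven by an external ZKFC process.
class EmbeddedElectorSketch
    implements ActiveStandbyElector.ActiveStandbyElectorCallback {

  @Override
  public void becomeActive() throws ServiceFailedException {
    // Transition this RM to Active. Because the same ZK session holds
    // both the election lock and the state store, losing the session
    // revokes write access and triggers failover at the same moment,
    // so no separate fencing step is needed.
  }

  @Override
  public void becomeStandby() {
    // Transition this RM to Standby.
  }

  @Override
  public void enterNeutralMode() {
    // ZK connection is flaky: neither Active nor Standby until resolved.
  }

  @Override
  public void notifyFatalError(String errorMessage) {
    // Unrecoverable elector error; abort rather than risk split-brain.
  }

  @Override
  public void fenceOldActive(byte[] oldActiveData) {
    // With ZK-backed state, the old Active is implicitly fenced when
    // its session expires; nothing extra to do in this sketch.
  }
}
{code}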
[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to use cgroups in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783236#comment-13783236 ]

Todd Lipcon commented on YARN-1253:
-----------------------------------

On the security front, I see this as an improvement in compartmentalization. Sometimes people have an HDFS cluster with lax security concerns for the data within that cluster -- or they already restrict job submission to folks who are trusted to access all of the data. Given that, the concern of a malicious user masquerading as another on the cluster isn't a big one -- or else they'd have set up Kerberos security as you mentioned above. That said, these clusters may still be configured with automatic NFS mounts, and non-Kerberized NFS, in which case the ability to masquerade as another Unix user is a big problem.

Let me give an example from my past -- a university CS department where I helped as a system administrator. In this environment, all users used non-Kerberized NFSv3 to access a big filer with home directories. Users on the sysadmin staff had a great deal of access on the filer, as well as general sudo-type access, sensitive SSH keys, etc., stored within their home directories. Obviously, students did not.

In this same environment, we had shared compute clusters (typically traditional HPC, but some early experiments with Hadoop as well, back in ancient days). Different grad students shared these compute clusters for their research jobs, but security would not have been an important consideration -- within a trusted environment like a small university research department, convenience outweighed security. That said, resource allocation and isolation between users was important -- I handled many cases where students or professors up against a paper deadline got pretty pissed off that some undergrad was monopolizing CPU cycles on shared machines.

In such an environment, the setup proposed by this JIRA would help. We could not have simply used LCE, because that would have opened an attack vector: any student could submit a job as toddlipcon and then use my NFS access to essentially gain department-wide root. Without LCE, though, there would be poor resource isolation (no cgroups).

I'm certain that there are other environments as well where Unix user masquerading opens a lot of attack vectors, but where strong in-Hadoop auth is not a requirement.

> Changes to LinuxContainerExecutor to use cgroups in unsecure mode
> ------------------------------------------------------------------
>
>                 Key: YARN-1253
>                 URL: https://issues.apache.org/jira/browse/YARN-1253
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Roman Shaposhnik
>            Priority: Blocker
>
> When using cgroups we require LCE to be configured in the cluster to start containers, and LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in a non-secure setup this presents a couple of issues:
> * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
> * Because users can impersonate other users, any user would have access to any local file of other users
>
> In particular, the second issue is not desirable, as a user could get access to the SSH keys of other users on the nodes or, if there are NFS mounts, get to other users' data outside of the cluster.
[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783318#comment-13783318 ]

Todd Lipcon commented on YARN-1253:
-----------------------------------

bq. We should refactor that code out to be able to use it as a standalone library/binary (which doesn't bring in the extra baggage of user-accounts etc.) - that's the correct fix IMO. Putting in a local-user is an easy short-term solution

I think separating the local run-as user from the daemon user has other benefits as well, separate from cgroups. This is a long-standing tradition in Unix services -- e.g., Apache httpd typically runs CGI scripts as nobody unless suexec is configured. So this change still has value.

> Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1253
>                 URL: https://issues.apache.org/jira/browse/YARN-1253
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Alejandro Abdelnur
>            Assignee: Roman Shaposhnik
>            Priority: Blocker
>         Attachments: YARN-1253.patch.txt
>
> When using cgroups we require LCE to be configured in the cluster to start containers, and LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in a non-secure setup this presents a couple of issues:
> * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
> * Because users can impersonate other users, any user would have access to any local file of other users
>
> In particular, the second issue is not desirable, as a user could get access to the SSH keys of other users on the nodes or, if there are NFS mounts, get to other users' data outside of the cluster.
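[Editor's note] A sketch of the deployment knob this change introduces. The property key matches the one YARN-1253 ultimately added, though the exact name and default may vary by release:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class NonsecureLocalUserExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Run all containers as one dedicated local user in non-secure
    // mode, analogous to httpd running CGI scripts as "nobody".
    conf.set("yarn.nodemanager.container-executor.class",
        "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
    conf.set(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user",
        "nobody");
    System.out.println(conf.get(
        "yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user"));
  }
}
{code}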
[jira] [Commented] (YARN-311) Dynamic node resource configuration on RM with JMX interface
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544082#comment-13544082 ]

Todd Lipcon commented on YARN-311:
----------------------------------

Per the discussion in HADOOP-9160, I really don't think we should add anything which is only available over JMX. The security model, for one, is wildly incompatible with the rest of Hadoop security.

If the main reason for wanting JMX is so that other software can call these RPCs without the Hadoop jar, I'll counter and say we should go even farther and allow other software to not even require a JVM. I see two ways of doing this:
1) Implement a simple client in C or Python which speaks Hadoop RPC via protobuf
2) Add REST interfaces

I am in favor of option 1. A single-threaded blocking RPC client without any connection pooling, etc., is not very difficult to write, and for administrative purposes that would be sufficient, right? If we had such a thing available as a relatively small C or Python library, would that solve the issue just as well?

> Dynamic node resource configuration on RM with JMX interface
> -------------------------------------------------------------
>
>                 Key: YARN-311
>                 URL: https://issues.apache.org/jira/browse/YARN-311
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Junping Du
>            Assignee: Junping Du
>
> As the first step, we go for resource change on the RM side and expose a JMX API. For design details, please refer to the proposal and discussion in the parent JIRA: YARN-291.