[jira] [Commented] (YARN-9834) Allow using a pool of local users to run Yarn Secure Container in secure mode

shanyu zhao (Jira) Tue, 17 Sep 2019 18:27:46 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931953#comment-16931953
 ]


shanyu zhao commented on YARN-9834:
-----------------------------------

{quote}I think host joining AD is required to keep authorized servers to access 
LDAP. Without AD authenticate the host, it would be wide open, no?{quote}
In this scenario, the "host" is the Docker container running the node 
manager. That's why I said it doesn't make sense for that container to join 
the domain just to sync domain users into the local Docker container. With 
this PR there is no need to sync domain users to the "host": we pre-create a 
pool of local users and dynamically assign them to the Yarn container 
processes that run.
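The pool assignment described above can be sketched as a small in-memory allocator. This is a minimal illustration, not the code in the YARN-9834 patch. The class name `LocalUserPool`, its methods, and the first-free allocation order are assumptions. Only the `prefix + index` naming and the "nonexistuser" fallback come from the issue description.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory allocator mapping domain users to pre-created
 * local users named prefix0, prefix1, ... (not the YARN-9834 patch code).
 */
class LocalUserPool {
    /** Run-as user returned when the pool is exhausted, as in the design. */
    static final String NONEXISTENT_USER = "nonexistuser";

    private final Deque<String> free = new ArrayDeque<>();
    private final Map<String, String> domainToLocal = new HashMap<>();

    LocalUserPool(String prefix, int size) {
        for (int i = 0; i < size; i++) {
            free.addLast(prefix + i);
        }
    }

    /** Returns the local user for domainUser, allocating a free one if needed. */
    synchronized String allocate(String domainUser) {
        String local = domainToLocal.get(domainUser);
        if (local != null) {
            return local; // the same domain user keeps its current local user
        }
        local = free.pollFirst();
        if (local == null) {
            // Pool exhausted: the container launch fails and Yarn retries elsewhere.
            return NONEXISTENT_USER;
        }
        domainToLocal.put(domainUser, local);
        return local;
    }

    /** Returns the local user to the pool; the caller must ensure its files are deleted. */
    synchronized void release(String domainUser) {
        String local = domainToLocal.remove(domainUser);
        if (local != null) {
            free.addLast(local);
        }
    }
}
```

With a pool of two, Joe and Fred receive user0 and user1. A third domain user gets "nonexistuser" until one of them is released. Releasing is only safe once all of that user's local files are gone.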

{quote}I am still not clear on how the pool users would be used. It sounds like 
user1 can be given to Joe, and Fred at different time. Wouldn't that give Fred 
access to hdfs data stored by Joe as user1?{quote}
Yes, local user "user1" can be given to Joe and to Fred at different times. 
But before we reassign "user1" from Joe to Fred, we make sure all local files 
produced by running the Yarn container (data and logs) are deleted. Note that 
to access HDFS, the Yarn container process always needs to use the Hadoop 
delegation token, which is just an APPLICATION-type resource localized to the 
working directory of that container. When the application finishes, this file 
is deleted, so when Fred's container runs it cannot talk to HDFS with Joe's 
Hadoop delegation token.
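APPLICATION-visibility resources live in the application's own cache and are removed when the application finishes. For that reason, the issue description proposes treating PRIVATE resources the same way when pooled local users are enabled. The sketch below shows that rule under some assumptions. It uses a local `Visibility` enum that mirrors the values of YARN's `LocalResourceVisibility` (PUBLIC, PRIVATE, APPLICATION). `VisibilityPolicy.effective` is a hypothetical helper, not an existing NodeManager method.

```java
/** Local stand-in mirroring the values of YARN's LocalResourceVisibility. */
enum Visibility { PUBLIC, PRIVATE, APPLICATION }

/** Hypothetical policy: downgrade PRIVATE to APPLICATION when local users are pooled. */
class VisibilityPolicy {
    static Visibility effective(Visibility requested, boolean localUserPoolEnabled) {
        if (localUserPoolEnabled && requested == Visibility.PRIVATE) {
            // APPLICATION resources are cleaned up when the app finishes, so the
            // next holder of the same local user cannot read this user's cache.
            return Visibility.APPLICATION;
        }
        return requested; // PUBLIC and APPLICATION are left unchanged
    }
}
```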

{quote}How about the same user? Let's consider:
1. Joe run as user1, debug delay enabled, running: 
application_1568407112772_0007
2. Fred run as user1, running: application_1568407112772_0008{quote}
In this case, Fred won't be allocated "user1" because Joe's 
FileDeletionTasks are still pending: the FileOpCount reference count in 
LocalUserInfo is not zero, so Joe still holds user1 until all pending 
FileDeletionTasks have executed and the reference count reaches zero. Only 
then is "user1" available for reuse.
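The release rule can be sketched as a per-user set of synchronized reference counts. A local user goes back to the pool only when every count is zero. The class `LocalUserRefCounts` and its `Kind` enum are hypothetical. They are loosely modeled on the three conditions listed in the issue description (applications, log handling, pending FileDeletionTasks) and are not the patch's actual `LocalUserInfo`.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical per-domain-user reference counts guarding a pooled local user. */
class LocalUserRefCounts {
    enum Kind { APPLICATION, LOG_HANDLING, FILE_DELETION }

    private final Map<Kind, Integer> counts = new EnumMap<>(Kind.class);

    LocalUserRefCounts() {
        for (Kind k : Kind.values()) {
            counts.put(k, 0);
        }
    }

    synchronized void increment(Kind kind) {
        counts.merge(kind, 1, Integer::sum);
    }

    /** Decrements one count; returns true if all counts are now zero (user releasable). */
    synchronized boolean decrement(Kind kind) {
        int current = counts.get(kind);
        if (current == 0) {
            throw new IllegalStateException("unbalanced decrement of " + kind);
        }
        counts.put(kind, current - 1);
        return isReleasable();
    }

    synchronized boolean isReleasable() {
        for (int c : counts.values()) {
            if (c != 0) {
                return false;
            }
        }
        return true;
    }
}
```

In the scenario above, Joe's pending deletion keeps `isReleasable()` false, so user1 is not offered to Fred until the FILE_DELETION count drops to zero. In a real node manager, the final decrement and the return of the local user to the pool would presumably need to happen under the same lock. Otherwise, a new application could start in between.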


> Allow using a pool of local users to run Yarn Secure Container in secure mode
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9834
>                 URL: https://issues.apache.org/jira/browse/YARN-9834
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>            Priority: Major
>
> Yarn Secure Container in secure mode separates different users' local 
> files and container processes running on the same node manager. This 
> depends on an out-of-band service such as SSSD/Winbind to sync all domain 
> users to the local machine.
> Winbind user sync has a lot of overhead, especially for large corporations. 
> Also, when running Yarn inside a Kubernetes cluster (meaning node managers 
> run inside Docker containers), it doesn't make sense for each container to 
> join the Active Directory domain and sync a whole copy of the domain users.
> We should add a new configuration to Yarn so that we can pre-create a pool 
> of users on each machine/Docker container. At runtime, Yarn allocates a 
> local user to the domain user that submits the application. When all 
> containers of that user are finished and all files belonging to that user 
> are deleted, we can release the allocation and allow other users to use the 
> same local user to run their Yarn containers.
> h2. Design
> We propose to add these new configurations:
> {code:java}
> yarn.nodemanager.linux-container-executor.secure-mode.use-local-user, 
> defaults to false
> yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix, 
> defaults to "user"{code}
> By default this feature is turned off. If we enable it with 
> local-user-prefix set to "user", we expect pre-created local users 
> user0 - usern, where the total number of local users equals:
> {code:java}
> yarn.nodemanager.resource.cpu-vcores {code}
> We can use an in-memory allocator to keep the domain-user-to-local-user 
> mapping.
> Now, when do we add the mapping and when do we remove it?
> In the node manager, ApplicationImpl implements the state machine for a Yarn 
> app's life cycle; it exists only if the app has at least one container 
> running on that node manager. We can hook in the code that adds the mapping 
> during application initialization.
> For removing the mapping, we need to wait for three things:
> 1) All applications of the same user are completed;
> 2) All log handling of those applications (log aggregation or non-aggregated 
> handling) is done;
> 3) All pending FileDeletionTasks that use the user's identity are finished.
> Note that all operations on these reference counts should be synchronized.
> If all local users in the pool are allocated, we'll return 
> "nonexistuser" as the run-as user; this causes the container to fail to 
> execute, and Yarn will relaunch it on other nodes.
> h2. Limitations
> 1) This feature does not support PRIVATE visibility for localized resources. 
> Because PRIVATE resources are potentially cached in the node manager for a 
> very long time, supporting them would be a security problem: a user might be 
> able to peek into a previous user's PRIVATE resources. We can modify the 
> code to treat all PRIVATE resources as APPLICATION type.
> 2) It is recommended to enable DominantResourceCalculator so that no more 
> than "cpu-vcores" containers run concurrently on a node manager:
> {code:java}
> yarn.scheduler.capacity.resource-calculator
> = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator {code}
> 3) Currently this feature does not work with Yarn Node Manager recovery: 
> because the mappings are kept in memory, they cannot be recovered after a 
> node manager restart.
>  
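The two proposed settings and the expected pool of local user names can be sketched as follows. `java.util.Properties` stands in for Hadoop's `Configuration`, and the class `LocalUserPoolConfig` is hypothetical. It reads "user0 - usern" as N users numbered user0 through user(N-1), where N is `yarn.nodemanager.resource.cpu-vcores`. Falling back to 8 vcores when that key is unset is an assumption, based on YARN's effective default when hardware detection is off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical reader for the proposed settings (Properties stands in for Configuration). */
class LocalUserPoolConfig {
    static final String USE_LOCAL_USER =
        "yarn.nodemanager.linux-container-executor.secure-mode.use-local-user";
    static final String LOCAL_USER_PREFIX =
        "yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix";
    static final String CPU_VCORES = "yarn.nodemanager.resource.cpu-vcores";

    final boolean enabled;
    final String prefix;
    final int poolSize;

    LocalUserPoolConfig(Properties props) {
        // Defaults follow the proposal: feature off, prefix "user".
        enabled = Boolean.parseBoolean(props.getProperty(USE_LOCAL_USER, "false"));
        prefix = props.getProperty(LOCAL_USER_PREFIX, "user");
        // Assumption: 8 vcores when unset, mirroring YARN's effective default.
        poolSize = Integer.parseInt(props.getProperty(CPU_VCORES, "8"));
    }

    /** Local users expected to be pre-created: prefix0 .. prefix(poolSize - 1). */
    List<String> expectedLocalUsers() {
        List<String> users = new ArrayList<>();
        if (!enabled) {
            return users; // feature off: no pooled users are required
        }
        for (int i = 0; i < poolSize; i++) {
            users.add(prefix + i);
        }
        return users;
    }
}
```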



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

