[ https://issues.apache.org/jira/browse/YARN-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932041#comment-16932041 ]
Eric Yang commented on YARN-9834:
---------------------------------

Sorry, this design looks too risky for me to consider. I will let others provide feedback.

> Allow using a pool of local users to run Yarn Secure Container in secure mode
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9834
>                 URL: https://issues.apache.org/jira/browse/YARN-9834
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>            Priority: Major
>
> Yarn Secure Container in secure mode allows separation of different users'
> local files and container processes running on the same node manager. This
> depends on an out-of-band service such as SSSD/Winbind to sync all domain
> users to the local machine.
> Winbind user sync has a lot of overhead, especially for large corporations.
> Also, if Yarn runs inside a Kubernetes cluster (meaning node managers run
> inside Docker containers), it doesn't make sense for each container to
> domain-join with Active Directory and sync a whole copy of the domain users.
> We should allow a new configuration in Yarn so that we can pre-create a
> pool of users on each machine/Docker container. At runtime, Yarn then
> allocates a local user to the domain user that submits the application. When
> all containers of that user are finished and all files belonging to that user
> are deleted, we can release the allocation and allow other users to use the
> same local user to run their Yarn containers.
> h2. Design
> We propose to add these new configurations:
> {code:java}
> yarn.nodemanager.linux-container-executor.secure-mode.use-local-user,
> defaults to false
> yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix,
> defaults to "user"{code}
> By default this feature is turned off.
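As an illustration, the two proposed properties (names taken from the design above; the values shown are only examples, since these properties are not part of any released Hadoop version) would be set in a node's yarn-site.xml like any other NodeManager configuration:

```xml
<!-- Illustrative only: these property names are proposed in this JIRA
     and do not exist in a released Hadoop version. -->
<property>
  <name>yarn.nodemanager.linux-container-executor.secure-mode.use-local-user</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix</name>
  <value>user</value>
</property>
```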
> If we enable it, with local-user-prefix set to "user", then we expect there
> are pre-created local users user0 - usern, where the total number of local
> users equals:
> {code:java}
> yarn.nodemanager.resource.cpu-vcores {code}
> We can use an in-memory allocator to keep the domain-user-to-local-user
> mapping.
> Now, when do we add the mapping and when do we remove it?
> In the node manager, ApplicationImpl implements the state machine for a Yarn
> app life cycle, but only if the app has at least 1 container running on that
> node manager. We can hook up the code to add the mapping during application
> initialization.
> For removing the mapping, we need to wait for 3 things:
> 1) All applications of the same user are completed;
> 2) All log handling of the applications (log aggregation or non-aggregated
> handling) is done;
> 3) All pending FileDeletionTasks that use the user's identity are finished.
> Note that all operations on these reference counts should be synchronized.
> If all of the local users in the pool are allocated, we return
> "nonexistuser" as the run-as user. This causes the container to fail to
> execute, and Yarn will relaunch it on other nodes.
> h2. Limitations
> 1) This feature does not support the PRIVATE visibility type of resource
> allocation, because PRIVATE resources are potentially cached on the node
> manager for a very long time. Supporting them would be a security problem:
> a user might be able to peek into a previous user's PRIVATE resources. We
> can modify the code to treat all PRIVATE resources as APPLICATION type.
> 2) It is recommended to enable DominantResourceCalculator so that no more
> than "cpu-vcores" concurrent containers run on a node manager:
> {code:java}
> yarn.scheduler.capacity.resource-calculator
> = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator {code}
> 3) Currently this feature does not work with Yarn Node Manager recovery.
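The in-memory allocator and reference counting described in the design could look roughly like the following sketch. The class name LocalUserAllocator and its methods are hypothetical illustrations, not taken from an actual Hadoop patch; release() stands for the point at which all three conditions above (applications done, logs handled, deletion tasks finished) have been met for a user:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch of an in-memory pool mapping domain users to local users. */
public class LocalUserAllocator {
  private final Deque<String> freeLocalUsers = new ArrayDeque<>();
  private final Map<String, String> domainToLocal = new HashMap<>();
  private final Map<String, Integer> refCounts = new HashMap<>();

  public LocalUserAllocator(String prefix, int poolSize) {
    for (int i = 0; i < poolSize; i++) {
      freeLocalUsers.add(prefix + i); // e.g. user0 .. user(n-1)
    }
  }

  /** Called on application initialization; bumps the user's ref count. */
  public synchronized String allocate(String domainUser) {
    String local = domainToLocal.get(domainUser);
    if (local == null) {
      local = freeLocalUsers.poll();
      if (local == null) {
        // Pool exhausted: return a non-existent user so the container
        // fails on this node and Yarn relaunches it elsewhere.
        return "nonexistuser";
      }
      domainToLocal.put(domainUser, local);
    }
    refCounts.merge(domainUser, 1, Integer::sum);
    return local;
  }

  /**
   * Called once an application of this user has completed, its log
   * handling is done, and its pending FileDeletionTasks have finished.
   * Returns the local user to the pool when the last reference drops.
   */
  public synchronized void release(String domainUser) {
    Integer count = refCounts.get(domainUser);
    if (count == null) {
      return; // nothing allocated for this user
    }
    if (count <= 1) {
      refCounts.remove(domainUser);
      freeLocalUsers.add(domainToLocal.remove(domainUser));
    } else {
      refCounts.put(domainUser, count - 1);
    }
  }
}
```

Because this state lives only in process memory, it is lost whenever the node manager restarts, which is the recovery limitation noted in 3) above.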
> The mappings are kept only in memory, so they cannot be recovered after a
> node manager restart.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)