[ https://issues.apache.org/jira/browse/YARN-8569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596014#comment-16596014 ]
Wangda Tan commented on YARN-8569:
----------------------------------

[~eyang],

{quote}Unless malicious user already hacked into yarn user account and populate data as yarn user, there is no easy parameter hacking to container-executor to trigger exploits.{quote}

There have been many debates about whether the yarn user should be treated as root. We have seen c-e issues that let the yarn user manipulate other users' directories, or escalate directly to root. All of those issues became CVEs.

{quote}This is the reason that this solution is invented to lower the bar of writing clustering software for Hadoop.{quote}

It would help if you could share some real-world examples. From YARN's design standpoint, ideally all NM/RM logic should be as general as possible, and all service-related concerns should be handled by the service framework, such as the API server or ServiceMaster. I really don't like the idea of adding a service-specific API to the NM API.

If you do think updating the service spec json file is important, another approach could be:
1) ServiceMaster mounts a local directory (under the container's local dir) when launching the docker container (for example: ./service-info -> /service/sys/fs/)
2) ServiceMaster requests re-localization of the new service spec json file into the ./service-info folder.

> Create an interface to provide cluster information to application
> -----------------------------------------------------------------
>
>                 Key: YARN-8569
>                 URL: https://issues.apache.org/jira/browse/YARN-8569
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>              Labels: Docker
>     Attachments: YARN-8569.001.patch, YARN-8569.002.patch
>
>
> Some programs require container hostnames to be known for the application to run.
> For example, distributed tensorflow requires a launch_command that looks like:
> {code}
> # On ps0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=0
> # On ps1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=ps --task_index=1
> # On worker0.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=0
> # On worker1.example.com:
> $ python trainer.py \
>     --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
>     --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
>     --job_name=worker --task_index=1
> {code}
> This is a bit cumbersome to orchestrate via Distributed Shell or the YARN
> services launch_command. In addition, the dynamic parameters do not work
> with the YARN flex command. This is the classic pain point for application
> developers attempting to automate system environment settings as parameters
> to the end-user application.
> It would be great if the YARN Docker integration could provide a simple
> option to expose the hostnames of the yarn service via a mounted file. The
> file content gets updated when a flex command is performed. This allows
> application developers to consume system environment settings via a standard
> interface. It is like /proc/devices for Linux, but for Hadoop. This may
> involve updating a file in the distributed cache and allowing the file to be
> mounted via container-executor.
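As a rough sketch of how an application inside the container could consume such a mounted file, the snippet below reads a hypothetical cluster-information JSON and assembles the tensorflow arguments from the example above. The mount path (/service/sys/fs/service-spec.json) and the JSON layout are assumptions made only for this illustration; they are not an existing YARN or YARN services interface.

{code}
#!/usr/bin/env python
# Hypothetical sketch: build distributed-tensorflow arguments from a
# cluster-information file that YARN would mount into the container.
# The path and JSON layout below are assumptions for illustration only.
import json
import socket

CLUSTER_INFO = "/service/sys/fs/service-spec.json"   # assumed mount point


def build_args(port=2222):
    with open(CLUSTER_INFO) as f:
        spec = json.load(f)

    # Assumed layout:
    # {"components": [{"name": "ps", "instances": [{"hostname": "ps0.example.com"}, ...]},
    #                 {"name": "worker", "instances": [...]}]}
    hosts = {}
    for comp in spec["components"]:
        hosts[comp["name"]] = [i["hostname"] for i in comp["instances"]]

    ps_hosts = ",".join("%s:%d" % (h, port) for h in hosts["ps"])
    worker_hosts = ",".join("%s:%d" % (h, port) for h in hosts["worker"])

    # Derive this container's role and task index from its own hostname.
    me = socket.getfqdn()
    for job_name, hostnames in hosts.items():
        if me in hostnames:
            return ["--ps_hosts=" + ps_hosts,
                    "--worker_hosts=" + worker_hosts,
                    "--job_name=" + job_name,
                    "--task_index=%d" % hostnames.index(me)]
    raise RuntimeError("host %s not listed in %s" % (me, CLUSTER_INFO))


if __name__ == "__main__":
    print(" ".join(build_args()))
{code}

If the mounted file were rewritten whenever a flex operation adds or removes instances, a wrapper like this could simply re-read it instead of receiving new launch parameters, which is the behavior the description above asks for.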