[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511814#comment-16511814 ] Hudson commented on YARN-8259: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14424 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14424/]) YARN-8259. Improve privileged docker container liveliness checks. (eyang: rev 22994889dc449f966fb6462a3ac3d3bbaee3ac6a) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/LinuxContainerRuntimeConstants.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java > Revisit liveliness checks for Docker containers > --- > > Key: YARN-8259 > URL: https://issues.apache.org/jira/browse/YARN-8259 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.2, 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Shane Kumpf >Priority: Blocker > Labels: Docker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8259.001.patch, YARN-8259.002.patch > > > As privileged containers may execute as a user that does not match the YARN > run as user, sending the null signal for liveliness checks could fail. We > need to reconsider how liveliness checks are handled in the Docker case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
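The issue description says the null signal can fail when a privileged container runs as a user other than the YARN run-as user. One plausible reading, based on kill(2) semantics rather than on the YARN patch itself: kill(pid, 0) fails with EPERM when the caller may not signal the target, even though the target exists, and with ESRCH only when there is no such process. A check that treats every failure as "exited" will therefore misreport a live container. The sketch below is a hypothetical helper (not YARN code) that takes the syscall result as plain integers instead of calling native code, so it only illustrates the distinction.

```java
/**
 * Hypothetical helper, not part of YARN: maps the outcome of
 * kill(pid, 0), the "null signal", to a liveness verdict.
 */
final class NullSignalLiveness {

    // Standard Linux errno values for the two cases that matter here.
    static final int EPERM = 1; // target exists, but the caller may not signal it
    static final int ESRCH = 3; // no such process

    enum Result { ALIVE, DEAD, EXISTS_NO_PERMISSION, ERROR }

    /**
     * @param returnValue value returned by kill(pid, 0): 0 on success, -1 on failure
     * @param errno       errno after a failed call (ignored when returnValue == 0)
     */
    static Result interpret(int returnValue, int errno) {
        if (returnValue == 0) {
            return Result.ALIVE; // the signal could have been delivered
        }
        switch (errno) {
            case ESRCH:
                return Result.DEAD;
            case EPERM:
                // The process is there; it just belongs to a different user
                // (e.g. a privileged container not running as the run-as user).
                return Result.EXISTS_NO_PERMISSION;
            default:
                return Result.ERROR;
        }
    }

    /** The naive reading: any failure means "not running". */
    static boolean naiveIsAlive(int returnValue) {
        return returnValue == 0;
    }

    public static void main(String[] args) {
        // A process owned by another user, probed by an unprivileged caller, fails with EPERM.
        System.out.println(interpret(-1, EPERM)); // EXISTS_NO_PERMISSION
        System.out.println(naiveIsAlive(-1));     // false: the container would be treated as exited
    }
}
```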
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511444#comment-16511444 ] genericqa commented on YARN-8259: -
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 35s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 8s | trunk passed |
| +1 | compile | 8m 41s | trunk passed |
| +1 | checkstyle | 0m 21s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| -1 | shadedclient | 3m 58s | branch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 0m 59s | trunk passed |
| +1 | javadoc | 0m 53s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 7m 25s | the patch passed |
| +1 | javac | 7m 25s | the patch passed |
| +1 | checkstyle | 0m 19s | the patch passed |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedclient | 2m 29s | patch has errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 49s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 19m 37s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | unit | 0m 20s | hadoop-yarn-site in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 78m 46s | |
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8259 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927676/YARN-8259.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4581982638e6 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510290#comment-16510290 ] Shane Kumpf commented on YARN-8259: --- Thanks for the input, everyone. {quote}Could you add some information to DockerContainers.md{quote} Absolutely. I'll get a new patch up shortly with the doc improvements. Thanks again for all the feedback, [~eyang].
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510275#comment-16510275 ] Eric Yang commented on YARN-8259: - Four people have expressed a preference for option #1, so this patch should be ready for commit in its current form. [~shaneku...@gmail.com] Could you add some information to DockerContainers.md, in the Privileged Container Security Consideration section, indicating that the NM user should be whitelisted if the hidepid option is enabled?
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508657#comment-16508657 ] Eric Badger commented on YARN-8259: --- I would give a slight preference to proposal #1 because of performance, especially in the live-restore case. There is a workaround even with hidepid in place, so I think the good outweighs the bad.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508516#comment-16508516 ] Eric Yang commented on YARN-8259: - I prefer #3 to keep the abstraction in place and improve portability. #1 with documentation is my second choice to address this problem.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508198#comment-16508198 ] Jim Brennan commented on YARN-8259: --- I think we should go with Option 1 with documentation to whitelist the NM user if hidepid is enabled.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507437#comment-16507437 ] Shane Kumpf commented on YARN-8259: --- [~eyang], [~Jim_Brennan], [~ebadger], [~jlowe] - any additional feedback? I'd like to get this in soon given the impact. It appears the patch that implements #1 still applies if we wanted to go with that for now. We could add alternatives later based on user demand. I want everyone to be comfortable with the approach though. Thanks!
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497266#comment-16497266 ] Shane Kumpf commented on YARN-8259: --- Thanks for the feedback, [~ebadger]. {quote}if the yarn user is whitelisted for hidepid, then isn't that going to get you basically the same situation as checking pids as a privileged user? {quote} Perhaps "non-starter" was a bit harsh. I do see what you mean, but I think they are a bit different. To clarify: if the admin has explicitly enabled hidepid, allowing yarn to bypass that protection via c-e would be surprising behavior, IMO. If hidepid is disabled or the yarn user is explicitly whitelisted, then the admin should not be surprised that the yarn user can see all pids.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497239#comment-16497239 ] Eric Badger commented on YARN-8259: --- For proposal #1, if the yarn user is whitelisted for hidepid, then isn't that going to get you basically the same situation as checking pids as a privileged user? I.e. you'll be able to see all arbitrary pids if you are able to compromise the yarn user. If that's a non-starter, then we have no choice but to go with proposal #4 (even though I would prefer #1).
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496492#comment-16496492 ] Shane Kumpf commented on YARN-8259: --- I've been doing additional testing here and could use input from the community, as all of the solutions have cons. Here is what I've tested and been considering.
1) */proc/pid check as yarn*
Pros:
* No c-e changes
* Works with Docker live restore
Cons:
* Breaks down when using hidepid
* Portability
2) */proc/pid or kill -0 as privileged user*
Pros:
* Works with Docker live restore
Cons:
* Circumvents hidepid; allows the yarn user to check the existence of any pid due to the use of elevated privileges.
* Portability (/proc method)
3) *docker inspect*
Pros:
* No c-e changes
* Uses the Docker API
Cons:
* Requires retry handling to support Docker live restore.
** In the case of a Docker daemon upgrade, this means the upgrade must complete before the retries are exhausted, which could mean hundreds of retries.
4) *Hybrid* (keep the existing kill -0 for non-privileged containers, docker inspect for privileged containers)
Pros:
* No c-e changes
* Limits impacts to live restore
Cons:
* Requires retry handling to support Docker live restore.
* Different handling based on container type.
I believe #2 is a non-starter as it silently bypasses the hidepid option. I'm leaning towards striking #3 from the list as well: we really need the recovery logic to be solid, so I don't want to unnecessarily impact non-privileged containers, which appear to be working well. At this point, I'm leaning towards #4 or #1 (with docs indicating that the NM user must be whitelisted if hidepid is enabled).
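Option #1 boils down to checking, as the NM user and without a container-executor call, whether a directory named after the container's pid exists under /proc. A minimal Java sketch, assuming the pid is already known (for example, read from the container's pid file). The class name and the injectable procRoot parameter are hypothetical; procRoot exists only so the check can be exercised against a fake directory instead of the real /proc.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of option #1: a container is considered alive if
 * procRoot/<pid> exists. In production procRoot would be /proc.
 */
final class ProcPidLivenessCheck {
    private final Path procRoot;

    ProcPidLivenessCheck(Path procRoot) {
        this.procRoot = procRoot;
    }

    /**
     * Returns true if a directory named after the pid exists under procRoot.
     * Note: with hidepid=2, entries owned by other users are invisible to the
     * NM user, so this returns false for a live container (the #1 con).
     */
    boolean isAlive(long pid) {
        if (pid <= 0) {
            return false; // 0 and negative values are not real process ids
        }
        return Files.isDirectory(procRoot.resolve(Long.toString(pid)));
    }

    public static void main(String[] args) throws IOException {
        // Fake /proc containing a single "process" with pid 4242.
        Path fakeProc = Files.createTempDirectory("fake-proc");
        Files.createDirectory(fakeProc.resolve("4242"));

        ProcPidLivenessCheck check = new ProcPidLivenessCheck(fakeProc);
        System.out.println(check.isAlive(4242)); // true
        System.out.println(check.isAlive(4243)); // false
    }
}
```

Because the check is a single stat of a path, it needs no fork/exec and no round trip to the Docker daemon, which is the performance argument made for #1 in the thread.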
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484271#comment-16484271 ] Eric Yang commented on YARN-8259: - [~shaneku...@gmail.com] The proposal to implement both is okay, but we can make better software with sensible optimization by picking a solution that works for all scenarios without adding extra administration tasks. There is no objection to the current approach. We are aware that the hidepid corner case can generate additional system administration tasks to whitelist the node manager so it can access all pids. We also know that the docker inspect approach costs more resources because it forks and execs a process. Human labor to configure the OS with knowledge of Hadoop details is usually more expensive than adding processors or RAM. It would be great if the solution could work without an additional configuration flag or extra hardware resources. This means doing the pid check as a privileged user via container-executor may be the solution system administrators prefer, since it adds no overhead to their chores. Can the /proc pid check work in a Docker-in-Docker environment?
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483845#comment-16483845 ] Shane Kumpf commented on YARN-8259: --- {quote}System administrator can reserve one cpu core for node manager and all the docker inspect call are counted toward saturating one cpu core{quote} I'm less concerned about CPU usage and more about Docker's client/server model and the potential for hangs (which I've seen many of in the past under load). Personally, I want the /proc route for my systems, and I am not using hidepid. Losing a container due to an intermittent Docker issue isn't really acceptable to me when an alternative exists that avoids the issue. What I could do is implement both the /proc and {{docker inspect}} approaches, plus a configuration switch to choose the implementation, for those that use hidepid (or a system without /proc). Would that be acceptable? I'm also going to make this a blocker, as all privileged containers are leaked on NM restart today.
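The configuration switch proposed here could look roughly like the following: a hypothetical LivenessCheck interface keyed by container id, plus a selector that defaults to the /proc implementation (matching the stated preference) and uses docker inspect only when configured. The mode strings "proc" and "docker-inspect" and all names below are illustrative assumptions, not real YARN configuration keys or classes.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of a configuration switch between two liveness
 * check implementations. Mode strings and types are illustrative only.
 */
final class LivenessCheckSelector {

    /** A liveness check keyed by container id. */
    @FunctionalInterface
    interface LivenessCheck {
        boolean isAlive(String containerId);
    }

    enum Mode { PROC, DOCKER_INSPECT }

    /** Parses the configured value; unset or blank falls back to PROC. */
    static Mode parseMode(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return Mode.PROC;
        }
        switch (configured.trim().toLowerCase(Locale.ROOT)) {
            case "proc":
                return Mode.PROC;
            case "docker-inspect":
                return Mode.DOCKER_INSPECT;
            default:
                throw new IllegalArgumentException(
                    "Unknown liveness check mode: " + configured);
        }
    }

    /** Returns whichever implementation the configuration asks for. */
    static LivenessCheck select(String configured,
                                LivenessCheck procCheck,
                                LivenessCheck dockerInspectCheck) {
        return parseMode(configured) == Mode.PROC ? procCheck : dockerInspectCheck;
    }

    public static void main(String[] args) {
        // Stand-ins: a real implementation would read /proc or run docker inspect.
        LivenessCheck proc = containerId -> true;
        LivenessCheck inspect = containerId -> false;

        System.out.println(select(null, proc, inspect).isAlive("container_1"));             // true
        System.out.println(select("docker-inspect", proc, inspect).isAlive("container_1")); // false
    }
}
```

Failing fast on an unknown mode, rather than silently falling back, keeps a typo in the configuration from quietly selecting the check an operator did not intend.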
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483057#comment-16483057 ] Eric Yang commented on YARN-8259: - System administrator can reserve one cpu core for node manager and all the docker inspect call are counted toward saturating one cpu core, but not more. Exact accounting is not available today, but I usually recommend that customers do this to avoid system overload. At a glance at the YARN code base, I found only one instance of node manager code that reads /proc/[pid]/, located in CGroupsResourceCalculator.java. Hence, hidepid already does not work with the current implementation. This can be addressed properly in other JIRAs. I am +0 on this patch.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483041#comment-16483041 ] Eric Badger commented on YARN-8259: --- Also, I have tested the current patch for correctness. So, if we decide to go with the current implementation, I am +1 on the patch.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483038#comment-16483038 ] Eric Badger commented on YARN-8259: ---
bq. If hidepid option is used by system administrator, yarn user might not have rights to check if /proc/[pid] exists.
This might be a concern, but there is a workaround that allows the admin to whitelist the NM user: https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/
bq. Also, the reacquisition code runs signalContainer once per second until the application finishes, this resulted in many docker inspect and container-executor calls, which are expensive operations.
This worries me the most. Especially on nodes where lots of containers run concurrently, this could be pretty devastating for rolling upgrades. I'm not sure I have a strong opinion one way or another on retries vs. /proc for correctness, but I am worried about overloading the Docker daemon with a large number of inspect/ps calls.
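The workaround referenced above relies on the gid= mount option of procfs: members of that group can still see processes that hidepid would otherwise hide, so adding the NM user to it restores the /proc check. An operator script or a startup check could confirm the setup by reading the /proc line of /proc/mounts. A minimal sketch of such a parser; the class is hypothetical, looks only at the hidepid and gid options, and assumes the kernel may print hidepid either numerically (0/1/2) or by name on newer kernels.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper: parses one line of /proc/mounts and reports whether
 * the proc mount at /proc uses hidepid and which group (gid=) is exempt.
 */
final class HidepidMountInfo {
    final boolean hidepidEnabled;
    final Optional<String> exemptGid;

    private HidepidMountInfo(boolean hidepidEnabled, Optional<String> exemptGid) {
        this.hidepidEnabled = hidepidEnabled;
        this.exemptGid = exemptGid;
    }

    /** Returns empty unless the line describes the proc filesystem mounted at /proc. */
    static Optional<HidepidMountInfo> parse(String mountsLine) {
        // Fields: device, mount point, fs type, options, dump, pass.
        String[] fields = mountsLine.trim().split("\\s+");
        if (fields.length < 4 || !"/proc".equals(fields[1]) || !"proc".equals(fields[2])) {
            return Optional.empty();
        }
        Map<String, String> options = new HashMap<>();
        for (String option : fields[3].split(",")) {
            int eq = option.indexOf('=');
            if (eq < 0) {
                options.put(option, "");
            } else {
                options.put(option.substring(0, eq), option.substring(eq + 1));
            }
        }
        // hidepid is omitted when off; anything other than 0/off counts as enabled.
        String hidepid = options.getOrDefault("hidepid", "0");
        boolean enabled = !"0".equals(hidepid) && !"off".equals(hidepid);
        return Optional.of(new HidepidMountInfo(enabled, Optional.ofNullable(options.get("gid"))));
    }

    public static void main(String[] args) {
        HidepidMountInfo info = parse(
            "proc /proc proc rw,nosuid,nodev,noexec,relatime,hidepid=2,gid=1001 0 0").get();
        System.out.println(info.hidepidEnabled);  // true
        System.out.println(info.exemptGid.get()); // 1001
    }
}
```

A check built on this would still need to confirm that the NM user is actually a member of the reported group, which this sketch does not do.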
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483032#comment-16483032 ] Jason Lowe commented on YARN-8259: - I agree with Shane that some subsystems already rely on /proc to function properly, e.g. container resource monitoring. Hiding pids will break those subsystems.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483029#comment-16483029 ] Jason Lowe commented on YARN-8259: - Ah, comment race with [~eyang]. I'll defer until his concerns are addressed.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483028#comment-16483028 ] Jason Lowe commented on YARN-8259: - Thanks for the patch! +1 lgtm. I'll commit this tomorrow if there are no objections.
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482992#comment-16482992 ] Eric Yang commented on YARN-8259: - If I am not mistaken, DockerContainerRuntime runs as part of the node manager. If the system administrator uses the hidepid option, the yarn user might not have the rights to check whether /proc/[pid] exists. We might need to create an LCE operation to perform the check if we go with the suggested pid file check path. I still prefer the docker inspect command path with retry logic. In a non-blocking IO system, it is hard to avoid writing retry logic. The investment will pay off in the long run, when each retry value is defined and tuned to make the system reliable and robust.
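Eric's comment weighs two options: checking /proc/[pid] for the pid from the pid file, which hidepid can defeat, and running docker inspect with retry logic. The Java sketch below shows both, assuming the checks run directly as the NodeManager user rather than through a container-executor (LCE) operation. The class `DockerLiveness`, its method names, and the 3 attempts / 500 ms delay are hypothetical and are not taken from the YARN-8259 patch; `docker inspect --format '{{.State.Running}}'` is the only external interface used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.function.Supplier;

public class DockerLiveness {

    /**
     * pid-file style check: is /proc/<pid> visible to this user?
     * With /proc mounted hidepid=2, other users' /proc/<pid> directories are
     * hidden, so "false" does not prove that the process has exited.
     */
    static boolean procEntryVisible(long pid) {
        return Files.exists(Paths.get("/proc", Long.toString(pid)));
    }

    /**
     * Single probe: docker inspect --format {{.State.Running}} <id>.
     * Returns empty when the answer is unknown (docker missing, non-zero
     * exit, unexpected output) so that the caller can retry.
     */
    static Optional<Boolean> inspectRunning(String containerId) {
        try {
            Process p = new ProcessBuilder("docker", "inspect",
                    "--format", "{{.State.Running}}", containerId)
                    .redirectErrorStream(true)
                    .start();
            String out = new String(p.getInputStream().readAllBytes(),
                    StandardCharsets.UTF_8).trim();
            if (p.waitFor() != 0) {
                return Optional.empty();
            }
            if (out.equals("true")) {
                return Optional.of(true);
            }
            if (out.equals("false")) {
                return Optional.of(false);
            }
            return Optional.empty();
        } catch (IOException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    /** Calls probe up to maxAttempts times, sleeping delayMs between tries. */
    static Optional<Boolean> withRetries(Supplier<Optional<Boolean>> probe,
                                         int maxAttempts, long delayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<Boolean> result = probe.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return Optional.empty(); // still unknown after all attempts
    }

    public static void main(String[] args) {
        String id = args.length > 0 ? args[0] : "hypothetical_container_id";
        Optional<Boolean> running = withRetries(() -> inspectRunning(id), 3, 500);
        System.out.println(running.map(r -> r ? "running" : "not running")
                .orElse("unknown after retries"));
    }
}
```

Keeping "unknown" separate from "not running" is what makes the retry loop meaningful: only a definite answer stops it early. One limitation of this sketch is that a container that has already been removed also makes docker inspect exit non-zero, so it is reported as unknown; a real check would need to tell that case apart.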
[jira] [Commented] (YARN-8259) Revisit liveliness checks for Docker containers
[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482729#comment-16482729 ] genericqa commented on YARN-8259: -

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 35s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 8s | trunk passed |
| +1 | compile | 0m 56s | trunk passed |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 0m 38s | trunk passed |
| +1 | shadedclient | 11m 10s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 52s | trunk passed |
| +1 | javadoc | 0m 24s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 34s | the patch passed |
| +1 | compile | 0m 53s | the patch passed |
| +1 | javac | 0m 53s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 27s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 1s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 19m 32s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 76m 18s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8259 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924356/YARN-8259.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d3ca0d4182cb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f48fec8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20808/testReport/ |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20808/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.