[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347158#comment-16347158 ] Eric Badger commented on YARN-7677: --- +1 (non-binding). Looks good to me. Thanks for fixing this [~Jim_Brennan] > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345645#comment-16345645 ] Jason Lowe commented on YARN-7677: -- Thanks for the patch! +1 lgtm. I will commit this tomorrow if there are no objections. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch, YARN-7677.002.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345537#comment-16345537 ] genericqa commented on YARN-7677: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 36s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 28s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 60m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7677 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908365/YARN-7677.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 20d6d9ece274 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 901d15a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19527/testReport/ | | Max. process+thread count | 441 (vs. ulimit of 5000) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19527/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > HADOOP_CONF_DIR should not be automatically put
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345301#comment-16345301 ] Jim Brennan commented on YARN-7677: --- I believe the unit test failure (testContainerUpgradeRollbackDueToFailure) is unrelated to this change. I will fix the style issues and submit a new patch. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345289#comment-16345289 ] genericqa commented on YARN-7677: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 157 unchanged - 0 fixed = 161 total (was 157) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 21s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 59m 53s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | YARN-7677 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908215/YARN-7677.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 31bf9018ad69 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 901d15a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19526/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/19526/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344137#comment-16344137 ] Jim Brennan commented on YARN-7677: --- Based on the discussion here and in YARN-7226, and after discussing with [~jlowe] and [~ebadger], I have put up a patch for this. The change is as follows: # Remove the line in ContainerLaunch.java that explicitly adds HADOOP_CONF_DIR to the environment (as noted above). # Instead of ignoring the whitelist in the case of docker, always add the whitelist environment variables that are not already defined in the container's context using the {{var:-default}} variable expansion syntax. In the docker case where a whitelist environment variable is defined in the image, this will prevent the launch script from overwriting it with the one from the Nodemanager's environment. The non-docker case behaves the same as before, except that the whitelisted environment variables that are not defined by the container context are set using the {{var:-default}} syntax, but in this case the default value is always used. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-7677.001.patch > > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
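The {{var:-default}} behavior described in the patch comment above can be illustrated with a short shell sketch. It is not taken from the patch: the two paths are illustrative values, and the export line only approximates what a generated launch script might contain.

```shell
# Docker case: the image already defines the variable (illustrative path).
HADOOP_CONF_DIR="/home/conf"
# ${VAR:-default} keeps the existing value when VAR is set and non-empty.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/tmp/conf}"
docker_case="$HADOOP_CONF_DIR"

# Non-docker case: nothing has defined the variable before the script runs,
# so the default (the NodeManager's value, assumed here) is used.
unset HADOOP_CONF_DIR
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/tmp/conf}"
non_docker_case="$HADOOP_CONF_DIR"

echo "docker: $docker_case, non-docker: $non_docker_case"
```

The same export line serves both cases, which is why the non-docker behavior stays as it was: there the variable is never pre-set, so the default always wins.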
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316623#comment-16316623 ] Eric Badger commented on YARN-7677: --- bq. This change will make yarnfile content more consistent that environment variable and mounting directories both needs to present in yarnfile to show HADOOP_CONF_DIR is exposed to docker container. Yes, that is correct. I'll go ahead and put up a patch in a little bit once I get a free moment. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310385#comment-16310385 ] Eric Yang commented on YARN-7677: - [~ebadger] I think I understand the goal now. Thank you. In the current implementation, a partial declaration of a bind-mount may allow HADOOP_CONF_DIR to be sourced implicitly without proper user consent. This change will make yarnfile content more consistent, in that both the environment variable and the mounted directories need to be present in the yarnfile to show that {{HADOOP_CONF_DIR}} is exposed to the docker container. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310327#comment-16310327 ] Eric Badger commented on YARN-7677: --- bq. Some Hadoop features will not work, i.e. short circuits read, if host and docker containers are not matching. This is true. I would like to work towards a solution where we use something similar to {{dfs.domain.socket.path}}, since it already defines the short-circuit socket. However, I'm not sure how to do that without copying the config, since this is a dfs property that will be used by the datanode (i.e. not the container-executor). bq. If we handle security properly with white list mount (YARN-5534), container-executor validation (YARN-7590), and check sudo privileges before launching privileged container (YARN-7221). Any particular reason that we shouldn't allow read-only bind-mount HADOOP_CONF_DIR? Nope, I don't think there is any problem with bind-mounting {{HADOOP_CONF_DIR}}. However, I don't think it should be a requirement. For example, you should be able to use an older version of hadoop as the client (task), while the server (NM) uses a newer version. If we pass in {{HADOOP_CONF_DIR}} then this is not possible. If we are constantly bind-mounting hadoop into all of the containers, then we lose some of the wonder of docker, which is that the container stays constant and consistent over time. Some may choose to bind-mount hadoop, but it should be a choice, not a requirement. bq. White list is used by container-executor, which resides in host, and not docker container. How is the by pass happens? This happens because of a call in {{ContainerLaunch.java}} that automatically adds {{HADOOP_CONF_DIR}} to the environment. This environment is parsed in {{launch_container.sh}}, which is the script that the docker container is started with. 
{noformat:title=ContainerLaunch.java} putEnvIfAbsent(environment, Environment.HADOOP_CONF_DIR.name()); {noformat} > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310288#comment-16310288 ] Eric Yang commented on YARN-7677: - I think I am stuck on understanding: {quote} It completely bypasses the whitelist and so there is no way for a task to not have HADOOP_CONF_DIR set. {quote} The whitelist is used by container-executor, which resides on the host, not in the docker container. How does the bypass happen? > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310273#comment-16310273 ] Eric Yang commented on YARN-7677: - [~ebadger] My initial reaction would be to make the docker container follow the host layout for a single-cluster setup. Some Hadoop features, e.g. short-circuit reads, will not work if the host and the docker containers do not match. For a docker container to regenerate finely tuned Hadoop configuration to match the host could take some effort from the docker container developer. There is a high probability that developers will end up using a bind-mount to transport Hadoop configuration. If we handle security properly with whitelisted mounts (YARN-5534), container-executor validation (YARN-7590), and a sudo privilege check before launching privileged containers (YARN-7221), is there any particular reason that we shouldn't allow a read-only bind-mount of {{HADOOP_CONF_DIR}}? > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310247#comment-16310247 ] Eric Badger commented on YARN-7677: --- [~eyang], it doesn't necessarily need to be separate hadoop clusters. It could just be a node where the NM runs on the bare metal host and the tasks run in docker containers. In that case, they would need to know where {{HADOOP_CONF_DIR}} is. Since the docker image is completely separate from the host layout, we can't assume that hadoop is going to be put in the same place. {{HADOOP_CONF_DIR}} isn't getting bind-mounted into the container, so the only way this would even work is by a happy coincidence and/or planning the layout of the image to match that of the host. But that coupling is certainly not necessary and the docker image is the one that actually knows where {{HADOOP_CONF_DIR}} is located. The nodemanager knows where its {{HADOOP_CONF_DIR}} is located, but that is on the host, not in the docker container. And again, since {{HADOOP_CONF_DIR}} is in the default env whitelist, the behavior here will only change if you explicitly change the env whitelist and remove it. So I believe the impact here to be fairly low. Regardless, I don't think it's correct for the NM to be defining the layout of the docker image (i.e. where {{HADOOP_CONF_DIR}} has to be located). > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. 
This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309995#comment-16309995 ] Eric Yang commented on YARN-7677: - [~ebadger] I think I understand your scenario better now. When the host and docker environments run two separate Hadoop clusters, we do not want the host Hadoop configuration to be exposed to docker because disk settings and file system layout do not apply. In other scenarios, such as when HDFS is outside of the docker container and a Spark Python application runs in a docker container to access host-level HDFS, the Hadoop configuration should be inherited from the host to make sure well-optimized timing settings are exposed. For huge clusters, the first scenario may be used to isolate virtual clusters. Smaller clusters are most likely to run mixed workloads and use docker to isolate programming libraries. The host-level node manager whitelist cannot be overwritten by the container. I think both cases can be supported, and the default should probably be to inherit {{HADOOP_CONF_DIR}}, so that smaller clusters use system resources efficiently. It would be better if we built a switch for env_reset as a job submission flag to disable inheritance of Hadoop system environment variables. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. 
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309906#comment-16309906 ] Eric Badger commented on YARN-7677: --- [~eyang], the docker container file system layout doesn't have to be the same as the host. It's entirely possible for an image to have {{HADOOP_CONF_DIR}} set to something like {{/home/conf}}, while the host has it in {{/tmp/conf}}. In the current implementation, there is no way for {{HADOOP_CONF_DIR}} to be set correctly in this case without the user explicitly setting it with their job. Even if you set up {{HADOOP_CONF_DIR}} in the image as an environment variable, it will be overridden by the container startup script, since {{HADOOP_CONF_DIR}} is being passed in by default regardless. In the case where a user doesn't set {{HADOOP_CONF_DIR}} explicitly, it makes sense that the docker image will know where to set it (via ENV), while the NM will not, since their layouts are not necessarily the same. > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
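The override described in the comment above can be reproduced with a few lines of shell. This is only a sketch: the two paths are the illustrative ones from the comment, and the unconditional export stands in for whatever line the generated container startup script actually contains.

```shell
# Value baked into the docker image (e.g. via ENV), as in the example above.
HADOOP_CONF_DIR="/home/conf"
export HADOOP_CONF_DIR
value_from_image="$HADOOP_CONF_DIR"

# Stand-in for the startup script line that passes in the NM's value
# regardless of what the image set; the real generated line may differ.
export HADOOP_CONF_DIR="/tmp/conf"
value_seen_by_task="$HADOOP_CONF_DIR"

echo "image: $value_from_image, task: $value_seen_by_task"
```

The task ends up with the host's path even though that directory need not exist inside the image, which is the mismatch the comment is pointing at.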
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308357#comment-16308357 ] Eric Yang commented on YARN-7677: - [~ebadger] Happy new year. I think it will be safer for {{HADOOP_CONF_DIR}} to be passed from the host to the docker image by default. This is better for preventing mistakes than allowing system-specific settings to be overridden at the container level. This will also ensure that when an application requires system settings, docker doesn't need to reconstruct the environment, but can simply mount {{HADOOP_CONF_DIR}} as the source of truth. If a docker container wants to generate its own environment, there shouldn't be anything getting in the way of the docker application accomplishing that. I don't understand how this is paramount for the docker case. Could you elaborate? Thanks > HADOOP_CONF_DIR should not be automatically put in task environment > --- > > Key: YARN-7677 > URL: https://issues.apache.org/jira/browse/YARN-7677 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Eric Badger >Assignee: Eric Badger > > Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether > it's set by the user or not. It completely bypasses the whitelist and so > there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes > problems in the Docker use case where Docker containers will set up their own > environment and have their own {{HADOOP_CONF_DIR}} preset in the image > itself. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298953#comment-16298953 ] Eric Badger commented on YARN-7677: --- My proposal would be to remove {{HADOOP_CONF_DIR}}, as well as potentially {{USER}}, {{LOGNAME}}, {{HOME}}, and {{PWD}}, from ContainerLaunch.java and require them to be in the environment whitelist if they are to be taken from the NodeManager environment. Arguably, all of these should be removed, but the strongest case can be made for {{HADOOP_CONF_DIR}}, since it is already in the default environment whitelist. So the only way this would break a use case is if someone were using their own whitelist and didn't include {{HADOOP_CONF_DIR}}. While this change would be incompatible, I think it makes sense for the non-Docker case and is paramount for the Docker case.
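The proposal above amounts to gating every inherited variable on the whitelist. A minimal sketch of that rule follows, with loud caveats: `inherited_env` is a hypothetical helper, not the ContainerLaunch.java logic, and the whitelist contents and paths are illustrative values, not the real defaults.

```python
def inherited_env(nm_env, whitelist):
    """Hypothetical: keep only the NodeManager variables the whitelist names."""
    return {name: nm_env[name] for name in whitelist if name in nm_env}


# Illustrative NodeManager environment (values are made up for the example).
nm_env = {
    "JAVA_HOME": "/usr/lib/jvm/java",
    "HADOOP_CONF_DIR": "/tmp/conf",
    "UNRELATED": "not propagated",
}

# A whitelist that names HADOOP_CONF_DIR still propagates it.
with_conf = inherited_env(nm_env, ["JAVA_HOME", "HADOOP_CONF_DIR"])

# A custom whitelist that omits it no longer gets it implicitly;
# this is the incompatible case mentioned in the comment.
custom = inherited_env(nm_env, ["JAVA_HOME"])
```

Under this rule a site that relies on the variable only has to list it, and a Docker image that ships its own value can be left alone by omitting it.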
[jira] [Commented] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment
[ https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298946#comment-16298946 ] Eric Badger commented on YARN-7677: --- Linking YARN-3611 since this is related to Docker development. Not putting it as a subtask, however, because this JIRA has impacts outside of Docker.