[ https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967937#comment-16967937 ]
Shane Kumpf commented on YARN-9562:
-----------------------------------

Hey [~ebadger]. Thanks for your (and everyone else's) hard work here. Overall this looks to be coming together nicely. I've taken a look at the code and have a couple of items, but nothing blocking.

However, I'm having a bit of trouble getting runC containers working so far. I'm out of time to continue troubleshooting right now, but this is what I'm seeing; both dshell and MR pi fail the same way. Docker MR jobs are working. I am running all containers as the nobody user in this case.

{code:java}
2019-11-05 22:40:14,225 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 35. Privileged Execution Operation Stderr:
Bad/Missing runC int
Could not create container dirs
Could not create local files and directories
Nonzero exit code=35, error message='Could not create work dirs'

Stdout: Can't create directory /tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1572993484434_0003/container_e04_1572993484434_0003_01_000002 - Permission denied

Full command array for failed execution:
[/usr/local/hadoop/bin/container-executor, --run-runc-container, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1572993484434_0003/container_e04_1572993484434_0003_01_000002/runc-config.json]
{code}

Here are some questions/nits on the patch. None of these are blockers IMO.

Questions/Comments:

1) Why are the keystore and truststore needed within RuncContainerExecutorConfig?

2) I'm not a big fan of hard-coded mounts like this. This would also be problematic for systemd-based containers, where systemd expects /tmp to be a tmpfs.
{code:java}
addRuncMountLocation(mounts, containerWorkDir.toString() +
    "/private_slash_tmp", "/tmp", true, true);
addRuncMountLocation(mounts, containerWorkDir.toString() +
    "/private_var_slash_tmp", "/var/tmp", true, true);
{code}

3) It would be great to track these disabled features for future implementation.

{code:java}
public String getExposedPorts(Container container) {
  return null;
}

public String[] getIpAndHost(Container container) {
  return null;
}

public IOStreamPair execContainer(ContainerExecContext ctx)
    throws ContainerExecutionException {
  return null;
}

public void reapContainer(ContainerRuntimeContext ctx)
    throws ContainerExecutionException {
}

public void relaunchContainer(ContainerRuntimeContext ctx)
    throws ContainerExecutionException {
}
{code}

Nits:

1) Clean up the whitespace around Container#getContainerRuntimeData.

2) RuncContainerExecutorConfig: typo in the class javadoc.

3) YarnConfiguration DEFAULT_NM_RUNC_ALLOWED_CONTAINER_NETWORKS and DEFAULT_NM_RUNC_ALLOWED_CONTAINER_RUNTIMES: copy-and-paste error in the javadoc.

4) Many of the tests create tmpDirs but don't appear to clean them up. TestRuncContainerRuntime creates two temp dirs, one via mkdirs and the other via a Rule.

{code:java}
TestDockerContainerRuntime mkdirs for tmpDir
TestHdfsManifestToResourcesPlugin creates a tmpDir but doesn't clean it up
TestRuncContainerRuntime has both a tmpDir and TempDir created by a @Rule
{code}

5) Docs

* Overview: "if created", newline after runC in second paragraph.
* Docker to squash section: first paragraph "Getting" newline.
* I'm fine with leaving the reference to the patch to docker_to_squash.py for now until we have a better story, but I did need to do a few steps to get that tool working:
  1) Create the hdfs runc-root as root.
  2) Install skopeo, squashfs-tools, and attr.
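For nit 4, here is a minimal sketch of the kind of cleanup I have in mind, using only java.nio. This is not taken from the patch: the TempDirCleanup class and its method names are hypothetical, and the tests may well prefer an existing helper or the JUnit rule instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical test helper; not part of the YARN-9562 patch. */
final class TempDirCleanup {

  private TempDirCleanup() {
  }

  /** Creates a scratch directory under the default temp location. */
  static Path createScratchDir(String prefix) {
    try {
      return Files.createTempDirectory(prefix);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Deletes dir and everything below it; a missing dir is a no-op. */
  static void deleteRecursively(Path dir) {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order visits children before their parent directory,
      // so each directory is empty by the time it is deleted.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In a JUnit 4 test, deleteRecursively would typically be called from an @After method for any directory the test created via mkdirs. Where a test already uses a TemporaryFolder @Rule, creating everything under that rule avoids the manual cleanup entirely.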
> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
>                 Key: YARN-9562
>                 URL: https://issues.apache.org/jira/browse/YARN-9562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9562.001.patch, YARN-9562.002.patch,
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch,
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch,
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch,
> YARN-9562.012.patch, YARN-9562.013.patch
>
>
> This JIRA will be used to add the Java changes for the new
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the
> existing DockerLinuxContainerRuntime code once it is moved up into an
> abstract class that can be extended.

--
This message was sent by Atlassian Jira (v8.3.4#803005)