[ 
https://issues.apache.org/jira/browse/YARN-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967937#comment-16967937
 ] 

Shane Kumpf commented on YARN-9562:
-----------------------------------

Hey [~ebadger]. Thanks for your (and everyone else's) hard work here. Overall 
this looks to be coming together nicely.

I've taken a look at the code and have a couple of items, but nothing blocking. 
However, I haven't been able to get runC containers working so far. I'm out of 
time to continue troubleshooting right now, but this is what I'm seeing. Both 
dshell and MR pi fail the same way, while Docker MR jobs are working. I am 
running all containers as the nobody user in this case.
{code:java}
2019-11-05 22:40:14,225 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 35. Privileged Execution Operation Stderr:
Bad/Missing runC int
Could not create container dirs
Could not create local files and directories
Nonzero exit code=35, error message='Could not create work dirs'
Stdout: Can't create directory /tmp/hadoop-yarn/nm-local-dir/usercache/hadoopuser/appcache/application_1572993484434_0003/container_e04_1572993484434_0003_01_000002 - Permission denied
Full command array for failed execution:
[/usr/local/hadoop/bin/container-executor, --run-runc-container, /tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1572993484434_0003/container_e04_1572993484434_0003_01_000002/runc-config.json]{code}
 

Here are some questions/nits on the patch. None of these are blockers IMO.

Questions/Comments:

1) Why are the keystore and truststore needed within RuncContainerExecutorConfig?

2) I'm not a big fan of hard-coded mounts like this. This would also be 
problematic for systemd-based containers, where systemd expects /tmp to be a 
tmpfs.
{code:java}
    addRuncMountLocation(mounts, containerWorkDir.toString() +
        "/private_slash_tmp", "/tmp", true, true);
    addRuncMountLocation(mounts, containerWorkDir.toString() +
        "/private_var_slash_tmp", "/var/tmp", true, true);
{code}
3) It would be great to track these disabled features for future implementation.
{code:java}
  public String getExposedPorts(Container container) {
    return null;
  }

  public String[] getIpAndHost(Container container) {
    return null;
  }

  public IOStreamPair execContainer(ContainerExecContext ctx)
      throws ContainerExecutionException {
    return null;
  }

  public void reapContainer(ContainerRuntimeContext ctx)
      throws ContainerExecutionException {
  }

  public void relaunchContainer(ContainerRuntimeContext ctx)
      throws ContainerExecutionException {
  }
{code}
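
On question 2, one possible direction is to gate the private /tmp and /var/tmp bind mounts behind a setting, so that systemd-based images can keep /tmp as a tmpfs. This is only a rough sketch: the `Mount` class, the `buildTmpMounts` helper, and the `bindTmpDirs` flag are all hypothetical and are not part of the patch or of any existing YARN configuration. The boolean arguments of the real `addRuncMountLocation` call are not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these are not the patch's actual classes. */
class TmpMountSketch {

  /** Simplified stand-in for a runC mount entry (assumed shape). */
  static final class Mount {
    final String source;
    final String destination;

    Mount(String source, String destination) {
      this.source = source;
      this.destination = destination;
    }
  }

  /**
   * Builds the private /tmp and /var/tmp bind mounts only when enabled.
   * bindTmpDirs is an assumed setting, not an existing YARN property.
   */
  static List<Mount> buildTmpMounts(String containerWorkDir,
      boolean bindTmpDirs) {
    List<Mount> mounts = new ArrayList<>();
    if (bindTmpDirs) {
      mounts.add(new Mount(containerWorkDir + "/private_slash_tmp", "/tmp"));
      mounts.add(
          new Mount(containerWorkDir + "/private_var_slash_tmp", "/var/tmp"));
    }
    return mounts;
  }
}
```

With the flag off, no tmp mounts are added and the image's own /tmp handling is left alone. Where such a flag would live (NM config, per-container env, or elsewhere) is an open question.
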
Nits:

1) clean up the whitespace around Container#getContainerRuntimeData

2) RuncContainerExecutorConfig typo in class javadoc

3) YarnConfiguration DEFAULT_NM_RUNC_ALLOWED_CONTAINER_NETWORKS and 
DEFAULT_NM_RUNC_ALLOWED_CONTAINER_RUNTIMES - copy and paste error on the javadoc

4) Many of the tests create tmpDirs but don't appear to clean them up. 
TestRuncContainerRuntime creates two temp dirs, one via mkdirs and the other 
via a Rule.
{code:java}
TestDockerContainerRuntime mkdirs for tmpDir
TestHdfsManifestToResourcesPlugin creates a tmpDir but doesn't clean it up
TestRuncContainerRuntime has both a tmpDir and TempDir created by a @Rule
{code}
5) Docs
 * Overview: "if created", newline after runC in second paragraph.
 * Docker to squash section: first paragraph "Getting" newline.
 * I'm fine with leaving the reference to the docker_to_squash.py patch for now 
until we have a better story, but I did need to do a few steps to get that tool 
working: 1) create the hdfs runc-root as root, and 2) install skopeo, 
squashfs-tools, and attr.
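
For nit 4, the cleanup could look something like the following. This is a minimal sketch using only the JDK; the class name, the method names, and the `runc-test-` prefix are made up for illustration and do not come from the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper; not part of the patch. */
class TempDirCleanupSketch {

  /** Deletes dir and everything beneath it, children before parents. */
  static void deleteRecursively(Path dir) {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse order sorts deeper paths first, so directories are
      // already empty by the time they are deleted.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Creates a temp dir, uses it, and always removes it afterwards. */
  static Path runWithTempDir() {
    try {
      Path tmpDir = Files.createTempDirectory("runc-test-");
      try {
        // Stand-in for the body of a test that writes into tmpDir.
        Files.createFile(tmpDir.resolve("runc-config.json"));
      } finally {
        deleteRecursively(tmpDir);
      }
      return tmpDir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In the actual tests the delete call would more likely sit in an @After method, or the tests could rely solely on JUnit's TemporaryFolder rule, which removes the directory when the test finishes.
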

> Add Java changes for the new RuncContainerRuntime
> -------------------------------------------------
>
>                 Key: YARN-9562
>                 URL: https://issues.apache.org/jira/browse/YARN-9562
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>         Attachments: YARN-9562.001.patch, YARN-9562.002.patch, 
> YARN-9562.003.patch, YARN-9562.004.patch, YARN-9562.005.patch, 
> YARN-9562.006.patch, YARN-9562.007.patch, YARN-9562.008.patch, 
> YARN-9562.009.patch, YARN-9562.010.patch, YARN-9562.011.patch, 
> YARN-9562.012.patch, YARN-9562.013.patch
>
>
> This JIRA will be used to add the Java changes for the new 
> RuncContainerRuntime. This will work off of YARN-9560 to use much of the 
> existing DockerLinuxContainerRuntime code once it is moved up into an 
> abstract class that can be extended. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
