[ https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890503#comment-16890503 ]
Jim Brennan commented on YARN-9647:
-----------------------------------

[~ebadger], [~eyang], [~magnum] I think I'm following the discussion and I agree with the problem analysis.

{quote}It's slightly more nuanced than this. If the lists don't match the container still could've failed because of an invalid mount. Basically if we get an invalid mount error then we need to figure out whether that invalid mount was in the original allowed-mounts lists in container-executor.cfg. If it was, then the error message should indicate a bad disk. Otherwise, the usual invalid mount error message should be fine.
{quote}

Do we need to maintain two lists? check_mount_permitted() is already returning -1 in the case where the normalize_mount fails for the mount_src before even checking if it is permitted. If the disk is bad, I think this is where it will fail. I don't think we'll get to the point of checking whether it is permitted?

Maybe we just need to change this error message:
{noformat}
fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", values[i], mount_src);
{noformat}
to
{noformat}
fprintf(ERRORFILE, "Invalid source path '%s' for docker mount '%s', maybe bad disk?\n", mount_src, values[i]);
{noformat}

Even better, pull the normalizing of mount_src out of check_mount_permitted and do it separately.
{noformat}
char *normalized_path = normalize_mount(mount_src, 0);
if (normalized_path == NULL) {
  fprintf(ERRORFILE, "Invalid source path '%s' for docker mount '%s', maybe bad disk?\n", mount_src, values[i]);
  ret = INVALID_DOCKER_MOUNT;
  goto free_and_exit;
}
permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, normalized_path);
permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, normalized_path);
{noformat}

For paths coming from NM (local dirs / log dirs) it should have already checked to ensure bad ones aren't in the list.

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -------------------------------------------------------------
>
>                 Key: YARN-9647
>                 URL: https://issues.apache.org/jira/browse/YARN-9647
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: KWON BYUNGCHANG
>            Priority: Major
>         Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> My /etc/hadoop/conf/container-executor.cfg:
> {code}
> [docker]
> docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> If /data2 is unhealthy, docker launch fails even though the container could still use /data1 as its local-dir and log-dir.
> The error message is below:
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch.
> Container id: container_e50_1561100493387_5185_01_000597
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: Could not determine real path of mount '/data2/hadoop/yarn/local'
> Could not determine real path of mount '/data2/hadoop/yarn/local'
> Unable to find permitted docker mounts on disk
> Error constructing docker command, docker error code=16, error message='Mount access error'
> Shell output: main : command provided 4
> main : run as user is magnum
> main : requested yarn user is magnum
> Creating script paths...
> Creating local dirs...
> [2019-06-25 14:55:26.189]Container exited with a non-zero exit code 29.
> [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 29.
> {code}
> The root cause is that normalize_mounts() in docker-util.c returns -1 because it cannot resolve the real path of /data2/hadoop/yarn/local (note that /data2 has a disk fault at this point).
> However, the disks behind the NM local dirs and NM log dirs can fail at any time.
> Docker launch should succeed as long as there are available local dirs and log dirs.
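
To make the distinction discussed in this thread concrete, here is a minimal standalone sketch of resolving the source path first and only then checking it against the allowed list. It is not the docker-util.c code: it calls POSIX realpath() directly instead of normalize_mount(), and classify_mount(), is_under(), and the mount_check enum are hypothetical names invented for illustration. It also assumes that an allowed-list entry which cannot be resolved (for example, one on a failed disk) can simply be skipped rather than failing the whole launch.

```c
#define _XOPEN_SOURCE 700  /* for realpath(path, NULL) */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes; these are not the codes used in docker-util.c. */
enum mount_check {
  MOUNT_PERMITTED = 0,
  MOUNT_NOT_PERMITTED = 1,
  MOUNT_BAD_SOURCE = 2   /* source path could not be resolved, maybe bad disk */
};

/* Returns 1 if path equals prefix or lies beneath it, 0 otherwise. */
static int is_under(const char *path, const char *prefix) {
  size_t n = strlen(prefix);
  if (strncmp(path, prefix, n) != 0) {
    return 0;
  }
  return path[n] == '\0' || path[n] == '/' || (n > 0 && prefix[n - 1] == '/');
}

/*
 * Resolve the mount source first, then compare it with each entry of the
 * NULL-terminated allowed list. A source that cannot be resolved is reported
 * separately from a source that resolves but is not permitted.
 */
static enum mount_check classify_mount(const char *src,
                                       const char *const *allowed) {
  assert(src != NULL && allowed != NULL);

  char *resolved = realpath(src, NULL);
  if (resolved == NULL) {
    fprintf(stderr, "Invalid source path '%s', maybe bad disk?\n", src);
    return MOUNT_BAD_SOURCE;
  }

  enum mount_check result = MOUNT_NOT_PERMITTED;
  for (size_t i = 0; allowed[i] != NULL; i++) {
    char *allowed_resolved = realpath(allowed[i], NULL);
    if (allowed_resolved == NULL) {
      /* Assumption: an unresolvable allowed entry is skipped, not fatal. */
      continue;
    }
    if (is_under(resolved, allowed_resolved)) {
      result = MOUNT_PERMITTED;
    }
    free(allowed_resolved);
    if (result == MOUNT_PERMITTED) {
      break;
    }
  }

  free(resolved);
  return result;
}
```

A caller would pass the requested mount source together with a NULL-terminated array of allowed paths, then pick the error message based on which of the three results comes back. Whether skipping unresolvable allowed entries is acceptable from a security standpoint is a separate question for the actual patch.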