[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890443#comment-16890443
 ] 

Eric Badger commented on YARN-9647:
-----------------------------------

bq. We can resolve this error by keeping track of the original 
container-executor.cfg, and normalized list. When two lists are not matching, 
container-executor can provide a different error message that container failed 
to launch due to unhealthy disk rather than continuing.

It's slightly more nuanced than this. If the lists don't match the container 
still could've failed because of an invalid mount. Basically if we get an 
invalid mount error then we need to figure out whether that invalid mount was 
in the original allowed-mounts lists in container-executor.cfg. If it was, then 
the error message should indicate a bad disk. Otherwise, the usual invalid 
mount error message should be fine. 

But as long as the logic isn't too complicated, I'm ok with this

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -------------------------------------------------------------
>
>                 Key: YARN-9647
>                 URL: https://issues.apache.org/jira/browse/YARN-9647
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: KWON BYUNGCHANG
>            Priority: Major
>         Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>    docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>    docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to