[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-23 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891193#comment-16891193
 ] 

Eric Yang commented on YARN-9647:
-

[~Jim_Brennan] I agree this approach is better optimization for performance and 
code readability.  [~magnum] Can you try Jim's approach?  Thanks

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-22 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890503#comment-16890503
 ] 

Jim Brennan commented on YARN-9647:
---

[~ebadger], [~eyang], [~magnum] I think I'm following the discussion and I 
agree with the problem analysis.
{quote}It's slightly more nuanced than this. If the lists don't match the 
container still could've failed because of an invalid mount. Basically if we 
get an invalid mount error then we need to figure out whether that invalid 
mount was in the original allowed-mounts lists in container-executor.cfg. If it 
was, then the error message should indicate a bad disk. Otherwise, the usual 
invalid mount error message should be fine.
{quote}
Do we need to maintain two lists? check_mount_permitted() is already returning 
-1 in the case where the normalize_mount fails for the mount_src before even 
checking if it is permitted. If the disk is bad, I think this is where it will 
fail. I don't think we'll get to the point of checking whether it is permitted? 
Maybe we just need to change this error message:
{noformat}
fprintf(ERRORFILE, "Invalid docker mount '%s', realpath=%s\n", values[i], 
mount_src);
{noformat}
to
{noformat}
fprintf(ERRORFILE, "Invalid source path '%s' for docker mount '%s', maybe bad 
disk?\n", mount_src, values[i]);
{noformat}
Even better, pull the normalizing of mount_src out of check_mount_permitted and 
do it separately.
{noformat}
  char *normalized_path = normalize_mount(mount_src, 0);
  if (normalized_path == NULL) {
  fprintf(ERRORFILE, "Invalid source path '%s' for docker mount '%s', maybe 
bad disk?\n", mount_src, values[i]);
  ret = INVALID_DOCKER_MOUNT;
  goto free_and_exit;
  }
  permitted_rw = check_mount_permitted((const char **) permitted_rw_mounts, 
normalized_path);
  permitted_ro = check_mount_permitted((const char **) permitted_ro_mounts, 
normalized_path);

{noformat}
For paths coming from NM (local dirs / log dirs) it should have already checked 
to ensure bad ones aren't in the list.

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890443#comment-16890443
 ] 

Eric Badger commented on YARN-9647:
---

bq. We can resolve this error by keeping track of the original 
container-executor.cfg, and normalized list. When two lists are not matching, 
container-executor can provide a different error message that container failed 
to launch due to unhealthy disk rather than continuing.

It's slightly more nuanced than this. If the lists don't match the container 
still could've failed because of an invalid mount. Basically if we get an 
invalid mount error then we need to figure out whether that invalid mount was 
in the original allowed-mounts lists in container-executor.cfg. If it was, then 
the error message should indicate a bad disk. Otherwise, the usual invalid 
mount error message should be fine. 

But as long as the logic isn't too complicated, I'm ok with this

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890435#comment-16890435
 ] 

Eric Yang commented on YARN-9647:
-

[~ebadger], I think the approach taken is ok.  We want to filter out bad disk 
from allowed mount to guard against user defined mount point or system 
suggested mount point.  The difficult part is to identify if the mount path is 
user specified or system suggested.  In .cmd file, both user specified and 
system suggested paths are listed together.  There is no easy way to rotate to 
a different disk, unless node manager relaunch the container with another set 
of workdir paths.

[~magnum] , I think [~ebadger] is also right that this patch may have 
misleading error message when bad disk happens.  We can resolve this error by 
keeping track of the original container-executor.cfg, and normalized list.  
When two lists are not matching, container-executor can provide a different 
error message that container failed to launch due to unhealthy disk rather than 
continuing.

Would this work?

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890394#comment-16890394
 ] 

Eric Badger commented on YARN-9647:
---

[~eyang], [~Jim_Brennan], [~billie.rinaldi], any ideas on how to fix this in a 
clean way? 

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890393#comment-16890393
 ] 

Eric Badger commented on YARN-9647:
---

[~magnum], thanks for the explanation. I understand what you mean since 
{{docker.allowed.[ro,rw]-mounts}} will always be parsed and if either are bad 
then you will fail all launches. However, some errors might get confusing with 
your proposed approach. For example, the user may set bind-mounts or there may 
be some defined mounts for all containers. Those could be hard-coded in confs 
(or by users' jobs) and then once the container is launched the container will 
get an invalid docker mount message even though the mount is in the allowed 
list. 

It would be nice to be able to not fail on bad disks in the allowed lists, but 
also have good logging when the container fails due to a bad disk. Simply 
ignoring the bad disks in the allowed list gives you a misleading error message 
if the container attempts to use those disks. 

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-16 Thread KWON BYUNGCHANG (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885995#comment-16885995
 ] 

KWON BYUNGCHANG commented on YARN-9647:
---

[~ebadger] Thanks for your comments.

The process of mounting volume in YARN is as follows.

step1.  validate mountable point (docker.allowed.ro-mounts, 
docker.allowed.rw-mounts) that is configured by yarn administrator  in 
/etc/hadoop/conf/container-executor.cfg
step2. validate mount point that is configured by user 
step3. validate mount point of step2 belong to mountable point of step1

if  /data2/  is unhealthy,  threre is not /data2/ in mount point configuration 
(step2) because nodemanager already know /data2 is unhealthy.
problem is  /data2 still exists in /etc/hadoop/conf/container-executor.cfg 
because container-exector.cfg is static configuation file.
and docker launch fails in step1 because container-executor cannot resolve real 
path of /data2. 
I simply modified step1 to ignore unresolving mountable path.

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-12 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883948#comment-16883948
 ] 

Eric Badger commented on YARN-9647:
---

Hi [~magnum], thanks for the patch.

Could you explain the failure you're seeing a little bit more? The only way I 
can see this specific scenario happening is if the disk goes bad between the 
time when the Java code grabs the good local/log dirs and when the 
container-executor actually goes to validate the mounts. That seems like a 
pretty small race, and if the disk has gone bad in that time, the container 
should fail. There could be distributed cache files on the disk that went bad. 
So if we just skip mounting that disk, then we could fail anyway from not 
having those distributed cache files. Additionally, the java code that will be 
launched inside of the container already has a list of local and log dirs that 
it can use. If we blindly ignore mounting those directories then the java code 
could try to write to them and fail because the mounts don't exist. Both of 
these failures would probably look weird and confusing and I think failing the 
container upright is a much better decision than hoping that the container 
somehow succeeds. 

> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch, YARN-9647.002.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>docker.allowed.rw-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
> {code}
> if /data2 is unhealthy, docker launch fails  although container can use 
> /data1 as local-dir, log-dir 
> error message is below
> {code}
> [2019-06-25 14:55:26.168]Exception from container-launch. Container id: 
> container_e50_1561100493387_5185_01_000597 Exit code: 29 Exception message: 
> Launch container failed Shell error output: Could not determine real path of 
> mount '/data2/hadoop/yarn/local' Could not determine real path of mount 
> '/data2/hadoop/yarn/local' Unable to find permitted docker mounts on disk 
> Error constructing docker command, docker error code=16, error message='Mount 
> access error' Shell output: main : command provided 4 main : run as user is 
> magnum main : requested yarn user is magnum Creating script paths... Creating 
> local dirs... [2019-06-25 14:55:26.189]Container exited with a non-zero exit 
> code 29. [2019-06-25 14:55:26.192]Container exited with a non-zero exit code 
> 29. 
> {code}
> root cause is that normalize_mounts() in docker-util.c return -1  because it 
> cannot resolve real path of /data2/hadoop/yarn/local.(note that /data2 is 
> disk fault  at this point)
> however disk of nm local dirs and nm log dirs can fail at any time.
> docker launch should succeed if there are available local dirs and log dirs.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-07-02 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876929#comment-16876929
 ] 

Hadoop QA commented on YARN-9647:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 39s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m  8s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9647 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973415/YARN-9647.002.patch |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 582f77bc5327 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e966edd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24340/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24340/testReport/ |
| Max. process+thread count | 417 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24340/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON 

[jira] [Commented] (YARN-9647) Docker launch fails when local-dirs or log-dirs is unhealthy.

2019-06-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872117#comment-16872117
 ] 

Hadoop QA commented on YARN-9647:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
30m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 45s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
22s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 17s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9647 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12972821/YARN-9647.001.patch |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 3e0151d7ef39 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 041e7a7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24319/testReport/ |
| Max. process+thread count | 469 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24319/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Docker launch fails when local-dirs or log-dirs is unhealthy.
> -
>
> Key: YARN-9647
> URL: https://issues.apache.org/jira/browse/YARN-9647
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9647.001.patch
>
>
> my /etc/hadoop/conf/container-executor.cfg
> {code}
> [docker]
>docker.allowed.ro-mounts=/data1/hadoop/yarn/local,/data2/hadoop/yarn/local
>