[
https://issues.apache.org/jira/browse/YARN-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051900#comment-18051900
]
ASF GitHub Bot commented on YARN-11919:
---------------------------------------
edwardcapriolo commented on PR #8177:
URL: https://github.com/apache/hadoop/pull/8177#issuecomment-3750743216
Here is my final summary of the issue. IMHO the code as it is in master
works, amazingly, only in limited contexts. Here is why:
```c
while ((s = getpwnam_r(user, &pwd, buf, bufsize, &result)) == ERANGE){
```
This is the proper way to call getpwnam_r. It may return ERANGE, which means
the buffer is not big enough and you need to retry with a larger one.
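For reference, a minimal sketch of that retry loop written out in full. The helper name `lookup_user`, its return convention, and the doubling growth policy are assumptions for illustration, not the code in the PR:

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the Hadoop source): look up `user`,
 * retrying with a larger buffer while getpwnam_r reports ERANGE.
 * Returns 0 on success, ENOENT if no entry was found, or another
 * errno value on failure. On success *buf_out owns the memory that
 * the string pointers in *pwd refer to; the caller frees it and must
 * not touch *pwd afterwards.
 */
int lookup_user(const char *user, struct passwd *pwd, char **buf_out) {
  long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
  size_t bufsize = (hint > 0) ? (size_t)hint : 1024; /* -1 means no hint */
  char *buf = malloc(bufsize);
  struct passwd *result = NULL;
  int s;

  *buf_out = NULL;
  if (buf == NULL) {
    return ENOMEM;
  }
  while ((s = getpwnam_r(user, pwd, buf, bufsize, &result)) == ERANGE) {
    /* Buffer too small: double it and try again. */
    char *bigger;
    bufsize *= 2;
    bigger = realloc(buf, bufsize);
    if (bigger == NULL) {
      free(buf);
      return ENOMEM;
    }
    buf = bigger;
  }
  if (s != 0 || result == NULL) {
    /* s == 0 with result == NULL means "no such user". */
    free(buf);
    return (s != 0) ? s : ENOENT;
  }
  *buf_out = buf;
  return 0;
}
```

The sketch hands the buffer back to the caller because the strings in `pwd` live inside it; freeing or reusing the buffer invalidates them.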
Next, the big problem: the passwd struct has pointers into buffers that can be
recycled by other calls to getpwnam_r. So the global object could be corrupted
by further calls.
```c
//struct to store the user details
-struct passwd *user_detail = NULL;
+struct serialized_passwd *user_detail = NULL;
```
This was effectively the root error I originally observed when I tried to
take this to Alpine. Its implementation of passwd is sufficiently different
that it exposed the problem above, even though the code amazingly works on my
systems.
Please review, @cnauroth and other people who are skilled with C/C++. Thanks.
> linux-container-executor segfault with get_user_info
> ----------------------------------------------------
>
> Key: YARN-11919
> URL: https://issues.apache.org/jira/browse/YARN-11919
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.4.2
> Reporter: Edward Capriolo
> Priority: Major
> Labels: pull-request-available
>
> The code in container executor for getting user information is slightly
> different from the recipe in the documentation. On Alpine it repeatedly
> segfaults:
> {code:java}
> mvn package -Pnative -Dmaven.skip.test=true -DskipTests -Dtar
> cp target/native/target/usr/local/bin/container-executor /usr/local/bin/
> chmod 6050 /usr/local/bin/container-executor
> chgrp hadoop /usr/local/bin/container-executor
> chmod 6050 /usr/local/bin/container-executor
> /usr/local/bin/container-executor nobody nobody 0
> application_1766935260716_0004 container_1766935260716_0004_02_000001
> /yarn-root/nm-local-dir/nmPriv
> {code}
> Result:
> {code:java}
> edgy
> main : command provided 0
> main : run as user is nobody
> main : requested yarn user is nobody
> main : validate_container_id
> main : huh
> validated command: INITIALIZE_CONTAINER
> init : set_user
> maybe free_user
> going to check user
> min id
> min id 1000
> Get user info
> passwd info
> got pwd
> Segmentation fault (core dumped)
> {code}
> The recipe here:
> [https://linux.die.net/man/3/getpwnam_r]
> has better success.