[
https://issues.apache.org/jira/browse/YARN-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051665#comment-18051665
]
ASF GitHub Bot commented on YARN-11919:
---------------------------------------
edwardcapriolo commented on PR #8177:
URL: https://github.com/apache/hadoop/pull/8177#issuecomment-3746776613
@cnauroth
Interestingly, what I believe I have found is that you must deep copy these
objects. What I observe is that repeated calls to the method change the objects.
Something about the buffer sizing of the old code wasn't hitting the edge case.
For some confirmation I asked Google: "multiple calls to getpwnam_r to create
array of users".
It provided code that was doing a deep copy of the objects, and this
explanation.
Key Concepts:
- Reentrancy: getpwnam_r() is thread-safe because the caller provides the
  memory (buffer) where the data is stored.
- Persistent Storage: To build an array, you cannot just store pointers to
  the temporary pwd struct used inside the loop. You must make a deep copy of the
  data (the UserEntry struct and the strings within it) into a new, stable memory
  location that persists across loop iterations.
- Error Handling: Check the return value (status) and the result pointer
  to differentiate between an error condition and a "user not found" scenario.
- Memory Management: Dynamic memory allocation (malloc, strdup) requires
  explicit deallocation using free to prevent memory leaks.
This is exactly the conclusion I had come to: we need to deep copy this
object, because the second call to the method alters the state of the first
struct.
> linux-container-executor segfault with get_user_info
> ----------------------------------------------------
>
> Key: YARN-11919
> URL: https://issues.apache.org/jira/browse/YARN-11919
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.4.2
> Reporter: Edward Capriolo
> Priority: Major
> Labels: pull-request-available
>
> The code in container executor for getting user information is slightly
> different than the recipe in the documentation. On Alpine it repeatedly
> segfaults.
> {code:java}
> mvn package -Pnative -Dmaven.skip.test=true -DskipTests -Dtar
> cp target/native/target/usr/local/bin/container-executor /usr/local/bin/
> chmod 6050 /usr/local/bin/container-executor
> chgrp hadoop /usr/local/bin/container-executor
> chmod 6050 /usr/local/bin/container-executor
> /usr/local/bin/container-executor nobody nobody 0
> application_1766935260716_0004 container_1766935260716_0004_02_000001
> /yarn-root/nm-local-dir/nmPriv
> {code}
> Result:
> {code:java}
> edgy
> main : command provided 0
> main : run as user is nobody
> main : requested yarn user is nobody
> main : validate_container_id
> main : huh
> validated command: INITIALIZE_CONTAINER
> init : set_user
> maybe free_user
> going to check user
> min id
> min id 1000
> Get user info
> passwd info
> got pwd
> Segmentation fault (core dumped)
> {code}
> The recipe here:
> [https://linux.die.net/man/3/getpwnam_r]
> has better success.