[ 
https://issues.apache.org/jira/browse/YARN-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051665#comment-18051665
 ] 

ASF GitHub Bot commented on YARN-11919:
---------------------------------------

edwardcapriolo commented on PR #8177:
URL: https://github.com/apache/hadoop/pull/8177#issuecomment-3746776613

   @cnauroth 
   
   Interestingly, what I believe I have found is that you must deep copy these 
objects. What I observe is that repeated calls to the method change the objects. 
Something about the buffer sizing of the old code kept it from hitting this edge case. 
   
   For some confirmation I asked Google: "multiple calls to getpwnam_r to create 
array of users".
   It provided code that did a deep copy of the objects, along with this 
explanation.
   
   Key Concepts
   
       Reentrancy: getpwnam_r() is thread-safe because the caller provides the 
memory (buffer) where the data is stored.
       Persistent Storage: To build an array, you cannot just store pointers to 
the temporary pwd struct used inside the loop. You must make a deep copy of the 
data (the UserEntry struct and the strings within it) into a new, stable memory 
location that persists across loop iterations.
       Error Handling: Check the return value (status) and the result pointer 
to differentiate between an error condition and a "user not found" scenario.
       Memory Management: Dynamic memory allocation (malloc, strdup) requires 
explicit deallocation using free to prevent memory leaks.
   
   
   This is exactly the conclusion I had come to: we need to deep copy this 
object, because the second call to the method alters the state of the first 
struct.
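
To make the deep-copy idea concrete, here is a minimal sketch. It is not the code from the PR. The names `struct user_entry`, `lookup_user_copy` and `user_entry_free` are hypothetical and only for illustration. It assumes we only need the name, uid, gid, home directory and shell, and it grows the buffer on ERANGE since sysconf may return -1 when there is no fixed limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical owned copy of the passwd fields we care about. */
struct user_entry {
    char *name;
    uid_t uid;
    gid_t gid;
    char *dir;
    char *shell;
};

/* Release an entry returned by lookup_user_copy(); NULL is allowed. */
void user_entry_free(struct user_entry *e) {
    if (e == NULL) return;
    free(e->name);
    free(e->dir);
    free(e->shell);
    free(e);
}

/*
 * On success returns 0 and sets *out to a heap-allocated deep copy.
 * If the user is not found, *out stays NULL and the return value is
 * whatever getpwnam_r() reported (0 or an errno value, per the man page).
 * On other failures returns an errno value and *out stays NULL.
 */
int lookup_user_copy(const char *name, struct user_entry **out) {
    assert(name != NULL && out != NULL);

    struct passwd pwd;
    struct passwd *result = NULL;
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    /* sysconf may return -1 when there is no fixed limit. */
    size_t buflen = hint > 0 ? (size_t)hint : 1024;
    char *buf = NULL;
    int rc;

    *out = NULL;
    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) { free(buf); return ENOMEM; }
        buf = tmp;
        rc = getpwnam_r(name, &pwd, buf, buflen, &result);
        if (rc != ERANGE) break;
        buflen *= 2; /* scratch buffer too small, retry with more room */
    }
    if (rc != 0 || result == NULL) { free(buf); return rc; }

    /* The strings in pwd point into buf, so copy them before freeing it. */
    struct user_entry *e = calloc(1, sizeof *e);
    if (e == NULL) { free(buf); return ENOMEM; }
    e->uid = result->pw_uid;
    e->gid = result->pw_gid;
    e->name = strdup(result->pw_name);
    e->dir = strdup(result->pw_dir);
    e->shell = strdup(result->pw_shell);
    free(buf);
    if (e->name == NULL || e->dir == NULL || e->shell == NULL) {
        user_entry_free(e);
        return ENOMEM;
    }
    *out = e;
    return 0;
}
```

With this shape, two lookups in a row give two independent entries: the first stays valid after the second call because none of its pointers refer to the scratch buffer. The caller releases each one with `user_entry_free`.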




> linux-container-executor segfault with get_user_info
> ----------------------------------------------------
>
>                 Key: YARN-11919
>                 URL: https://issues.apache.org/jira/browse/YARN-11919
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.4.2
>            Reporter: Edward Capriolo
>            Priority: Major
>              Labels: pull-request-available
>
> The code in container executor for getting user information is slightly 
> different than the recipe in the documentation. On Alpine it repeatedly 
> segfaults:
> {code:java}
> mvn package -Pnative -Dmaven.skip.test=true -DskipTests -Dtar
> cp target/native/target/usr/local/bin/container-executor /usr/local/bin/
> chmod 6050 /usr/local/bin/container-executor
> chgrp hadoop /usr/local/bin/container-executor
> chmod 6050 /usr/local/bin/container-executor
> /usr/local/bin/container-executor nobody nobody 0 
> application_1766935260716_0004  container_1766935260716_0004_02_000001 
> /yarn-root/nm-local-dir/nmPriv
>  {code}
> Result:
> {code:java}
> edgy
> main : command provided 0
> main : run as user is nobody
> main : requested yarn user is nobody
> main : validate_container_id 
> main : huh 
> validated command: INITIALIZE_CONTAINER
> init : set_user 
> maybe free_user 
> going to check user 
> min id 
> min id 1000
> Get user info 
> passwd info 
> got pwd 
> Segmentation fault (core dumped)
>  {code}
> The recipe here: 
> [https://linux.die.net/man/3/getpwnam_r]
> has better success.


