Sounds good to me.

> Am 03.06.2024 um 16:26 schrieb Stephen Clark via user <[email protected]>:
> 
> Thanks Richard.
>  
> I’ve come up with what I think is a better implementation for the PR based on 
> our use case:
> 
>         if (user != null && !user.isEmpty()) {
>             cmdArgs.add(user);
>             int exitCode = 0;
>             try {
>                 exitCode = new ProcessBuilder(cmdArgs).start().waitFor();
>             } catch (Exception e) {
>                 // Ignore
>             } finally {
>                 if (exitCode != 0) {
>                     LOG.debug("CMD: '{}' returned exit code of {}", 
> String.join(" ", cmdArgs), exitCode);
>                     cmdArgs.remove(user);
>                 }
>             }
>         }
>  
> The code will first attempt to run the id -u command with a user specified.  
> If this is unsuccessful, then it removes the user from the command-line.
> 
> In our scenario we do ultimately need to run the id -u command without a user 
> specified to output the id of the user running the java process.
>  
> Should I go ahead and create a PR for this code change?  I’m happy to act as 
> a test reproducer too.
>  
> Thanks,
>  
> Steve
>  
> From: Richard Zowalla <[email protected] <mailto:[email protected]>>
> Sent: Wednesday, May 15, 2024 3:51 PM
> To: [email protected] <mailto:[email protected]>; Stephen Clark 
> <[email protected] <mailto:[email protected]>>
> Subject: [EXTERNAL] Re: Issue with Worker Termination in Kubernetes
>  
> CAUTION: This email originated from outside the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
> 
> Hi, 
>  
> I think we should fix that and just ensure, that this command cannot fail for 
> a non existing user.
> I wouldn’t check for „?“, though. Maybe we can just define a new property to 
> explicitly skip those uid checks (similar as it is done for Windows).
>  
> The patch might work because „id -u“ will just output the id of the user 
> running the java process.
>  
> Overall, I guess, that we are open for a PR with a test reproducer :)
>  
> Gruß
> Richard
> 
> 
> Am 14.05.2024 um 16:40 schrieb Stephen Clark via user <[email protected] 
> <mailto:[email protected]>>:
>  
> Hi,
>  
> I am running the Storm Supervisor in an image that I've created in Kubernetes 
> using a securityContext that has the following:
>  
>       securityContext:
>         runAsUser: 1000620005
>         fsGroup: 1000620005
>         supplementalGroups: [ 64000 ]
>  
> The UID 1000620005 is not related to a user specified in the /etc/passwd file 
> in the Docker image.
>  
> When I kill a topology, this generates the following exception:
>  
> 2024-05-10 06:26:10.661 [SLOT_6700] itemId= { jobName="" ,jobTemplateId="" 
> ,userOrAppId="" ,tenantId="",  jobStep="", scaleCopyJobId=""} ERROR 
> apache.storm.daemon.supervisor.Slot - Error when processing event 
> java.lang.NullPointerException: null
>       at org.apache.storm.utils.ServerUtils.getUserId(ServerUtils.java:1095) 
> ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1284)
>  ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1216)
>  ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.utils.ServerUtils.areAllProcessesDead(ServerUtils.java:1178) 
> ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.container.DefaultResourceIsolationManager.areAllProcessesDead(DefaultResourceIsolationManager.java:146)
>  ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.daemon.supervisor.Container.areAllProcessesDead(Container.java:248)
>  ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.daemon.supervisor.Slot.killContainerFor(Slot.java:237) 
> ~[storm-server-2.6.1.jar:2.6.1]
>       at org.apache.storm.daemon.supervisor.Slot.handleRunning(Slot.java:792) 
> ~[storm-server-2.6.1.jar:2.6.1]
>       at 
> org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:184) 
> ~[storm-server-2.6.1.jar:2.6.1]
>       at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:1051) 
> [storm-server-2.6.1.jar:2.6.1]
>  
> which in turn means that the supervisor process dies, and the pod is 
> restarted.
>  
> In looking at the Storm source code I think that the issue is in 
> storm-server/src/main/java/org/apache/storm/utils/ServerUtils.javawhere it 
> has the following code:
>  
>        if (user != null && !user.isEmpty()) {
>             cmdArgs.add(user);
>         }
>  
> which results in the following command being executed:
>  
> id -u ?
>  
> since with the securityContext specified above there is not a named user 
> associated with the UID of 1000620005 and a username is not available.
>  
> I can see the following in worker.yaml for the topology:
>  
> bash-4.2$ cat worker.yaml
> worker-id: 145eac49-838f-4796-bd77-c3c99e202e32
> logs.users: []
> logs.groups: []
> topology.submitter.user: '?'
>  
> The id -u ? command outputs:
>  
> bash-4.2$ id -u ?
> id: ?: no such user
>  
> this then causes the Null Pointer Exception since it can't parse the output.
>  
> I am running with a patch locally that detects whether the username is '?' 
> and doesn't add the user to the command line.  This appears to work:
>  
>         if (user != null && !user.isEmpty() && !user.equals("?")) {
>             cmdArgs.add(user);
>         }
>  
> Is there a different technique that would work in this scenario, or does it 
> require a code change in the storm-server to resolve the issue?
>  
> Thanks,
>  
> Steve

Reply via email to