Sounds good to me.
> Am 03.06.2024 um 16:26 schrieb Stephen Clark via user <[email protected]>:
>
> Thanks Richard.
>
> I’ve come up with what I think is a better implementation for the PR based on
> our use case:
>
> if (user != null && !user.isEmpty()) {
> cmdArgs.add(user);
> int exitCode = 0;
> try {
> exitCode = new ProcessBuilder(cmdArgs).start().waitFor();
> } catch (Exception e) {
> // Ignore
> } finally {
> if (exitCode != 0) {
> LOG.debug("CMD: '{}' returned exit code of {}",
> String.join(" ", cmdArgs), exitCode);
> cmdArgs.remove(user);
> }
> }
> }
>
> The code will first attempt to run the id -u command with a user specified.
> If this is unsuccessful, then it removes the user from the command-line.
>
> In our scenario we do ultimately need to run the id -u command without a user
> specified to output the id of the user running the java process.
>
> Should I go ahead and create a PR for this code change? I’m happy to act as
> a test reproducer too.
>
> Thanks,
>
> Steve
>
> From: Richard Zowalla <[email protected] <mailto:[email protected]>>
> Sent: Wednesday, May 15, 2024 3:51 PM
> To: [email protected] <mailto:[email protected]>; Stephen Clark
> <[email protected] <mailto:[email protected]>>
> Subject: [EXTERNAL] Re: Issue with Worker Termination in Kubernetes
>
> CAUTION: This email originated from outside the organization. Do not click
> links or open attachments unless you recognize the sender and know the
> content is safe.
>
> Hi,
>
> I think we should fix that and just ensure, that this command cannot fail for
> a non existing user.
> I wouldn’t check for „?“, though. Maybe we can just define a new property to
> explicitly skip those uid checks (similar as it is done for Windows).
>
> The patch might work because „id -u“ will just output the id of the user
> running the java process.
>
> Overall, I guess, that we are open for a PR with a test reproducer :)
>
> Gruß
> Richard
>
>
> Am 14.05.2024 um 16:40 schrieb Stephen Clark via user <[email protected]
> <mailto:[email protected]>>:
>
> Hi,
>
> I am running the Storm Supervisor in an image that I've created in Kubernetes
> using a securityContext that has the following:
>
> securityContext:
> runAsUser: 1000620005
> fsGroup: 1000620005
> supplementalGroups: [ 64000 ]
>
> The UID 1000620005 is not related to a user specified in the /etc/passwd file
> in the Docker image.
>
> When I kill a topology, this generates the following exception:
>
> 2024-05-10 06:26:10.661 [SLOT_6700] itemId= { jobName="" ,jobTemplateId=""
> ,userOrAppId="" ,tenantId="", jobStep="", scaleCopyJobId=""} ERROR
> apache.storm.daemon.supervisor.Slot - Error when processing event
> java.lang.NullPointerException: null
> at org.apache.storm.utils.ServerUtils.getUserId(ServerUtils.java:1095)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1284)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1216)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.utils.ServerUtils.areAllProcessesDead(ServerUtils.java:1178)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.container.DefaultResourceIsolationManager.areAllProcessesDead(DefaultResourceIsolationManager.java:146)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.daemon.supervisor.Container.areAllProcessesDead(Container.java:248)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.daemon.supervisor.Slot.killContainerFor(Slot.java:237)
> ~[storm-server-2.6.1.jar:2.6.1]
> at org.apache.storm.daemon.supervisor.Slot.handleRunning(Slot.java:792)
> ~[storm-server-2.6.1.jar:2.6.1]
> at
> org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:184)
> ~[storm-server-2.6.1.jar:2.6.1]
> at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:1051)
> [storm-server-2.6.1.jar:2.6.1]
>
> which in turn means that the supervisor process dies, and the pod is
> restarted.
>
> In looking at the Storm source code I think that the issue is in
> storm-server/src/main/java/org/apache/storm/utils/ServerUtils.javawhere it
> has the following code:
>
> if (user != null && !user.isEmpty()) {
> cmdArgs.add(user);
> }
>
> which results in the following command being executed:
>
> id -u ?
>
> since with the securityContext specified above there is not a named user
> associated with the UID of 1000620005 and a username is not available.
>
> I can see the following in worker.yaml for the topology:
>
> bash-4.2$ cat worker.yaml
> worker-id: 145eac49-838f-4796-bd77-c3c99e202e32
> logs.users: []
> logs.groups: []
> topology.submitter.user: '?'
>
> The id -u ? command outputs:
>
> bash-4.2$ id -u ?
> id: ?: no such user
>
> this then causes the Null Pointer Exception since it can't parse the output.
>
> I am running with a patch locally that detects whether the username is '?'
> and doesn't add the user to the command line. This appears to work:
>
> if (user != null && !user.isEmpty() && !user.equals("?")) {
> cmdArgs.add(user);
> }
>
> Is there a different technique that would work in this scenario, or does it
> require a code change in the storm-server to resolve the issue?
>
> Thanks,
>
> Steve