Hi,

I think we should fix that and ensure that this command cannot fail for a nonexistent user. I wouldn't check for "?", though. Maybe we can define a new property to explicitly skip those UID checks (similar to what is done for Windows).
The patch might work because "id -u" without a user argument just prints the ID of the user running the Java process. Overall, I guess we are open to a PR with a test reproducer :)

Regards,
Richard

> On 14.05.2024 at 16:40, Stephen Clark via user <[email protected]> wrote:
>
> Hi,
>
> I am running the Storm Supervisor in an image that I've created in Kubernetes
> using a securityContext that has the following:
>
> securityContext:
>   runAsUser: 1000620005
>   fsGroup: 1000620005
>   supplementalGroups: [ 64000 ]
>
> The UID 1000620005 is not related to a user specified in the /etc/passwd file
> in the Docker image.
>
> When I kill a topology, this generates the following exception:
>
> 2024-05-10 06:26:10.661 [SLOT_6700] itemId= { jobName="" ,jobTemplateId="" ,userOrAppId="" ,tenantId="", jobStep="", scaleCopyJobId=""} ERROR apache.storm.daemon.supervisor.Slot - Error when processing event
> java.lang.NullPointerException: null
>     at org.apache.storm.utils.ServerUtils.getUserId(ServerUtils.java:1095) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1284) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1216) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.utils.ServerUtils.areAllProcessesDead(ServerUtils.java:1178) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.container.DefaultResourceIsolationManager.areAllProcessesDead(DefaultResourceIsolationManager.java:146) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.daemon.supervisor.Container.areAllProcessesDead(Container.java:248) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.daemon.supervisor.Slot.killContainerFor(Slot.java:237) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.daemon.supervisor.Slot.handleRunning(Slot.java:792) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:184) ~[storm-server-2.6.1.jar:2.6.1]
>     at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:1051) [storm-server-2.6.1.jar:2.6.1]
>
> which in turn means that the supervisor process dies, and the pod is
> restarted.
>
> In looking at the Storm source code I think that the issue is in
> storm-server/src/main/java/org/apache/storm/utils/ServerUtils.java where it
> has the following code:
>
>     if (user != null && !user.isEmpty()) {
>         cmdArgs.add(user);
>     }
>
> which results in the following command being executed:
>
>     id -u ?
>
> since with the securityContext specified above there is not a named user
> associated with the UID of 1000620005 and a username is not available.
>
> I can see the following in worker.yaml for the topology:
>
>     bash-4.2$ cat worker.yaml
>     worker-id: 145eac49-838f-4796-bd77-c3c99e202e32
>     logs.users: []
>     logs.groups: []
>     topology.submitter.user: '?'
>
> The id -u ? command outputs:
>
>     bash-4.2$ id -u ?
>     id: ?: no such user
>
> this then causes the NullPointerException since it can't parse the output.
>
> I am running with a patch locally that detects whether the username is '?'
> and doesn't add the user to the command line. This appears to work:
>
>     if (user != null && !user.isEmpty() && !user.equals("?")) {
>         cmdArgs.add(user);
>     }
>
> Is there a different technique that would work in this scenario, or does it
> require a code change in the storm-server to resolve the issue?
>
> Thanks,
>
> Steve
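For anyone preparing a PR, the two ideas discussed in this thread (an explicit switch to skip the user argument, and making sure an unparseable "id" result cannot blow up) could be sketched roughly as below. This is only an illustration under assumptions: the class UidLookupSketch, its methods, and the skipUserArg flag are hypothetical names standing in for a possible new config property, and none of this is the actual ServerUtils code from storm-server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; NOT the real ServerUtils implementation.
final class UidLookupSketch {

    private UidLookupSketch() {
    }

    // Builds the argument list for "id -u [user]".
    // skipUserArg stands in for an assumed opt-out property: when true, the
    // user name is never appended, so "id -u" reports the UID of the user
    // running the process instead of looking up a possibly unknown name.
    static List<String> buildIdCommand(String user, boolean skipUserArg) {
        List<String> cmdArgs = new ArrayList<>();
        cmdArgs.add("id");
        cmdArgs.add("-u");
        if (!skipUserArg && user != null && !user.isEmpty()) {
            cmdArgs.add(user);
        }
        return cmdArgs;
    }

    // Parses the first line of the command output as a numeric UID.
    // Returns -1 for null, empty, or non-numeric output (for example
    // "id: ?: no such user") so callers can handle the failure explicitly.
    static long parseUid(String output) {
        if (output == null) {
            return -1L;
        }
        String firstLine = output.trim().split("\\R", 2)[0].trim();
        try {
            return Long.parseLong(firstLine);
        } catch (NumberFormatException e) {
            return -1L;
        }
    }
}
```

With the flag set, buildIdCommand("?", true) yields just "id -u", and parseUid turns the "no such user" message into -1 rather than an exception. Whether a sentinel value or a checked exception fits the callers better is a design choice left to the actual patch.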
