Hi,
I am running the Storm Supervisor in an image that I've created in Kubernetes
using a securityContext that has the following:
securityContext:
  runAsUser: 1000620005
  fsGroup: 1000620005
  supplementalGroups: [ 64000 ]
The UID 1000620005 is not related to a user specified in the /etc/passwd file
in the Docker image.
When I kill a topology, this generates the following exception:
2024-05-10 06:26:10.661 [SLOT_6700] itemId= { jobName="" ,jobTemplateId="" ,userOrAppId="" ,tenantId="", jobStep="", scaleCopyJobId=""} ERROR apache.storm.daemon.supervisor.Slot - Error when processing event
java.lang.NullPointerException: null
    at org.apache.storm.utils.ServerUtils.getUserId(ServerUtils.java:1095) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1284) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.utils.ServerUtils.isAnyPosixProcessPidDirAlive(ServerUtils.java:1216) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.utils.ServerUtils.areAllProcessesDead(ServerUtils.java:1178) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.container.DefaultResourceIsolationManager.areAllProcessesDead(DefaultResourceIsolationManager.java:146) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.daemon.supervisor.Container.areAllProcessesDead(Container.java:248) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.daemon.supervisor.Slot.killContainerFor(Slot.java:237) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.daemon.supervisor.Slot.handleRunning(Slot.java:792) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:184) ~[storm-server-2.6.1.jar:2.6.1]
    at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:1051) [storm-server-2.6.1.jar:2.6.1]
which in turn means that the supervisor process dies, and the pod is restarted.
Looking at the Storm source code, I think the issue is in
storm-server/src/main/java/org/apache/storm/utils/ServerUtils.java, which has
the following code:
if (user != null && !user.isEmpty()) {
    cmdArgs.add(user);
}
which results in the following command being executed:
id -u ?
since with the securityContext specified above there is not a named user
associated with the UID of 1000620005 and a username is not available.
I can see the following in worker.yaml for the topology:
bash-4.2$ cat worker.yaml
worker-id: 145eac49-838f-4796-bd77-c3c99e202e32
logs.users: []
logs.groups: []
topology.submitter.user: '?'
The id -u ? command outputs:
bash-4.2$ id -u ?
id: ?: no such user
This then causes the NullPointerException, since there is no user ID in the
output to parse.
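A minimal sketch of that failure mode, assuming the code reads the first line
of the command's standard output and parses it without a null check. I have
not confirmed that this is exactly what ServerUtils.getUserId does; the class
EmptyIdOutputDemo and the method parseUid are hypothetical names used only for
illustration. Since id writes "no such user" to stderr, stdout is empty, and
BufferedReader.readLine() returns null:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class EmptyIdOutputDemo {
    // Hypothetical stand-in for parsing the stdout of `id -u <user>`.
    // Assumption: the first line is parsed with no null check.
    public static int parseUid(Reader stdout) {
        try (BufferedReader reader = new BufferedReader(stdout)) {
            String line = reader.readLine();      // null when stdout is empty
            return Integer.parseInt(line.trim()); // NullPointerException if line is null
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Normal case: the command printed a numeric UID on stdout.
        System.out.println(parseUid(new StringReader("1000620005\n")));

        // Failure case: the error message went to stderr, so stdout is empty.
        try {
            parseUid(new StringReader(""));
        } catch (NullPointerException e) {
            System.out.println("NullPointerException on empty output");
        }
    }
}
```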
I am running with a patch locally that detects whether the username is '?' and
doesn't add the user to the command line. This appears to work:
if (user != null && !user.isEmpty() && !user.equals("?")) {
    cmdArgs.add(user);
}
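To show the effect of the extra check in isolation, here is a self-contained
sketch. The class IdCommandSketch and the method buildIdCommand are
hypothetical names, not part of Storm, and I am assuming the command list
starts with "id" and "-u" as the executed command suggests. With the guard,
a '?' user falls back to plain id -u, which prints the numeric UID of the
current process and does not need a passwd entry:

```java
import java.util.ArrayList;
import java.util.List;

public class IdCommandSketch {
    // Builds the argument list for `id -u [user]`.
    // The "?" placeholder is treated like a missing user and is not appended.
    public static List<String> buildIdCommand(String user) {
        List<String> cmdArgs = new ArrayList<>();
        cmdArgs.add("id");
        cmdArgs.add("-u");
        if (user != null && !user.isEmpty() && !user.equals("?")) {
            cmdArgs.add(user);
        }
        return cmdArgs;
    }

    public static void main(String[] args) {
        System.out.println(buildIdCommand("?"));     // prints [id, -u]
        System.out.println(buildIdCommand("storm")); // prints [id, -u, storm]
    }
}
```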
Is there a different technique that would work in this scenario, or does it
require a code change in the storm-server to resolve the issue?
Thanks,
Steve