There is a third user account on all machines in the cluster: the account used for working on the cluster. That account has UID 1000 on all four worker nodes, but UID 1001 on the controller. That is probably why the controller shows question marks for UID 1000.
I doubt this is the issue when 3 of the 4 nodes that work have the same uid mismatch for that user (nor the slurm or munge user).

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Chris Samuel
Sent: Monday, April 20, 2020 12:03 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] Munge decode failing on new node

On Friday, 17 April 2020 2:22:00 PM PDT Dean Schulze wrote:

> Both work. The only discrepancy is that the slurm controller output
> had these two lines:
>
> UID: ??? (1000)
> GID: ??? (1000)
>
> Like the controller doesn't know the username for UID 1000.

What does this say on the controller and the compute node?

    getent passwd 1000

Are you using LDAP or the like to ensure that all nodes have the same user database?

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
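For anyone wanting to compare the answers from several machines at once, here is a minimal Python sketch of the check discussed above. It assumes you have already collected one passwd-format line per host for the cluster account (for example by running getent on each machine by hand). The hostnames, the account name `cluster`, and all function names are hypothetical and only illustrate the situation described in this thread: UID 1000 on the four workers and UID 1001 on the controller. `lookup_uid` uses the standard `pwd` module, which is Unix-only.

```python
import pwd
from collections import Counter


def lookup_uid(uid):
    """Return the local username for a UID, or None if the UID is unknown.

    This asks roughly the same question as `getent passwd <uid>` on one host.
    """
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def parse_passwd_line(line):
    """Split one passwd-format line into (username, uid, gid)."""
    fields = line.strip().split(":")
    return fields[0], int(fields[2]), int(fields[3])


def uids_by_host(passwd_lines):
    """Map each host to the UID in its passwd line for the account."""
    return {host: parse_passwd_line(line)[1] for host, line in passwd_lines.items()}


def hosts_disagreeing(passwd_lines):
    """Return the sorted hosts whose UID differs from the most common UID."""
    uids = uids_by_host(passwd_lines)
    if not uids:
        return []
    majority_uid = Counter(uids.values()).most_common(1)[0][0]
    return sorted(host for host, uid in uids.items() if uid != majority_uid)


# Hypothetical sample data modelled on the thread, not real cluster output.
SAMPLE = {
    "controller": "cluster:x:1001:1001::/home/cluster:/bin/bash",
    "worker1": "cluster:x:1000:1000::/home/cluster:/bin/bash",
    "worker2": "cluster:x:1000:1000::/home/cluster:/bin/bash",
    "worker3": "cluster:x:1000:1000::/home/cluster:/bin/bash",
    "worker4": "cluster:x:1000:1000::/home/cluster:/bin/bash",
}

if __name__ == "__main__":
    print("Hosts with a differing UID:", hosts_disagreeing(SAMPLE))
    print("Local name for UID 1000:", lookup_uid(1000))
```

A host showing up in `hosts_disagreeing` only means the user databases differ; as noted above, that may or may not be related to the munge decode failure.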