Hi Olaf,

Check the firewalls between your compute node and the Slurm controller to
make sure that they can contact each other. slurmctld needs to reach slurmd
on SlurmdPort (default 6818), and slurmd needs to reach slurmctld on
SlurmctldPort (default 6817). The other compute nodes also need to be able
to reach the new compute node on SlurmdPort.
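
A quick way to test this is to try a plain TCP connection to each port from
each side. The sketch below is only an illustration, assuming the default
ports mentioned above. The function names and the hostnames in the comments
are hypothetical, so substitute your own. A successful connection only shows
that the port is reachable, not that authentication works.

```python
import socket

# Default ports named above; change them if your slurm.conf overrides them.
SLURMCTLD_PORT = 6817
SLURMD_PORT = 6818


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(checks, timeout=3.0):
    """Return one 'host:port -> status' line per (host, port) pair."""
    lines = []
    for host, port in checks:
        ok = port_reachable(host, port, timeout=timeout)
        status = "reachable" if ok else "blocked or closed"
        lines.append(f"{host}:{port} -> {status}")
    return lines


# Example calls (hypothetical hostnames, not taken from this thread):
#   on the compute node:   print(report([("slurmctl01", SLURMCTLD_PORT)]))
#   on the controller:     print(report([("node001", SLURMD_PORT)]))
#   on another compute node: print(report([("node001", SLURMD_PORT)]))
```

Run it in each direction, since a firewall may allow traffic one way only.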


Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

On Wed, 16 Dec 2020 at 03:48, Olaf Gellert <gell...@dkrz.de> wrote:

> Hi all,
> we are setting up a new test cluster to test some features for our
> next HPC system. On one of the compute nodes we get these messages
> in the log:
> [2020-12-15T10:00:21.753] error: Munge decode failed: Invalid credential
> [2020-12-15T10:00:21.753] auth/munge: _print_cred: ENCODED: Thu Jan 01 01:00:00 1970
> [2020-12-15T10:00:21.753] auth/munge: _print_cred: DECODED: Thu Jan 01 01:00:00 1970
> [2020-12-15T10:00:21.753] error: slurm_receive_msg_and_forward: g_slurm_auth_verify: REQUEST_NODE_REGISTRATION_STATUS has authentication error: Invalid authentication credential
> [2020-12-15T10:00:21.753] error: slurm_receive_msg_and_forward: Protocol authentication error
> [2020-12-15T10:00:21.763] error: service_connection: slurm_receive_msg: Protocol authentication error
> I checked munge authentication in the usual way, so:
> - time between nodes is synchronised
> - munge is using same UID/GID on both sides
> - "munge -c0 -z0 -n | unmunge" works on compute nodes and on slurmctld
>    node
> - ssh slurmcontrolnode "munge -c0 -z0 -n" | unmunge on a compute node
>    works
> - ssh computenode "munge -c0 -z0 -n" | unmunge on the slurmctld node
>    works
> So munge seems to work as far as I can tell. What else does
> Slurm use munge for? Are hostnames part of the authentication?
> Should I be worried about the time "Thu Jan 01 01:00:00 1970"
> (in the logs above)?
> All machines are CentOS 8, Slurm is self-built 20.11.0,
> munge is from CentOS8 rpm:
> munge-0.5.13-1.el8.x86_64
> munge-libs-0.5.13-1.el8.x86_64
> Cheers, Olaf
> --
> Dipl. Inform. Olaf Gellert            email  gell...@dkrz.de
> Deutsches Klimarechenzentrum GmbH     phone  +49 (0)40 460094 214
> Bundesstrasse 45a                     fax    +49 (0)40 460094 270
> D-20146 Hamburg, Germany              www    http://www.dkrz.de
> Sitz der Gesellschaft: Hamburg
> Geschäftsführer: Prof. Dr. Thomas Ludwig
> Registergericht: Amtsgericht Hamburg, HRB 39784
