Fokke
On Wed 24 Jan 2024 at 16:19, Fokke Dijkstra wrote:
> Dear Brian,
>
> Thanks for the hints; I think you are correctly pointing at some network
> connection issue. I've disabled firewalld on the control host, but that
> unfortunately did not help. The processes stuck i
completed epilog for jobid 3679888
> [2024-01-28T17:33:58.774] debug: JobId=3679888: sent epilog complete msg: rc = 0
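To check many nodes for epilog completion lines like the one quoted above, a small log scan can help. The Python sketch below assumes that each slurmd log entry sits on a single line in exactly the format shown. Real logs may differ in spacing or verbosity level, so treat the regex as a starting point. `epilog_results` is a hypothetical helper name, not part of any Slurm tooling.

```python
import re
from typing import Dict, Iterable

# Assumed format, taken from the quoted slurmd debug line:
# "[2024-01-28T17:33:58.774] debug: JobId=3679888: sent epilog complete msg: rc = 0"
EPILOG_RE = re.compile(
    r"^\[[^\]]+\]\s+debug:\s+JobId=(?P<jobid>\d+):\s+"
    r"sent epilog complete msg:\s+rc\s*=\s*(?P<rc>-?\d+)"
)


def epilog_results(lines: Iterable[str]) -> Dict[int, int]:
    """Map job ID -> rc for every 'sent epilog complete msg' line found."""
    results: Dict[int, int] = {}
    for line in lines:
        m = EPILOG_RE.match(line.strip())
        if m:
            results[int(m.group("jobid"))] = int(m.group("rc"))
    return results
```

Jobs that you know ran on a node but that are missing from the returned mapping are candidates for the stuck-epilog problem described in this thread.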
>
>
> -- Paul Raines (http://help.nmr.mgh.harvard.edu)
>
>
>
(though the firewall
> is not between those two layers).
>
> --
> Brian D. Haymore
> University of Utah
> Center for High Performance Computing
> 155 South 1452 East RM 405
> Salt Lake City, Ut 84112
> Phone: 801-558-1150
> http://bit.ly/1HO1N2C
> ---
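One way to test the network-connection hypothesis is to try a TCP connection from a compute node to the control host. The Python sketch below does this. It assumes Slurm's default SlurmctldPort of 6817; check `slurm.conf` in case your site uses a different port. The host name in the usage line is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 6817, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    6817 is Slurm's default SlurmctldPort (assumed here); adjust it if
    slurm.conf sets a different value.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical control host name; replace with your slurmctld host.
    print(can_reach("slurmctld.example.org"))
```

A successful connect only shows that the port is reachable at that moment. Intermittent failures need repeated checks, and the reverse direction (control host to a node's slurmd, default port 6818) can be tested the same way.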
. This
leads to many job failures.
The issue appears to be somewhat similar to the one described at:
https://bugs.schedmd.com/show_bug.cgi?id=18561
In that case, the site downgraded the slurmd clients to 22.05, which got rid
of the problems.
We’ve now downgraded the slurmd on the compute nodes to 23.02