On 11/23/23 11:50, Markus Kötter wrote:
On 23.11.23 10:56, Schneider, Gerald wrote:
I have a recurring problem with allocated TRES, which are not
released after all jobs on that node are finished. The TRES are still
marked as allocated and no new jobs can be scheduled on that node
using those TRES.

Remove the node from slurm.conf and restart slurmctld, re-add, restart.
Remove from Partition definitions as well.

Just my 2 cents:  Do NOT remove a node from slurm.conf just as described!

When adding or removing nodes, slurmctld and all slurmd daemons must be restarted! See the SchedMD presentation https://slurm.schedmd.com/SLUG23/Field-Notes-7.pdf slides 51-56 for the recommended procedure.
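
As a rough sketch only, the restart sequence could be planned as a dry run like the one below. The message above does not spell out the order. The order used here (stop slurmctld, restart slurmd on every node, start slurmctld) is an assumption and should be checked against the slides. The host names, the use of ssh, and the systemd unit names slurmctld and slurmd are hypothetical placeholders as well. The script only prints commands and executes nothing, and it assumes the edited slurm.conf has already been copied to every host.

```python
import shlex


def restart_plan(compute_nodes, controller="slurmctld-host"):
    """Return the restart commands in order, as a dry run (nothing is executed).

    Assumptions: daemons run as systemd units named slurmctld and slurmd,
    hosts are reachable via ssh, and the updated slurm.conf is already
    identical on every host.
    """
    plan = []
    # Assumed ordering: stop the controller before touching the node daemons.
    plan.append(["ssh", controller, "systemctl", "stop", "slurmctld"])
    # Restart slurmd on every compute node so each one rereads slurm.conf.
    for node in compute_nodes:
        plan.append(["ssh", node, "systemctl", "restart", "slurmd"])
    # Bring the controller back once all slurmd daemons have been restarted.
    plan.append(["ssh", controller, "systemctl", "start", "slurmctld"])
    return plan


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for cmd in restart_plan(["node01", "node02"]):
        print(shlex.join(cmd))
```

Printing the plan first makes it easy to review the order before running anything on a production cluster.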

/Ole
