Sometimes restarting slurm on the node (slurmd) and the master (slurmctld) can
purge the jobs as well.

-Paul Edmon-


On 04/10/2017 03:59 PM, Douglas Meyer wrote:
Set the node to drain if other jobs are running on it.  Then set it to down and
then resume.  Down will kill and clear any jobs.

scontrol update nodename=xxxxxxxx state=drain reason=job_sux

scontrol update nodename=xxxxxxxx state=down reason=job_sux
scontrol update nodename=xxxxxxxx state=resume
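
To see whether the sequence above actually cleared anything, something like the
following can be run before and after it. This is only a sketch: node01 is a
placeholder node name, and the run wrapper is a hypothetical convenience that
prints each command instead of running it unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: node01 is a placeholder, not a real node name.
NODE="${NODE:-node01}"

# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

check_node() {
    # Jobs still in the COMPLETING (CG) state on this node.
    run squeue --states=COMPLETING --nodelist="$1"
    # Current node state and the reason recorded for it.
    run scontrol show node "$1"
}

check_node "$NODE"
```

If squeue still lists the job after the node has been set down and resumed, the
restart or reboot suggested elsewhere in this thread is probably the next step.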

If it happens again, either reboot the node or stop and restart slurm.  Make
sure you verify it has actually stopped before you start it again.
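
One way to do the "verify it has stopped" part is to poll for the process. A
minimal sketch, assuming the daemon on the compute node runs under the process
name slurmd and that pgrep is available; wait_gone is a hypothetical helper
name, and how you stop the daemon depends on your init system.

```shell
#!/bin/sh
# Sketch: wait until no process with the given exact name is left.
# Returns 0 once it is gone, 1 if it is still there when the tries run out.
wait_gone() {
    name="$1"
    tries="${2:-10}"    # roughly one try per second
    while pgrep -x "$name" >/dev/null 2>&1; do
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 1
    done
    return 0
}

# Typical use on the compute node, after stopping the daemon
# (assumption: the daemon process is named slurmd):
#   wait_gone slurmd 30 || echo "slurmd is still running"
```

If it reports the daemon is still running, starting slurm again on top of it is
unlikely to help, and a reboot is the safer choice.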

Doug

-----Original Message-----
From: Gene Soudlenkov [mailto:g.soudlen...@auckland.ac.nz]
Sent: Monday, April 10, 2017 12:56 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: Deleting jobs in Completing state on hung nodes


It happens sometimes - in our case epilogue code got stuck. Either check the
processes and kill whichever ones belong to the user or simply reboot the nodes.
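
On a node that is still reachable, the check-and-kill step could look roughly
like this. It is a sketch under assumptions: someuser is a placeholder for the
owner of the stuck job, and the run wrapper is a hypothetical helper that only
prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, run it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

cleanup_user() {
    # Show what the user still has running on this node.
    run ps -u "$1" -o pid,stat,etime,args
    # Kill whatever is left. Processes in uninterruptible sleep (state D)
    # ignore SIGKILL, which is when a reboot becomes the only option.
    run pkill -KILL -u "$1"
}

# someuser is a placeholder for the owner of the stuck job.
cleanup_user "${JOB_USER:-someuser}"
```

Note that pkill -u kills every process the user owns on that node, not just the
ones from the stuck job, so it is worth reading the ps output first.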

Cheers,
Gene

--
New Zealand eScience Infrastructure
Centre for eResearch
The University of Auckland
e: g.soudlen...@auckland.ac.nz
p: +64 9 3737599 ext 89834 c: +64 21 840 825 f: +64 9 373 7453
w: www.nesi.org.nz

On 11/04/17 07:52, Tus wrote:
I have 2 nodes that have hardware issues and died with jobs running on
them. I am not able to fix the nodes at the moment but want to delete
the jobs that are stuck in completing state from slurm. I have set the
nodes to DRAIN and tried scancel which did not work.

How do I remove these jobs?

