We have Weka filesystems on one of our clusters and saw this. We discovered we
had slightly misconfigured the Weka client, and as a result Weka's and Slurm's
cgroups were fighting with each other; this seemed to be the cause. Fixing the
Weka cgroups configuration alleviated the problem for us. I haven't heard
anyone complain about it since.

Tim

--
Tim Cutts
Scientific Computing Platform Lead
AstraZeneca



From: Paul Edmon via slurm-users <slurm-users@lists.schedmd.com>
Date: Wednesday, 10 April 2024 at 14:46
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Jobs of a user are stuck in Completing stage for a 
long time and cannot cancel them
Usually, to clear jobs like this, you have to reboot the node they are on.
That will then force the scheduler to clear them.
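To find which nodes need rebooting, a minimal Python 3.9+ sketch could list the
jobs in the COMPLETING (`CG`) state with `squeue -h -t CG -o "%i %N"`, group
them by node list, and print (not run) one `scontrol reboot` command per node
list. The helper names are hypothetical, and the sketch assumes the node still
exists in slurm.conf and that a RebootProgram is configured there, which
`scontrol reboot` relies on. Node lists are left in Slurm's compressed hostlist
form (e.g. `node[02-03]`), which `scontrol reboot` accepts.

```python
import subprocess
from collections import defaultdict


def fetch_completing_jobs() -> str:
    """Hypothetical helper: return `squeue` output for COMPLETING jobs.

    Each output line is "<jobid> <nodelist>" (-h drops the header line).
    Not called automatically; requires the Slurm client tools on PATH.
    """
    return subprocess.run(
        ["squeue", "-h", "-t", "CG", "-o", "%i %N"],
        capture_output=True, text=True, check=True,
    ).stdout


def completing_jobs_by_node(squeue_output: str) -> dict[str, list[str]]:
    """Hypothetical helper: group job IDs by their (compressed) node list."""
    by_node: dict[str, list[str]] = defaultdict(list)
    for line in squeue_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and jobs with no node listed
        job_id, nodelist = parts[0], parts[1]
        by_node[nodelist].append(job_id)
    return dict(by_node)


def reboot_commands(by_node: dict[str, list[str]]) -> list[str]:
    """Build, but do not execute, one `scontrol reboot` command per node list."""
    return [f"scontrol reboot {nodes}" for nodes in sorted(by_node)]
```

Usage on a login node would be something like
`print("\n".join(reboot_commands(completing_jobs_by_node(fetch_completing_jobs()))))`,
reviewing the printed commands before running any of them by hand.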

-Paul Edmon-

On 4/10/2024 2:56 AM, archisman.pathak--- via slurm-users wrote:
> We are running a Slurm cluster with version 22.05.8. One of our users
> has reported that their jobs have been stuck in the completing stage for a
> long time. Following the Slurm Troubleshooting Guide, we found that the
> batch host for the job had indeed been removed from the cluster, perhaps
> without being drained first.
>
> How do we cancel/delete the jobs?
>
> * We tried scancel on the batch and individual job IDs, both as the user
> and as SlurmUser.
>

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com