Hello,

Thank you for your response to my email. I've taken a look at one of the 
compute nodes that has been drained by the SLURM system -- please see below. It 
appears to suggest the node was drained due to a job failing (running out of 
walltime perhaps?). 

This is very odd since I don't have anything in the epilog script that takes 
nodes out of service for any reason (at least not explicitly).

Does anyone have any ideas why nodes are draining following "job failures"?

Best regards,
David

[root@blue30 etc]# scontrol show node red0038
NodeName=red0038 Arch=x86_64 CoresPerSocket=6
   CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=red0038 NodeHostName=red0038 Version=17.02
   OS=Linux RealMemory=1 AllocMem=0 FreeMem=22050 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=batch 
   BootTime=2017-04-07T09:31:26 SlurmdStartTime=2017-05-04T14:50:06
   CfgTRES=cpu=12,mem=1M
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
   Reason=batch job complete failure [root@2017-05-22T13:10:13]
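
In case it is useful for checking several nodes at once, below is a minimal 
sketch (not part of SLURM itself) that pulls the State and Reason fields out of 
"scontrol show node" output like the above. The function names 
parse_node_record and is_drained are hypothetical, and the sketch assumes the 
17.02-style layout shown here, where Reason sits on its own line and every 
other value contains no spaces.

```python
def parse_node_record(text):
    """Parse Key=Value output from `scontrol show node` into a dict.

    Assumes the layout shown above: whitespace-separated Key=Value tokens,
    with Reason= on its own line as free text.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Reason is free text containing spaces, so keep the rest of the line.
        if line.startswith("Reason="):
            fields["Reason"] = line[len("Reason="):]
            continue
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                fields[key] = value
    return fields


def is_drained(fields):
    """True if the State value carries the DRAIN flag (e.g. IDLE+DRAIN)."""
    return "DRAIN" in fields.get("State", "")


# Excerpt of the output above, used here as sample input.
sample = """
NodeName=red0038 Arch=x86_64 CoresPerSocket=6
   CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.00
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Reason=batch job complete failure [root@2017-05-22T13:10:13]
"""

node = parse_node_record(sample)
if is_drained(node):
    print(node["NodeName"], node["State"], "-", node["Reason"])
```

The text passed in would normally come from running scontrol on the head node; 
the sample string here only stands in for that.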


________________________________________
From: Christopher Samuel [sam...@unimelb.edu.au]
Sent: 24 May 2017 00:23
To: slurm-dev
Subject: [slurm-dev] Re: Compute nodes going to drained/draining state

On 22/05/17 19:57, Baker D.J. wrote:

> I’ve recently started using slurm v17.02.2, however something seems very
> odd. For some reason, when for example jobs fail or exceed their
> walltime limit, I see that compute nodes are being placed in drained or
> draining state. Does anyone understand what might be wrong?

Anything setting a drain state is meant to also set a reason, what does
"scontrol show node $NODE" say for these?

Also are there any relevant messages in your slurmctld and slurmd logs?

Best of luck,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
