Re: [slurm-users] Queue size, slow/unresponsive head node

2018-01-12, Colas Rivière
Nicholas, > Why do you have SchedulerParameters = (null)? I did not set these parameters, so I assume "(null)" means all the default values are used. John, thanks, I'll try that and look into these SchedulerParameters more. Cheers, Colas On 2018-01-12 09:08, John DeSantis wrote: Colas,

Re: [slurm-users] Queue size, slow/unresponsive head node

2018-01-12, John DeSantis
Colas, We had a similar experience a long time ago, and we solved it by adding the following SchedulerParameters: max_rpc_cnt=150,defer HTH, John DeSantis On Thu, 11 Jan 2018 16:39:43 -0500 Colas Rivière wrote: > Hello, > > I'm managing a small cluster (one head node, 24 workers, 1160 total
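John's suggestion amounts to a single line in slurm.conf on the head node. The fragment below is a minimal sketch, assuming a standard slurm.conf (its location varies by installation). The comments paraphrase the slurm.conf documentation for the two options, and 150 is simply John's value, not a tuned recommendation for this cluster:

```
# slurm.conf on the head node (path varies by installation).
# max_rpc_cnt=150: when slurmctld has 150 or more active threads
#   (threads serving RPCs such as squeue or sbatch requests),
#   defer job scheduling so pending requests are answered first.
# defer: do not try to schedule each job individually at submit
#   time; schedule jobs together in later scheduling passes.
SchedulerParameters=max_rpc_cnt=150,defer
```

After editing the file, running "scontrol reconfigure" asks slurmctld to re-read slurm.conf, and "scontrol show config" should then report the new SchedulerParameters value instead of (null). The documented trade-off is that new jobs may start less promptly in exchange for a more responsive controller.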

Re: [slurm-users] Queue size, slow/unresponsive head node

2018-01-11, Nicholas C Santucci
Why do you have SchedulerParameters = (null)? Is that even allowed? https://slurm.schedmd.com/sched_config.html On Thu, Jan 11, 2018 at 1:39 PM, Colas Rivière wrote: > Hello, > > I'm managing a small cluster (one head node, 24 workers, 1160 total worker > threads). The head node has t

[slurm-users] Queue size, slow/unresponsive head node

2018-01-11, Colas Rivière
Hello, I'm managing a small cluster (one head node, 24 workers, 1160 total worker threads). The head node has two E5-2680 v3 CPUs (hyper-threaded), ~100 GB of memory, and spinning disks. The head node occasionally becomes less responsive when there are more than 10k jobs in the queue, and becomes r