[slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-12 Thread Alessandro Federico
Hi all, we are setting up SLURM 17.11.2 on a small test cluster of about 100 nodes. Sometimes we get the error in the subject when running any SLURM command (e.g. sinfo, squeue, scontrol reconf, etc.). Do we have to apply any particular setting to avoid incurring the problem? We found t

Re: [slurm-users] Queue size, slow/unresponsive head node

2018-01-12 Thread John DeSantis
Colas, We had a similar experience a long time ago, and we solved it by adding the following SchedulerParameters: max_rpc_cnt=150,defer HTH, John DeSantis On Thu, 11 Jan 2018 16:39:43 -0500 Colas Rivière wrote: > Hello, > > I'm managing a small cluster (one head node, 24 workers, 1160 total
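
For readers skimming the index, the setting John quotes would look roughly like this as a slurm.conf line. This is a sketch only: the two values come from the message above, and whether 150 suits a given cluster is an assumption to test locally.

```conf
# slurm.conf on the controller (illustrative; values taken from the message above)
# max_rpc_cnt=150 : skip scheduling attempts while slurmctld has 150 or more active RPC threads
# defer           : do not try to schedule each job at submit time; leave it to the periodic scheduler
SchedulerParameters=max_rpc_cnt=150,defer
```

Both options trade some scheduling latency for a more responsive controller, which fits the slow head node described in the thread. A change like this normally has to be propagated with "scontrol reconfigure" or a slurmctld restart.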

[slurm-users] restrict application to a given partition

2018-01-12 Thread Juan A. Cordero Varelaq
Dear Community, I have a node (20 Cores) on my HPC with two different partitions: big (16 cores) and small (4 cores). I have installed software X on this node, but I want only one partition to have rights to run it. Is it then possible to restrict the execution of a specific application to a

Re: [slurm-users] restrict application to a given partition

2018-01-12 Thread Paul Edmon
You could do this using a job_submit.lua script that inspects for that application and routes them properly. -Paul Edmon- On 01/12/2018 11:31 AM, Juan A. Cordero Varelaq wrote: Dear Community, I have a node (20 Cores) on my HPC with two different partitions: big (16 cores) and small (4 core

Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

2018-01-12 Thread John DeSantis
Ciao Alessandro, > Do we have to apply any particular setting to avoid incurring the > problem? What is your "MessageTimeout" value in slurm.conf? If it's at the default of 10, try changing it to 20. I'd also check and see if the slurmctld log is reporting anything pertaining to the server thr
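
As a sketch of the change John suggests, the slurm.conf line would look something like the following. The value 20 is the one from the message; treat it as a starting point, not a tuned recommendation.

```conf
# slurm.conf (illustrative; value taken from the message above)
# MessageTimeout is given in seconds; the message above cites 10 as the default.
MessageTimeout=20
```

The value currently in effect can be checked with "scontrol show config | grep MessageTimeout". A longer timeout only hides a slow controller, so the slurmctld log is still worth checking, as the reply goes on to say.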

[slurm-users] reserve gpus?

2018-01-12 Thread Robert Bjornson
Hi all, I am trying to figure out how to create a reservation that includes reserving gpus. We normally request them using something like --gres=gpu:2 I looked through the documentation for reservations and scontrol and don't see any reference to reserving gres's. The scontrol documentation se

Re: [slurm-users] Queue size, slow/unresponsive head node

2018-01-12 Thread Colas Rivière
Nicholas, Why do you have? SchedulerParameters = (null) I did not set these parameters, so I assume "(null)" means all the default values are used. John, thanks, I'll try that, and look into these SchedulerParameter more. Cheers, Colas On 2018-01-12 09:08, John DeSantis wrote: Colas,

[slurm-users] Slurm 17.11 X11 support questions

2018-01-12 Thread Mjelde, Matthew J
Howdy, I am trying to get my cluster's native X11 support enabled, but am having some difficulty. I generated the RPMs for slurm, installed them on my systems, and added the "PrologFlags=X11" in the slurm.conf file as well. However, when I start up slurmd with the flag added, I get this error