Re: [slurm-users] Socket timed out on send/recv operation

2018-10-20 Thread Chris Samuel
On Friday, 19 October 2018 4:58:37 AM AEDT Kirk Main wrote:
> I'm a new administrator to Slurm and I've just got my new cluster up and
> running. We started getting a lot of "Socket timed out on send/recv
> operation" errors when submitting jobs, and also if you try to "squeue"
> while others are

Re: [slurm-users] Can frequent hold-release adversely affect slurm?

2018-10-20 Thread Chris Samuel
On Friday, 19 October 2018 3:59:17 AM AEDT Daniel Letai wrote:
> Do you have any recommendations, or might suggest a better approach to solve
> this problem?

Not sure it will help, but you can specify:

bf_max_job_array_resv=#

to tell Slurm how many array elements to do forward reservations for.
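
For illustration only, here is roughly where that option would sit in slurm.conf. This is a sketch, not taken from the original message: it assumes the backfill scheduler is in use, and the value 10 is a made-up example rather than a recommendation. bf_max_job_array_resv is one of the comma-separated SchedulerParameters options.

```
# slurm.conf (fragment), illustrative values only
# Assumption: the site runs the backfill scheduler.
SchedulerType=sched/backfill
# Hypothetical limit: make forward reservations for at most 10
# elements of any one job array. Append further options with commas.
SchedulerParameters=bf_max_job_array_resv=10
```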

Re: [slurm-users] requesting entire vs. partial nodes

2018-10-20 Thread Chris Samuel
On Saturday, 20 October 2018 9:57:16 AM AEDT Noam Bernstein wrote:
> If not, is there another way to do this?

You can use --exclusive for jobs that want whole nodes. You will likely also want to use:

SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE

to ensure jobs are given one core (wi
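
As a rough sketch of how the setting quoted in this snippet might look in slurm.conf: the SelectType line is an assumption not stated in the message (the CR_* parameters apply to a consumable-resource select plugin such as select/cons_res), and everything else about the site's configuration is left out.

```
# slurm.conf (fragment), a sketch under stated assumptions
# Assumption: consumable-resource selection is enabled.
SelectType=select/cons_res
# Track cores and memory as consumable resources, and by default
# allocate one task per core rather than one per hardware thread.
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
```

Jobs that need a whole node would then add --exclusive to their sbatch or srun options, while other jobs share nodes at core granularity.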