On Friday, 19 October 2018 4:58:37 AM AEDT Kirk Main wrote:
> I'm a new administrator to Slurm and I've just got my new cluster up and
> running. We started getting a lot of "Socket timed out on send/recv
> operation" errors when submitting jobs, and also if you try to "squeue"
> while others are
On Friday, 19 October 2018 3:59:17 AM AEDT Daniel Letai wrote:
> Do you have any recommendations, or might suggest a better approach to solve
> this problem?
Not sure it will help, but you can specify:
bf_max_job_array_resv=#
to tell Slurm how many array elements to do forward reservations for.
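
As a rough sketch of where that setting goes: bf_max_job_array_resv is one of the comma-separated options in SchedulerParameters in slurm.conf, and it only matters when the backfill scheduler is in use. The value 5 below is purely illustrative, not a recommendation for your site.

```text
# slurm.conf (fragment) -- illustrative values only
SchedulerType=sched/backfill
# Reserve future resources for at most 5 elements of each job array
# (5 is an assumed example value; tune for your workload)
SchedulerParameters=bf_max_job_array_resv=5
```

If SchedulerParameters already lists other options, append this one to the same line with a comma, then run "scontrol reconfigure" so slurmctld picks up the change.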
On Saturday, 20 October 2018 9:57:16 AM AEDT Noam Bernstein wrote:
> If not, is there another way to do this?
You can use --exclusive for jobs that want whole nodes.
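
For instance, a minimal job script header requesting a whole node might look like the following. The job name and node count are assumptions for illustration only.

```bash
#!/bin/bash
#SBATCH --job-name=whole-node-example   # hypothetical job name
#SBATCH --nodes=1                       # assumed: a single node
#SBATCH --exclusive                     # do not share the allocated node with other jobs
```

The same flag also works on the command line, e.g. "sbatch --exclusive job.sh" (job.sh being a placeholder script name).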
You will likely also want to use:
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
to ensure jobs are given one core (wi