[slurm-users] Socket timed out - tuning

2024-01-29 Thread Reed Dier
Hoping someone can help point me towards some tweaks to help prevent denial-of-service issues.
> sbatch: error: Batch job submission failed: Socket timed out on send/recv operation
The root cause is understood: shared storage for the slurmctlds was impacted, leading to an increase in

[slurm-users] Why is Slurm 20 the latest RPM in RHEL 8/Fedora repo?

2024-01-29 Thread Robert Kudyba
According to these links: https://rpmfind.net/linux/rpm2html/search.php?query=slurm https://src.fedoraproject.org/rpms/slurm Why doesn't RHEL 8 get a newer version? Can someone update the repo maintainer Philip Kovacs < pk...@fedoraproject.org>? There was

Re: [slurm-users] Two jobs each with a different partition running on same node?

2024-01-29 Thread Paul Edmon
That certainly isn't the case in our configuration. We have multiple overlapping partitions, and our nodes run a mix of jobs from all the different partitions. So the default behavior is a mix of partitions on a node, governed by the Priority Tier of the partition. Namely the highest pri

[slurm-users] Two jobs each with a different partition running on same node?

2024-01-29 Thread Loris Bennett
Hi, I seem to remember that in the past, if a node was configured to be in two partitions, the actual partition of the node was determined by the partition associated with the jobs running on it. Moreover, at any instant when the node was running one or more jobs, the node could only actually b