Hi,
It seems that my *MaxMemPerCpu* is not working as I would have expected
(that is, increasing the CPU count if mem or mem-per-cpu exceeds that limit).
Here is my partition definition:
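To make the expectation concrete, here is a minimal Python sketch of the
adjustment I assumed would happen. It only illustrates my reading of the
behaviour and is not Slurm's actual code; the function name and the numbers
are hypothetical.

```python
import math

def expected_adjustment(mem_per_cpu_mb, cpus, max_mem_per_cpu_mb):
    """Return (cpus, mem_per_cpu_mb) after the adjustment I expected.

    Hypothetical helper: it illustrates the expectation only, not Slurm's code.
    """
    # Request is already within the limit: nothing should change.
    if mem_per_cpu_mb <= max_mem_per_cpu_mb:
        return cpus, mem_per_cpu_mb
    # Keep the total memory, but spread it over enough CPUs to fit the limit.
    total_mb = mem_per_cpu_mb * cpus
    new_cpus = math.ceil(total_mb / max_mem_per_cpu_mb)
    new_mem_per_cpu_mb = math.ceil(total_mb / new_cpus)
    return new_cpus, new_mem_per_cpu_mb

# Example: 8000 MB on 1 CPU with a 4000 MB per-CPU limit
print(expected_adjustment(8000, 1, 4000))
```

So a job asking for twice the per-CPU limit on one CPU would, in my
understanding, end up with two CPUs at the limit each.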
$ scontrol show part short
PartitionName=short
AllowGroups=ALL DenyAccounts=data AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/
Hi all;
I set up a Slurm-based cluster to study scheduling algorithms. This cluster
has 10 nodes and 2 CPUs per node. I compiled Slurm with the
"--enable-multiple-slurmd" option and I configured it to act as a 3600
node cluster with 16 CPUs per node. Since I am only submitting sleep jobs,
this co
If you decide to go with the single-partition model, you can use the
"Weight" parameter in slurm.conf to cause the standard nodes
to be used in preference to the high-mem and GPU nodes. So jobs
only end up on high-mem or GPU nodes if they requested a lot of
memory or a GPU, or if the cluster is ver
Ah, success. It was gres related. I had verified that the slurm.conf files are
the same, but I never checked the gres.conf. It looks like our production
gres.conf had been copied to the backup controller which had the same
gres names, but different hosts associated with them. Fixing that and
restarting slurmd
Yes, the head node & backup head sync to the same ntp server. Checking by
hand, they seem to be within 1 sec of each other. Here's the node info
slurmd finds as it starts up, from slurmd.log:
[2017-01-31T15:31:59.711] CPUs=24 Boards=1 Sockets=2 Cores=6 Threads=2
Memory=48388 TmpDisk=508671 Uptime=1147426 CP
Hi David,
Baker D.J. writes:
> Hello,
>
> This is hopefully a very simple set of questions for someone. I’m evaluating
> slurm with a view to replacing our existing torque/moab system, and I’ve been
> reading about defining partitions and QoSs. I like the idea of being able to
> use a QoS to
Similar to Lachlan's suggestions: check that the slurm.conf is the same on
all nodes, and in particular that the number of cpus and cores are correct.
Have you tried removing the Gres parameters? Perhaps it's looking for devices
it can't find.
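
One rough way to do that comparison, as a sketch: collect a copy of each
node's slurm.conf (and likewise gres.conf) and group the nodes by checksum.
The function name and the idea of passing the file contents in a dict are
assumptions for illustration; how you fetch the files is up to you.

```python
import hashlib
from collections import defaultdict

def group_by_checksum(configs):
    """Group node names by the SHA-256 of their config file contents.

    configs maps a node name to that node's file contents as bytes.
    Returns a list of node-name lists, largest group first.
    """
    groups = defaultdict(list)
    for node, data in configs.items():
        groups[hashlib.sha256(data).hexdigest()].append(node)
    # Sort names inside each group, then put the biggest group first.
    return sorted((sorted(nodes) for nodes in groups.values()),
                  key=len, reverse=True)

# Hypothetical contents: node3 has a different CPU count in its copy.
configs = {
    "node1": b"NodeName=node[1-3] CPUs=24\n",
    "node2": b"NodeName=node[1-3] CPUs=24\n",
    "node3": b"NodeName=node[1-3] CPUs=12\n",
}
print(group_by_checksum(configs))
```

If more than one group comes back, the nodes in the smaller groups are the
ones to look at first.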
Paddy
On Tue, Jan 31, 2017 at 02:08:51PM -0800, Lac
Hello,
This is hopefully a very simple set of questions for someone. I'm evaluating
slurm with a view to replacing our existing torque/moab system, and I've been
reading about defining partitions and QoSs. I like the idea of being able to
use a QoS to throttle user activity -- for example to se