After you restart slurmctld, run "scontrol reconfigure".
Brian Andrus
On 8/30/2019 6:57 AM, Robert Kudyba wrote:
I had set RealMemory to a really high number as I mis-interpreted the
recommendation.
NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2 Gres=gpu:1
But now I
I had set RealMemory to a really high number as I mis-interpreted the
recommendation.
NodeName=node[001-003] CoresPerSocket=12 RealMemory=196489092 Sockets=2 Gres=gpu:1
But now I set it to:
RealMemory=191000
I restarted slurmctld. And according to the Bright Cluster support team:
"Unless it ha
Sounds like maybe you didn't correctly roll out / update your slurm.conf
everywhere as your RealMemory value is back to your large wrong number.
You need to update your slurm.conf everywhere and restart all the slurm
daemons.
I recommend the "safe procedure" from here:
https://wiki.fysik.dtu.dk/ni
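One way to check the "rolled out everywhere" part before restarting anything is to compare the copies of slurm.conf directly. The following is only a sketch: the function name conf_in_sync is hypothetical, and it assumes each node's slurm.conf has already been copied or mounted somewhere readable from one host.

```shell
#!/bin/sh
# Sketch: report whether several copies of slurm.conf are byte-identical.
# conf_in_sync is a hypothetical helper, not part of Slurm.
# Usage: conf_in_sync head/slurm.conf node001/slurm.conf ...
conf_in_sync() {
    ref_sum=""
    for f in "$@"; do
        # cksum is POSIX; reading from stdin prints only "CRC size".
        sum=$(cksum < "$f") || return 2
        if [ -z "$ref_sum" ]; then
            ref_sum=$sum            # first file is the reference copy
        elif [ "$sum" != "$ref_sum" ]; then
            echo "MISMATCH: $f differs from $1"
            return 1
        fi
    done
    echo "OK: all copies match"
}
```

Once the copies match, the daemons still need to be restarted (slurmctld on the head node, slurmd on each compute node) so that they reread the file.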
I thought I had taken care of this a while back but it appears the issue has
returned. A very simple sbatch script, slurmhello.sh:
cat slurmhello.sh
#!/bin/sh
#SBATCH -o my.stdout
#SBATCH -N 3
#SBATCH --ntasks=16
module add shared openmpi/gcc/64/1.10.7 slurm
mpirun hello
sbatch slurmhello.sh
Submitted b
Thanks Brian, indeed we did have it set in bytes. I set it to the MB value.
Hoping this takes care of the situation.
> On Jul 8, 2019, at 4:02 PM, Brian Andrus wrote:
>
> Your problem here is that the configuration for the nodes in question has an
> incorrect amount of memory set for them. Loo
Your problem here is that the configuration for the nodes in question
has an incorrect amount of memory set for them. Looks like you have it
set in bytes instead of megabytes.
In your slurm.conf you should look at the RealMemory setting:
*RealMemory*
Size of real memory on the node in megab
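As a rough illustration of the unit conversion: if the original figure was copied from MemTotal in /proc/meminfo (an assumption; that file reports kB, and the thread does not say where the number came from), dividing by 1024 gives the value in MB. The helper name kb_to_mb is hypothetical.

```shell
#!/bin/sh
# Sketch: convert a kB figure (as reported by MemTotal in /proc/meminfo)
# to whole megabytes, rounding down, for use as RealMemory in slurm.conf.
# kb_to_mb is a hypothetical helper, not part of Slurm.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

kb_to_mb 196489092   # prints 191883
```

A RealMemory of 191000 sits a little below that result, which leaves some headroom. Running "slurmd -C" on a compute node also prints the RealMemory value that Slurm itself detects, which may be a safer source than converting by hand.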
I’m new to Slurm and we have a 3-node + head-node cluster running CentOS 7 and
Bright Cluster 8.1. Their support sent me here as they say Slurm is configured
optimally to allow multiple tasks to run. However, at times a job will hold up
new jobs. Are there any other logs I can look at and/or sett