Hi, I am new to Slurm and want to use the node Weight option to influence job scheduling. I have several machines with identical hardware configurations, each with GPU cards, and I use a QoS to force users to request at least 1 GPU GRES when submitting jobs. The machines serve multiple partitions. What I want is for gpu_2h partition jobs to consume the dedicated nodes first via the Weight settings (i.e. schedule onto gpu38/39 rather than gpu36/37, since lower-weight nodes should be allocated first). However, the scheduler does not appear to follow the weights and schedules onto gpu36/37 instead (e.g. with srun -p gpu_2h). All the GPU nodes are idle and the billing is the same, so did I miss something? Is there some limitation when a node serves multiple partitions, or when jobs consume GRES? Please advise. Thank you very much.
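For completeness, this is roughly the kind of submission I am testing with (a minimal sketch; the --gres value is just an example single-GPU request, and the QoS name matches the AllowQos=GPU setting in the config below):

    srun -p gpu_2h --qos=GPU --gres=gpu:1 hostname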
Below are the settings which may help.

slurm.conf:

NodeName=gpu[36-37] Gres=gpu:titanxp:4 ThreadsPerCore=2 State=unknown Sockets=2 CPUs=40 CoresPerSocket=10 Weight=20
NodeName=gpu[38-39] Gres=gpu:titanxp:4 ThreadsPerCore=2 State=unknown Sockets=2 CPUs=40 CoresPerSocket=10 Weight=1

PartitionName=gpu_2h Nodes=gpu[36-39] Default=YES MaxTime=02:00:00 DefaultTime=02:00:00 MaxNodes=1 State=UP AllowQos=GPU
PartitionName=gpu_8h Nodes=gpu[31-37] MaxTime=08:00:00 DefaultTime=08:00:00 MaxNodes=1 State=UP AllowQos=GPU

# sinfo -N -O nodelist,partition,gres,weight
NODELIST            PARTITION           GRES                WEIGHT
gpu36               gpu_2h*             gpu:titanxp:4       20
gpu36               gpu_8h              gpu:titanxp:4       20
gpu37               gpu_2h*             gpu:titanxp:4       20
gpu37               gpu_8h              gpu:titanxp:4       20
gpu38               gpu_2h*             gpu:titanxp:4       1
gpu39               gpu_2h*             gpu:titanxp:4       1
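In case it helps with diagnosis, these are the standard commands I would use to double-check what the scheduler sees (a sketch, assuming the same partition/QoS/GRES as above; --test-only validates the request and reports the estimated placement without actually submitting a job):

    # Confirm the weight slurmctld actually has for each node
    scontrol show node gpu36 | grep -o 'Weight=[0-9]*'
    scontrol show node gpu38 | grep -o 'Weight=[0-9]*'

    # Dry-run a job to see which node the scheduler would pick
    sbatch --test-only -p gpu_2h --qos=GPU --gres=gpu:1 --wrap hostname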