Slurm Users,

I am hoping that you all can help me with the problem below.

We just spun up a new cluster using Bright and have been trying to change the 
default select plugin of Slurm from linear to cons_res.  It should be simple 
enough, but I am plagued by the following error:

error: we don't have select plugin type 102

Both the select_linear.so and select_cons_res.so are located in 
/cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm/

I have been testing with just the compute nodes, not the GPU or big-mem nodes.  
I added the following to my slurm.conf file:

# Scheduler
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Nodes
# NodeName=big-mem[001-005],node[001-056]   # Entry from default install
# NodeName=gpu[001-004]  Gres=gpu:2   # Entry from default install
NodeName=node[001-056] CPUs=2 RealMemory=196000 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN


# Partitions
PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpu[001-004],big-mem[001-005],node[001-056]
PartitionName=test Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-056]

When I issue scontrol reconfigure, I get the following:

[root@thunder ~]# scontrol reconfigure
slurm_reconfigure error: Unable to contact slurm controller (connect failure)
[root@thunder ~]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-12-13 08:46:18 CST; 5s ago
  Process: 31416 ExecStart=/cm/shared/apps/slurm/17.11.8/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 31418 (code=exited, status=1/FAILURE)

When I revert the changes, it goes back to an active working state.

The /var/log/slurmctld log shows this error message:

error: we don't have select plugin type 102
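
One thing that may be worth ruling out: the plugin directory quoted above is 
under /cm/shared_tmp/..., while the systemd unit starts slurmctld from 
/cm/shared/....  The following is only a minimal sketch for checking which 
select plugins are present in a given directory.  The function name 
check_select_plugins is a hypothetical helper, not part of Slurm, and I am 
assuming the plugin file names match the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (not a Slurm tool): report whether the two select
# plugins named in this post exist in the directory passed as $1.
check_select_plugins() {
    dir="$1"
    for plugin in select_linear.so select_cons_res.so; do
        if [ -f "$dir/$plugin" ]; then
            echo "found: $plugin"
        else
            echo "missing: $plugin"
        fi
    done
}

# Example usage with the two install prefixes that appear in this post:
# check_select_plugins /cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm
# check_select_plugins /cm/shared/apps/slurm/17.11.8/lib64/slurm
```

If the two directories give different results, the PluginDir setting in 
slurm.conf would be the next place to look.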

Has anyone else run into this problem?  If so, can you recommend a fix?

Thanks,

Chad
