Thanks for the advice on this. It turns out that I was not supposed to be editing my slurm.conf outside of the Bright management tools. Once I was told that, I made the changes within cmsh and all is well. If someone comes across this post, email me and I will be happy to walk you through the steps I took to get there.

Chad

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Jeffrey Frey
Sent: Thursday, December 13, 2018 9:17 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] Help with Con_Res Plugin Error

When in doubt, check the source:

    extern int select_g_select_nodeinfo_unpack(dynamic_plugin_data_t **nodeinfo,
                                               Buf buffer,
                                               uint16_t protocol_version)
    {
        dynamic_plugin_data_t *nodeinfo_ptr = NULL;

        if (slurm_select_init(0) < 0)
            return SLURM_ERROR;

        nodeinfo_ptr = xmalloc(sizeof(dynamic_plugin_data_t));
        *nodeinfo = nodeinfo_ptr;

        if (protocol_version >= SLURM_MIN_PROTOCOL_VERSION) {
            int i;
            uint32_t plugin_id;
            safe_unpack32(&plugin_id, buffer);
            for (i=0; i<select_context_cnt; i++)
                if (*(ops[i].plugin_id) == plugin_id) {
                    nodeinfo_ptr->plugin_id = i;
                    break;
                }
            if (i >= select_context_cnt) {
                error("we don't have select plugin type %u",plugin_id);
                goto unpack_error;
            }
        }

Your slurmd daemons probably haven't been reconfigured yet and are expecting the linear plugin when they connect to the newly restarted slurmctld. They could probably do with a restart, assuming you've pushed out the slurm.conf changes to them.

On Dec 13, 2018, at 10:10 AM, Julius, Chad <chad.jul...@sdstate.edu> wrote:

As an addendum, I did try the suggestion mentioned here as well:

http://kb.brightcomputing.com/faq/index.php?action=artikel&cat=14&id=410&artlang=en&highlight=slurm

Chad

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Julius, Chad
Sent: Thursday, December 13, 2018 8:54 AM
To: slurm-users@lists.schedmd.com
Subject: [slurm-users] Help with Con_Res Plugin Error

Slurm Users,

I am hoping that you all can help me with the problem below. We just spun up a new cluster using Bright and have been trying to change the default behavior of Slurm from linear to cons_res.
Should be simple enough but I am plagued by the following error:

error: we don't have select plugin type 102

Both the select_linear.so and select_cons_res.so are located in /cm/shared_tmp/apps/slurm/17.11.8/lib64/slurm/

I have been testing with just the compute nodes and not the GPU nodes etc... I added the following to my slurm.conf file:

# Scheduler
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Nodes
# NodeName=big-mem[001-005],node[001-056] # Entry from default install
# NodeName=gpu[001-004] Gres=gpu:2 # Entry from default install
NodeName=node[001-056] CPUs=2 RealMemory=196000 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN

# Partitions
PartitionName=defq Default=YES MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=gpu[001-004],big-mem[001-005],node[001-056]
PartitionName=test Default=NO MinNodes=1 AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=YES GraceTime=0 PreemptMode=OFF ReqResv=NO AllowAccounts=ALL AllowQos=ALL LLN=NO ExclusiveUser=NO OverSubscribe=NO OverTimeLimit=0 State=UP Nodes=node[001-056]

When I issue the scontrol reconfigure I get the following:

[root@thunder ~]# scontrol reconfigure
slurm_reconfigure error: Unable to contact slurm controller (connect failure)

[root@thunder ~]# systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/usr/lib/systemd/system/slurmctld.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-12-13 08:46:18 CST; 5s ago
  Process: 31416 ExecStart=/cm/shared/apps/slurm/17.11.8/sbin/slurmctld $SLURMCTLD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 31418 (code=exited, status=1/FAILURE)

When I revert the changes, it goes back to an
active working state. The /var/log/slurmctld log shows this error message:

error: we don't have select plugin type 102

Has anyone else run into this problem? If so, can you recommend a fix?

Thanks,

Chad

::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jeffrey T. Frey, Ph.D.
Systems Programmer V / HPC Management
Network & Systems Services / College of Engineering
University of Delaware, Newark DE 19716
Office: (302) 831-6034   Mobile: (302) 419-4976
::::::::::::::::::::::::::::::::::::::::::::::::::::::
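
For anyone finding this thread later, the lookup in the quoted Slurm source can be modeled in a few lines of Python to show how a SelectType mismatch between two daemons would produce this message. This is only a rough sketch and not Slurm code: the helper names (`configured_select_type`, `find_plugin_index`) are hypothetical, the numeric IDs (cons_res = 101, linear = 102) are as I recall them from slurm.h and should be checked against your version, and treating select/linear as the default when SelectType is unset is likewise an assumption.

```python
# Illustrative model only; not Slurm code.
# ASSUMPTION: numeric plugin IDs as recalled from slurm.h; verify for your release.
SELECT_PLUGIN_IDS = {"select/cons_res": 101, "select/linear": 102}


def configured_select_type(conf_text, default="select/linear"):
    """Return the last uncommented SelectType value found in slurm.conf text.

    ASSUMPTION: select/linear is used when SelectType is not set.
    """
    found = default
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "selecttype":
            found = value.strip()
    return found


def find_plugin_index(loaded_ids, plugin_id):
    """Mirror the quoted C loop: index of plugin_id among loaded plugins, else None."""
    for i, loaded in enumerate(loaded_ids):
        if loaded == plugin_id:
            return i
    return None


if __name__ == "__main__":
    # One side has the new config, the other still has the old default.
    controller_conf = "SelectType=select/cons_res\nSelectTypeParameters=CR_Core\n"
    stale_node_conf = "# SelectType left at the default\n"

    loaded = [SELECT_PLUGIN_IDS[configured_select_type(controller_conf)]]
    incoming = SELECT_PLUGIN_IDS[configured_select_type(stale_node_conf)]

    if find_plugin_index(loaded, incoming) is None:
        print(f"we don't have select plugin type {incoming}")
```

Under those assumptions, an ID of 102 in the error would point at data packed by a daemon still using select/linear, which fits Jeffrey's suggestion that some daemon had not yet picked up the new slurm.conf.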