Thanks Chris,

When I did that, they all came back.
Also, I found that ReturnToService was set to 0 in slurm.conf, so I have changed it for now. I may set it back to 0 later to see whether any nodes are lost, but I assume that would show up in the log.

Interestingly, I had this in slurm.conf and thought it would make the initial state UP for all nodes:

PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

Steve Bland
Technical Product Manager
Third Party Products
Ross Video | Production Technology Experts
T: +1 (613) 228-0688 ext.4219
www.rossvideo.com

________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Chris Samuel <ch...@csamuel.org>
Sent: 27 November 2020 15:02
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity issue between the slurmctld process and the slurmd nodes

On 26/11/20 9:21 am, Steve Bland wrote:

> Sinfo always returns nodes not responding

One thing - do the nodes return to this state when you resume them with "scontrol update node=srvgridslurm[01-03] state=resume"?

If they do, then what do your slurmctld logs give as the reason? You can bump up the log level on your slurmctld with (for instance) "scontrol setdebug debug" for more info (we run ours at debug all the time anyway).

All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
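
For reference, here is a rough sketch of the two slurm.conf settings discussed above. The ReturnToService value shown is only an illustration (the thread does not say what it was changed to), and the comments summarize the option values as I understand them from the slurm.conf documentation, so please check them against the man page for your Slurm version.

```
# slurm.conf fragment (sketch; ReturnToService=1 is an assumed example value)

# ReturnToService controls when a DOWN node is put back into service:
#   0 = the node stays DOWN until an administrator changes its state,
#       e.g. with "scontrol update ... state=resume"
#   1 = a node that was set DOWN for being non-responsive returns to
#       service once it registers with a valid configuration
#   2 = a DOWN node returns to service once it registers with a valid
#       configuration, whatever the reason it was set DOWN
ReturnToService=1

# State=UP on a PartitionName line is the state of the partition itself,
# not the state of the nodes in it.
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
```

This would explain why State=UP on the partition line did not bring the nodes back on its own: with ReturnToService=0, nodes that have been marked DOWN wait for a manual resume.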