Thanks, Chris.

When I did that, they all came back.

I also found that ReturnToService was set to 0 in slurm.conf, so I have changed 
that for now. I may set it back to 0 to see whether any nodes are lost, but I 
assume that would show up in the log.
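
For anyone following along, here is a rough sketch of the setting in question. 
The comments reflect my reading of the slurm.conf documentation, and the value 
1 is only an illustration, not necessarily the value used on this cluster.

```
# slurm.conf (fragment, illustrative only)
# ReturnToService=0  a DOWN node stays DOWN until an administrator
#                    explicitly changes its state (e.g. state=resume)
# ReturnToService=1  a node set DOWN for being non-responsive returns to
#                    service when it registers with a valid configuration
# ReturnToService=2  a DOWN node returns to service on registering with a
#                    valid configuration, whatever the reason it was DOWN
ReturnToService=1   # assumed value, chosen for illustration
```

With 0, nodes that drop out should remain DOWN until resumed by hand, which 
would match what was seen before the resume command was run.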

Interestingly, I had this in slurm.conf and thought it would make the initial 
state UP for all nodes:

PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
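
As far as I can tell from the slurm.conf documentation, State on a 
PartitionName line describes the partition itself (whether jobs can be queued 
and scheduled on it), while each node's state is tracked separately and can be 
given on its NodeName line. A minimal sketch, where the NodeName line is 
hypothetical and only reuses the node names from the resume command:

```
# slurm.conf (fragment, illustrative only)
# State here applies to the partition, not to its member nodes
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP

# Hypothetical node definition; State=UNKNOWN is the documented default,
# so the node's actual state is settled when its slurmd registers
NodeName=srvgridslurm[01-03] State=UNKNOWN
```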


Steve Bland
Technical Product Manager
Third Party Products
Ross Video | Production Technology Experts
T: +1 (613) 228-0688 ext.4219
www.rossvideo.com
________________________________
From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Chris 
Samuel <ch...@csamuel.org>
Sent: 27 November 2020 15:02
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [EXTERNAL] Re: [slurm-users] trying to diagnose a connectivity issue 
between the slurmctld process and the slurmd nodes

On 26/11/20 9:21 am, Steve Bland wrote:

> Sinfo always returns nodes not responding

One thing - do the nodes return to this state when you resume them with
"scontrol update node=srvgridslurm[01-03] state=resume" ?

If they do, then what do your slurmctld logs say is the reason for this?

You can bump up the log level on your slurmctld with (for instance)
"scontrol setdebug debug" for more info (we run ours at debug all the
time anyway).

All the best,
Chris
--
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
----------------------------------------------

