All,

I have configured a backup slurmctld. It appears to start correctly, but it 
does not take over when it is actually needed.
When I start it, it reports that it is running in background mode:
[2017-01-25T14:23:37.648] slurmctld version 16.05.6 started on cluster hamming
[2017-01-25T14:23:37.650] slurmctld running in background mode

But if I stop the primary daemon, the backup does not take over. Instead, it 
keeps logging Invalid RPC errors (a sample from the log):
[2017-01-25T15:50:37.664] error: Invalid RPC received 2007 while in standby mode
[2017-01-25T15:53:50.495] error: Invalid RPC received 5018 while in standby mode
[2017-01-25T15:59:36.847] error: Invalid RPC received 2007 while in standby mode
[2017-01-25T15:59:37.499] error: Invalid RPC received 2007 while in standby mode
[2017-01-25T15:59:38.923] error: Invalid RPC received 2007 while in standby mode
[2017-01-25T15:59:38.985] error: Invalid RPC received 2007 while in standby mode
[2017-01-25T15:59:39.246] error: Invalid RPC received 2007 while in standby mode
[2017-01-25T15:59:39.293] error: Invalid RPC received 2009 while in standby mode
[2017-01-25T15:59:39.522] error: Invalid RPC received 5018 while in standby mode
[2017-01-25T15:59:43.839] error: Invalid RPC received 2009 while in standby mode
[2017-01-25T15:59:43.930] error: Invalid RPC received 2009 while in standby mode
[2017-01-25T16:19:47.215] error: Invalid RPC received 6012 while in standby mode
[2017-01-25T16:19:48.238] error: Invalid RPC received 6012 while in standby mode
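
Since the full log is much longer than the sample above, a tally per RPC number may be easier to read. Below is a minimal sketch of how such a tally could be produced. The function name and the regular expression are illustrative assumptions based only on the log format shown above; they are not part of Slurm.

```python
import re
from collections import Counter

# Assumption: every relevant line matches the format seen in the excerpt above.
RPC_ERROR = re.compile(r"error: Invalid RPC received (\d+) while in standby mode")


def tally_invalid_rpcs(lines):
    """Count 'Invalid RPC' errors per RPC number (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = RPC_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three lines copied from the excerpt above, used as a small sample.
sample = [
    "[2017-01-25T15:50:37.664] error: Invalid RPC received 2007 while in standby mode",
    "[2017-01-25T15:53:50.495] error: Invalid RPC received 5018 while in standby mode",
    "[2017-01-25T15:59:36.847] error: Invalid RPC received 2007 while in standby mode",
]

for rpc, count in tally_invalid_rpcs(sample).most_common():
    print(f"RPC {rpc}: {count}")
```

Passing an open log file instead of the sample list works the same way, since the function only iterates over lines. For the thirteen lines quoted above the counts are 6 for 2007, 3 for 2009, and 2 each for 5018 and 6012.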

Meanwhile, any client command ('sinfo', for instance) simply hangs.
The interfaces for both slurmctld controllers are in the 'trusted' firewall 
group and there is no filtering between them.
Is there something I am missing to make the backup controller 'kick in' and 
start responding to requests?


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238
