Hi,

I've seen this a few times now: we have jobs queued that should be able to run, but they won't start unless I restart the controller daemon. Other jobs submitted more recently seem to be working fine.
I can see from the slurmctld log file with debug=9 that they are not being tested to see if they are runnable. Does this mean that the daemon has somehow forgotten about them? I just restarted the daemon and they started immediately. Any ideas how I can debug this if it happens again?

Cheers,
