Hi list, we have a new cluster setup with Bright cluster manager. Looking into
a support contract there, but trying to get community support in the meantime.
I'm sure things were working when the cluster was delivered, but I provisioned
an additional node and now the scheduler isn't quite working.
Made a little bit of progress by running sinfo:
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*        up   infinite      3  drain n[011-013]
defq*        up   infinite      1  alloc n010
Not sure why n[011-013] are in a drain state; that needs to be fixed.
After some searching, I ran:
s
Hi Chandler,
If the only changes to your system have been the slurm.conf
configuration and the addition of a new node, the easiest way to track
this down is probably to show us the diffs between the previous and
current versions of slurm.conf, and a note about what's different about
the new node.
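Something along these lines would show it (filenames here are just a guess;
Bright may keep slurm.conf elsewhere and your saved copy will be named
differently):

  # compare the pre-change copy against the live config (hypothetical paths)
  diff -u /etc/slurm/slurm.conf.orig /etc/slurm/slurm.conf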
Heh. Your nodes are drained.
do:
scontrol update state=resume nodename=n[011-013]
If they go back into a drained state, you need to look into why. That
will be in the slurmctld log. You can also see the reason with 'sinfo -R'.
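For example (the log path is only the common default; check SlurmctldLogFile
in your slurm.conf for the real location):

  sinfo -R                                  # drained nodes with their Reason
  scontrol show node n011 | grep -i reason  # per-node drain reason
  tail -n 100 /var/log/slurmctld.log        # recent slurmctld messages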
Brian Andrus
On 1/27/2021 10:18 PM, Chandler wrote:
Made a little bit of progress by running sinfo:
On 1/27/21 9:28 pm, Chandler wrote:
Hi list, we have a new cluster setup with Bright cluster manager.
Looking into a support contract there, but trying to get community
support in the meantime. I'm sure things were working when the cluster
was delivered, but I provisioned an additional node and now the
scheduler isn't quite working.
On 1/28/21 12:50, Christopher Samuel wrote:
Did you restart the slurm daemons when you added the new node? Some internal
data structures (bitmaps) are built based on the number of nodes, and they need
to be rebuilt with a restart in this situation.
https://slurm.schedmd.com/faq.html#add_nodes
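If the daemons weren't restarted after the node was added, something like this
should do it (assuming systemd-managed services; Bright may wrap this in its
own tooling, so adjust to however it manages Slurm):

  # on the controller, once the updated slurm.conf is in place everywhere
  systemctl restart slurmctld
  # on each compute node (or via your usual parallel shell tool)
  systemctl restart slurmd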