I have this reservation which has the REPLACE_DOWN flag set:
ReservationName=test StartTime=2020-12-14T09:00:00 EndTime=2023-12-14T09:00:00
Duration=1095-00:00:00
Nodes=traverse-k05g4 NodeCnt=1 CoreCnt=32 Features=(null) PartitionName=all
Flags=IGNORE_JOBS,WEEKDAY,SPEC_NODES,REPLACE_DOWN,NO_HOLD_JOBS_AFTER_END
TRES=cpu=128
Users=(null) Groups=(null) Accounts=pppl,csi,pu,tromp,cses Licenses=(null)
State=ACTIVE BurstBuffer=(null) Watts=n/a
MaxStartDelay=(null)
Unfortunately, the one node in that reservation is down, and the
reservation isn't being moved to another node:
# sinfo -n traverse-k05g4
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
all* up 15-00:00:0 1 down* traverse-k05g4
I thought that if I removed the node from the reservation, the reservation
would just get assigned a different node, or that removing the SPEC_NODES
flag would accomplish the same thing, but scontrol rejected both attempts:
# scontrol update reservationname=test nodes-=traverse-k05g4
scontrol: error: Reservation can't be updated with Nodes option; it is incompatible with REPLACE[_DOWN]
Error updating the reservation: Requested operation not supported on this system
slurm_update error: Requested operation not supported on this system
# scontrol update reservationname=test flags-=spec_nodes
scontrol: error: Error parsing flags -spec_nodes. No reservation update.
slurm_update error: No error
Any ideas about what I'm doing wrong here, or what I can do to get this
reservation assigned to nodes that are up? I'm trying to avoid deleting
the entire reservation and creating a new one, if possible.
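
For anyone who wants to confirm which flags are actually set on the reservation, here is a minimal sketch. It only parses the text that scontrol prints, and assumes the Flags= field looks like the output above. The helper names reservation_flags and has_flag are made up for illustration and are not part of Slurm.

```shell
# Print each flag of a reservation on its own line.
# Expects the text of `scontrol show reservation <name>` on stdin.
# (reservation_flags and has_flag are hypothetical helper names.)
reservation_flags() {
    # Split on spaces so every KEY=VALUE token is on its own line,
    # keep only the Flags= token, then split its value on commas.
    tr ' ' '\n' | sed -n 's/^Flags=//p' | tr ',' '\n'
}

# Succeed (exit 0) if the reservation text on stdin carries the flag $1.
has_flag() {
    # -x matches the whole line, so REPLACE does not match REPLACE_DOWN.
    reservation_flags | grep -qx "$1"
}
```

Usage would be something like `scontrol show reservation test | has_flag REPLACE_DOWN && echo "REPLACE_DOWN is set"`. This only reports what scontrol shows; it does not change the reservation.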
--
Prentice