Hi all,

Is there a way to submit maintenance jobs in a rolling fashion? What I’m thinking of is the ability to run a job on every node in a Slurm cluster/queue in exclusive mode, but only on X nodes at a time.
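One way the rolling behaviour might be approximated is to submit one job per node with the standard `sbatch` options `--exclusive` and `--nodelist`, and link those jobs into X serial chains with `--dependency=afterany:<jobid>`. At most X jobs from these chains then run at once. The Python below is a minimal sketch under those assumptions, not a tested tool. The helper names (`plan_chains`, `sbatch_args`, `submit_rolling`), the script name `reformat_scratch.sh` and the node names are hypothetical. It also differs slightly from the ideal described here: nodes are assigned to chains up front, so a slow or down node holds up only its own chain rather than freeing a slot for the next pending node.

```python
import subprocess
from typing import Callable, List, Optional


def plan_chains(nodes: List[str], width: int) -> List[List[str]]:
    """Split nodes round-robin into at most `width` serial chains."""
    if width < 1:
        raise ValueError("width must be >= 1")
    chains: List[List[str]] = [[] for _ in range(min(width, len(nodes)))]
    for i, node in enumerate(nodes):
        chains[i % len(chains)].append(node)
    return chains


def sbatch_args(node: str, script: str, after: Optional[str]) -> List[str]:
    """Build an sbatch command that runs `script` exclusively on `node`."""
    args = ["sbatch", "--parsable", "--exclusive", f"--nodelist={node}"]
    if after is not None:
        # afterany: start once the previous job in the chain ends, even if it failed
        args.append(f"--dependency=afterany:{after}")
    args.append(script)
    return args


def run_sbatch(args: List[str]) -> str:
    """Submit via sbatch and return the job ID."""
    out = subprocess.run(args, check=True, capture_output=True, text=True).stdout
    # --parsable prints "jobid" or "jobid;cluster"
    return out.strip().split(";")[0]


def submit_rolling(nodes: List[str], script: str, width: int,
                   submit: Callable[[List[str]], str] = run_sbatch) -> List[str]:
    """Submit one job per node as `width` dependency chains; return job IDs."""
    job_ids: List[str] = []
    for chain in plan_chains(nodes, width):
        prev: Optional[str] = None
        for node in chain:
            prev = submit(sbatch_args(node, script, prev))
            job_ids.append(prev)
    return job_ids
```

For example, `submit_rolling(["node01", "node02", "node03", "node04"], "reformat_scratch.sh", width=2)` would submit two chains (`node01` then `node03`, and `node02` then `node04`). Passing a fake `submit` function lets you inspect the generated commands without touching the cluster.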
For example, say I want to reformat the local scratch space on every Slurm node. I’d want to submit a job that reformats the local scratch space. This job would run exclusively on X nodes at a time. As the job finishes on one node, another is scheduled on the next node, until every node in the cluster/queue has run the job.

I’ve been looking at Slurm reservations, but if I understand them correctly, I would have to reserve the whole cluster to run the job, which would block the entire cluster until the maintenance is done. Alternatively, I would have to manually create a reservation for a subset of the cluster, run the job on those nodes, then reserve the next part of the cluster, and so on.

Navid