Hi all,
  Is there a way to submit maintenance jobs in a rolling fashion? What 
I’m thinking of is the ability to run a job on every node in a Slurm 
cluster/queue in exclusive mode, but only X nodes at a time.

For example, say I want to reformat the local scratch space on every Slurm 
node. I’d submit a job that does the reformat, and it would run exclusively 
on X nodes at a time. As it finishes on one node, another job is scheduled 
for the next node, until all nodes in the cluster/queue have run the job.
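
To make the behaviour I'm after concrete, here is a rough sketch of one way it 
might be approximated. It submits one exclusive job per node and spreads the 
jobs over X job names, relying on sbatch's --dependency=singleton so that only 
one job per name runs at a time. The node names, the script name 
reformat_scratch.sh, the "maint-slot-" job-name prefix and the function name 
are all made up for illustration, and it only builds the sbatch command lines 
rather than running them. I haven't tested this on a real cluster.

```python
import shlex


def rolling_sbatch_commands(nodes, max_concurrent, script="reformat_scratch.sh"):
    """Build one sbatch command per node, limited to max_concurrent at a time.

    Hypothetical helper: the script name and job-name prefix are assumptions.
    The throttle relies on --dependency=singleton, which holds a job until
    earlier jobs with the same name and user have finished.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    commands = []
    for index, node in enumerate(nodes):
        # Nodes are dealt round-robin into max_concurrent "slots"; each slot
        # is a chain of same-named jobs that run one after another.
        slot = index % max_concurrent
        commands.append([
            "sbatch",
            "--exclusive",                    # no other jobs share the node
            "--nodes=1",
            f"--nodelist={node}",             # pin the job to this node
            f"--job-name=maint-slot-{slot}",
            "--dependency=singleton",
            script,
        ])
    return commands


if __name__ == "__main__":
    # Made-up node names; on a real cluster these could come from sinfo.
    example_nodes = [f"node{n:02d}" for n in range(1, 7)]
    for command in rolling_sbatch_commands(example_nodes, max_concurrent=2):
        print(shlex.join(command))
```

This isn't quite what I described, since each slot is a fixed chain of nodes: 
if one slot finishes early, it doesn't pick up work from another slot. It also 
still means submitting one job per node myself.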

I’ve been looking at Slurm reservations, but if I understand them correctly I 
would be reserving the whole cluster to run the job, which would block the 
whole cluster until maintenance is done. Alternatively, I would have to 
manually create a reservation for a subset of the cluster, run the job on 
those nodes, and then reserve the next part of the cluster.

Navid
