"Golpayegani, Navid (GSFC-6190)" <navid.golpayeg...@nasa.gov> writes:
> Hi all,
>
> Is there a way to submit maintenance jobs in a rolling fashion? What
> I’m thinking is the ability to run a job on every node in a slurm
> cluster/queue in exclusive mode but X at a time.

We do this for rolling upgrades. Basically, we submit X copies of a
job script that asks for exclusive access to any node with a feature
"fixme" (actually, we use "vaskmeg" :). The jobs are run as root and
specify --nice=-10000 to get the highest priority. Each job does its
work, removes the "fixme" feature from its node, and then requests
that it be requeued.

Before submitting the jobs, we add the "fixme" feature to all nodes
needing maintenance.

(In reality, our setup is a little more complex, since it includes
reinstalling the OS on the nodes, but the principle is the same.)

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
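
The scheme described above could be sketched roughly as follows. This is only an illustration in Python that builds the command lines without running them. The script name maintenance.sh and all helper function names are hypothetical, and it assumes that `scontrol update NodeName=... Features=...` replaces the node's whole feature list, which is why the existing features are passed in explicitly. Check the exact keywords against the scontrol and sbatch man pages for your Slurm version.

```python
"""Rolling maintenance sketch: build the Slurm commands, do not execute them."""
import shlex

FEATURE = "fixme"  # feature name from the post; any otherwise unused name works


def with_feature(features, feature=FEATURE):
    """Return the feature list with `feature` added (no duplicates)."""
    return list(features) if feature in features else list(features) + [feature]


def without_feature(features, feature=FEATURE):
    """Return the feature list with `feature` removed."""
    return [f for f in features if f != feature]


def set_features_cmd(node, features):
    """Command that sets a node's feature list.

    Assumption: Features= overwrites the full list, so callers pass
    every feature the node should keep.
    """
    return ["scontrol", "update", f"NodeName={node}",
            "Features=" + ",".join(features)]


def submit_cmds(copies, script="maintenance.sh", feature=FEATURE):
    """X identical submissions; X bounds how many nodes are worked on at once."""
    one = ["sbatch", "--nodes=1", "--exclusive",
           f"--constraint={feature}", "--nice=-10000", "--requeue", script]
    return [list(one) for _ in range(copies)]


def requeue_cmd(job_id):
    """Command a job runs last, after removing the feature from its node."""
    return ["scontrol", "requeue", str(job_id)]


if __name__ == "__main__":
    # Hypothetical node names and features, for illustration only.
    nodes = {"node1": ["ib"], "node2": ["ib", "bigmem"]}
    for name, feats in nodes.items():
        print(shlex.join(set_features_cmd(name, with_feature(feats))))
    for cmd in submit_cmds(2):
        print(shlex.join(cmd))
```

Inside the job script itself, the last two steps would correspond to running the `set_features_cmd(...)` command with `without_feature(...)` applied to the node's current features, followed by `requeue_cmd` with the job's own ID (available as SLURM_JOB_ID in the job environment).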