You might need a code change. One very active Slurm site frequently resets job priorities for entire job arrays and does not want held job array tasks to be released when that happens. With that change, releasing a held job requires either "scontrol release <jobid>" or, through the API, setting the job's priority to INFINITE rather than an arbitrary number. The change is here:
https://github.com/SchedMD/slurm/commit/cbcea6728b554d83bfee086d98447fe7841355d1
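As a rough illustration of the first option, a pipeline could release each held task by shelling out to "scontrol release <jobid>" instead of updating the priority. This is only a sketch: the helper names release_command and release_job are hypothetical, it assumes scontrol is on the PATH of the submitting host, and it does not go through DRMAA at all.

```python
import subprocess


def release_command(job_id):
    """Build the argv for 'scontrol release <jobid>' (hypothetical helper)."""
    return ["scontrol", "release", str(job_id)]


def release_job(job_id, run=subprocess.run):
    """Release one held job by invoking scontrol.

    'run' is injectable so the call can be exercised without a Slurm
    installation; by default it is subprocess.run, and check=True raises
    CalledProcessError if scontrol exits with a non-zero status.
    """
    return run(release_command(job_id), check=True)
```

Calling release_job(5699) from the pipeline, once the corresponding job in the first array has completed, would take the place of the priority update that slurmctld now ignores.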
Quoting E V <[email protected]>:
Just upgraded from 14.03.1-2 -> 14.03.3, and now my pipeline using DRMAA and multiple job arrays via run_bulk has stopped working. It essentially starts up two arrays, one with hold set so it doesn't start, then calls wait on the session for each job and starts up the held jobs as their corresponding jobs in the first array complete. However, now none of the held jobs get started after the upgrade. It worked as expected in 14.03.1-2. DRMAA shouldn't need to be rebuilt against 14.03.3, right?

slurmctld.log looks pretty normal, though the "ignore priority reset request" messages are new. Is that telling me it's ignoring the hold release?

[2014-05-06T17:02:14.899] completing job 5630 status 0
[2014-05-06T17:02:14.899] sched: job_complete for JobId=5630 successful, exit code=0
[2014-05-06T17:02:15.796] ignore priority reset request on held job 5699
[2014-05-06T17:02:15.796] _slurm_rpc_update_job complete JobId=5699 uid=<uid> usec=82
