OK, thanks for the pointer. It was an easy fix to make JOB_RELEASE work in DRMAA again: the release now sets the priority to INFINITE instead of 1.
https://github.com/eliv/slurm-drmaa-1/commit/073cc4b4f474366196ed44d4d758f77128655b59

On Wed, May 7, 2014 at 8:39 AM, E V <[email protected]> wrote:
> Ok, looking at that code it doesn't seem to have anything to do with
> job arrays. So I take it session->release is broken for all DRMAA jobs
> now. I've never looked at that part of the DRMAA code base before.
> I'll try and find it and see if it's easy to make release set the
> priority to infinity.
>
> On Tue, May 6, 2014 at 5:39 PM, <[email protected]> wrote:
>>
>> You might need a code change. There is a Slurm user site that is very active
>> in resetting job priorities for entire job arrays and does not want held job
>> array tasks to be released when that happens. Releasing a held job requires
>> the use of "scontrol release <jobid>" or the API must set the job's priority
>> to INFINITE rather than any arbitrary number. The change is here:
>> https://github.com/SchedMD/slurm/commit/cbcea6728b554d83bfee086d98447fe7841355d1
>>
>>
>> Quoting E V <[email protected]>:
>>
>>> Just upgraded from 14.03.1-2 -> 14.03.3, and now my pipeline using
>>> DRMAA and multiple job arrays via run_bulk stopped working. It
>>> essentially starts up two arrays, one with hold set so it doesn't start,
>>> then calls wait each on the session and then starts up the held jobs
>>> as their corresponding jobs in the first array complete. However, now
>>> none of the held jobs get started after the upgrade.
>>> Worked as expected in 14.03.1-2. DRMAA shouldn't need to be rebuilt
>>> against 14.03.3, right?
>>>
>>> slurmctld.log looks pretty normal, though the "ignore priority reset
>>> request" messages are new. Is that telling me it's ignoring the hold
>>> release?
>>>
>>> [2014-05-06T17:02:14.899] completing job 5630 status 0
>>> [2014-05-06T17:02:14.899] sched: job_complete for JobId=5630 successful, exit code=0
>>> [2014-05-06T17:02:15.796] ignore priority reset request on held job 5699
>>> [2014-05-06T17:02:15.796] _slurm_rpc_update_job complete JobId=5699 uid=<uid> usec=82
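
For anyone who hits the same thing, here is a toy Python model of the behaviour described in this thread. It is only a sketch and not Slurm or slurm-drmaa code: ToyJob, update_priority and the recomputed default are hypothetical names made up for illustration, and it assumes a held job is represented by priority 0 and that INFINITE has the value 0xffffffff as defined in slurm.h.

```python
from dataclasses import dataclass

INFINITE = 0xFFFFFFFF  # assumed to match the INFINITE constant in slurm.h
HELD = 0               # assumption: a held job carries priority 0


@dataclass
class ToyJob:
    """Hypothetical stand-in for a job record; not a Slurm structure."""
    job_id: int
    priority: int


def update_priority(job: ToyJob, requested: int, recomputed: int = 100) -> bool:
    """Apply a priority update; return False if the request is ignored.

    Models the 14.03.3 behaviour reported above: a held job ignores any
    priority other than INFINITE, and INFINITE releases it.
    """
    if job.priority == HELD and requested != INFINITE:
        # Corresponds to "ignore priority reset request on held job" in the log.
        return False
    # INFINITE means "let the scheduler pick"; 'recomputed' is a placeholder value.
    job.priority = recomputed if requested == INFINITE else requested
    return True


held = ToyJob(job_id=5699, priority=HELD)
old_release_applied = update_priority(held, 1)         # old DRMAA release: ignored
new_release_applied = update_priority(held, INFINITE)  # fixed release: job is freed
```

The first call mirrors what the DRMAA release did before the fix (priority 1, now ignored on a held job); the second mirrors the fixed release, which has the same effect as "scontrol release <jobid>".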
