OK, thanks for the pointer. It was an easy fix to make JOB_RELEASE work
in DRMAA again: set the priority to INFINITE instead of 1.

https://github.com/eliv/slurm-drmaa-1/commit/073cc4b4f474366196ed44d4d758f77128655b59
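
For reference, here is a minimal sketch of the rule described in this thread. It is not the actual slurmctld or slurm-drmaa code. It only models the behavior: a held job (priority 0) ignores a reset to an arbitrary priority and is released only when the requested priority is INFINITE. The names sketch_job, sketch_apply_priority and SKETCH_INFINITE are hypothetical, and the released_priority argument is an assumed stand-in for whatever priority the scheduler would compute on release. The 0xffffffff value mirrors the INFINITE definition in slurm.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors INFINITE from slurm.h; redefined here so the sketch
 * compiles without the Slurm headers. */
#define SKETCH_INFINITE 0xffffffffu

/* Hypothetical stand-in for a job record. A held job has priority 0. */
struct sketch_job {
    uint32_t job_id;
    uint32_t priority;
};

/* Hypothetical model of the controller-side check.
 * Returns true if the request changed the job, false if it was ignored. */
bool sketch_apply_priority(struct sketch_job *job, uint32_t requested,
                           uint32_t released_priority)
{
    bool held = (job->priority == 0);

    if (held && requested != SKETCH_INFINITE) {
        /* Corresponds to "ignore priority reset request on held job". */
        return false;
    }
    if (requested == SKETCH_INFINITE) {
        /* Release request: the scheduler picks the real priority;
         * released_priority is a placeholder for that value. */
        job->priority = released_priority;
        return true;
    }
    /* Job is not held: an ordinary priority reset is applied as given. */
    job->priority = requested;
    return true;
}
```

Under this model, the old DRMAA behavior (requesting priority 1) leaves job 5699 held, while requesting INFINITE releases it.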


On Wed, May 7, 2014 at 8:39 AM, E V <[email protected]> wrote:
> Ok, looking at that code it doesn't seem to have anything to do with
> job arrays. So I take it session->release is broken for all DRMAA jobs
> now. I've never looked at that part of the DRMAA code base before.
> I'll try and find it and see if it's easy to make release set the
> priority to infinity.
>
> On Tue, May 6, 2014 at 5:39 PM,  <[email protected]> wrote:
>>
>> You might need a code change. There is a Slurm user site that is very active
>> in resetting job priorities for entire job arrays and does not want held job
>> array tasks to be released when that happens. Releasing a held job requires
>> the use of "scontrol release <jobid>", or the API must set the job's priority
>> to INFINITE rather than any arbitrary number. The change is here:
>> https://github.com/SchedMD/slurm/commit/cbcea6728b554d83bfee086d98447fe7841355d1
>>
>>
>> Quoting E V <[email protected]>:
>>
>>> Just upgraded from 14.03.1-2 -> 14.03.3, and now my pipeline using
>>> DRMAA and multiple job arrays via run_bulk has stopped working. It
>>> essentially starts up two arrays, one with hold set so it doesn't start,
>>> then calls wait on each job in the session and starts up the held jobs
>>> as their corresponding jobs in the first array complete. However, now
>>> none of the held jobs get started after the upgrade.
>>> It worked as expected in 14.03.1-2. DRMAA shouldn't need to be rebuilt
>>> against 14.03.3, right?
>>>
>>> slurmctld.log looks pretty normal, though the "ignore priority reset
>>> request" messages are new. Is that telling me it's ignoring the hold
>>> release?
>>>
>>> [2014-05-06T17:02:14.899] completing job 5630 status 0
>>> [2014-05-06T17:02:14.899] sched: job_complete for JobId=5630
>>> successful, exit code=0
>>> [2014-05-06T17:02:15.796] ignore priority reset request on held job 5699
>>> [2014-05-06T17:02:15.796] _slurm_rpc_update_job complete JobId=5699
>>> uid=<uid> usec=82
>>
>>
