OK, looking at that code, it doesn't seem to have anything to do with
job arrays. So I take it session->release is now broken for all DRMAA
jobs. I've never looked at that part of the DRMAA code base before.
I'll try to find it and see whether it's easy to make release set the
priority to INFINITE.
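
In the meantime, the "scontrol release <jobid>" route mentioned in the
quoted message below could be scripted as a stopgap. This is only a
rough sketch: the helper names are made up for illustration, it assumes
scontrol is on PATH and that the caller is allowed to modify the job,
and it only accepts plain numeric job IDs.

```python
import subprocess


def release_command(job_id):
    """Build the argv for `scontrol release <jobid>` (hypothetical helper)."""
    job_id = int(job_id)  # raises ValueError for anything that is not a number
    if job_id <= 0:
        raise ValueError("job id must be a positive integer")
    return ["scontrol", "release", str(job_id)]


def release_job(job_id):
    """Run the release command; assumes scontrol is installed and on PATH."""
    # check=True raises CalledProcessError if scontrol exits non-zero.
    return subprocess.run(release_command(job_id), check=True)
```

Calling release_job(5699) would run "scontrol release 5699", which is
the one release path the quoted message says still works.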

On Tue, May 6, 2014 at 5:39 PM,  <[email protected]> wrote:
>
> You might need a code change. There is a Slurm user site that is very active
> in resetting job priorities for entire job arrays and does not want held job
> array tasks to be released when that happens. Releasing a held job now requires
> either "scontrol release <jobid>" or an API call that sets the job's priority
> to INFINITE rather than to an arbitrary number. The change is here:
> https://github.com/SchedMD/slurm/commit/cbcea6728b554d83bfee086d98447fe7841355d1
>
>
> Quoting E V <[email protected]>:
>
>> Just upgraded from 14.03.1-2 -> 14.03.3, and now my pipeline using
>> DRMAA and multiple job arrays via run_bulk has stopped working. It
>> essentially starts up two arrays, one with hold set so it doesn't start,
>> then calls wait on each job through the session and starts up the held
>> jobs as their corresponding jobs in the first array complete. However,
>> since the upgrade none of the held jobs get started.
>> It worked as expected in 14.03.1-2. DRMAA shouldn't need to be rebuilt
>> against 14.03.3, right?
>>
>> slurmctld.log looks pretty normal, though the "ignore priority reset
>> request" messages are new. Is that telling me it's ignoring the hold
>> release?
>>
>> [2014-05-06T17:02:14.899] completing job 5630 status 0
>> [2014-05-06T17:02:14.899] sched: job_complete for JobId=5630
>> successful, exit code=0
>> [2014-05-06T17:02:15.796] ignore priority reset request on held job 5699
>> [2014-05-06T17:02:15.796] _slurm_rpc_update_job complete JobId=5699
>> uid=<uid> usec=82
>
>
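
To see which held jobs are hitting this, slurmctld.log can be scanned
for the "ignore priority reset request on held job" message shown in
the excerpt above. A minimal sketch, assuming the message text stays
exactly as it appears there; the function name is made up for
illustration.

```python
import re

# Message text copied from the slurmctld.log excerpt; the job id follows it.
HELD_RESET = re.compile(r"ignore priority reset request on held job (\d+)")


def held_jobs_with_ignored_reset(log_lines):
    """Return job ids, in first-seen order, whose priority reset was ignored."""
    job_ids = []
    for line in log_lines:
        match = HELD_RESET.search(line)
        if match:
            job_id = int(match.group(1))
            if job_id not in job_ids:  # report each job only once
                job_ids.append(job_id)
    return job_ids
```

Passing it the lines of an open log file, e.g.
held_jobs_with_ignored_reset(open(path)) with path pointing at
slurmctld.log, yields the held job IDs that still need an explicit
release.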
