We are pleased to announce the availability of Slurm version 15.08.5 which includes about 30 bug fixes developed over the past few weeks as listed below. Slurm downloads are available from:
http://www.schedmd.com/#repos

* Changes in Slurm 15.08.5
==========================
-- Prevent "scontrol update job" from updating jobs that have already finished.
 -- Show requested TRES in "squeue -O tres" when job is pending.
 -- Backfill scheduler: Test association and QOS node limits before reserving
    resources for pending job.
 -- burst_buffer/cray: If teardown operations fails, sleep and retry.
 -- Clean up the external pids when using the PrologFlags=Contain feature
    and the job finishes.
 -- burst_buffer/cray: Support file staging when job lacks job-specific buffer
    (i.e. only persistent burst buffers).
 -- Added srun option of --bcast to copy executable file to compute nodes.
 -- Fix for advanced reservation of burst buffer space.
 -- BurstBuffer/cray: Add logic to terminate dw_wlm_cli child processes at
    shutdown.
 -- If job can't be launch or requeued, then terminate it.
 -- BurstBuffer/cray: Enable clearing of burst buffer string on completed job
    as a means of recovering from a failure mode.
 -- Fix wrong memory free when parsing SrunPortRange=0-0 configuration.
 -- BurstBuffer/cray: Fix job record purging if cancelled from pending state.
 -- BGQ - Handle database throw correctly when syncing users on blocks.
 -- MySQL - Make sure we don't have a NULL string returned when not
    requesting any specific association.
 -- sched/backfill: If max_rpc_cnt is configured and the backlog of RPCs has
    not cleared after yielding locks, then continue to sleep.
 -- Preserve the job dependency description displayed in 'scontrol show job'
    even if the dependee jobs was terminated and cleaned causing the
    dependent to never run because of DependencyNeverSatisfied.
 -- Correct job task count calculation if only node count and ntasks-per-node
    options supplied.
 -- Make sure the association manager converts any string to be lower case
    as all the associations from the database will be lower case.
 -- Sanity check for xcgroup_delete() to verify incoming parameter is valid.
 -- Fix formatting for sacct with variables that switched from uint32_t to
    uint64_t.
 -- Fix a typo in sacct man page.
 -- Set up extern step to track any childern of an ssh if it leaves anything
    else behind.
 -- Prevent slurmdbd divide by zero if no associations defined at rollup time.
 -- Multifactor - Add sanity check to make sure pending jobs are handled
    correctly when PriorityFlags=CALCULATE_RUNNING is set.
 -- Add slurmdb_find_tres_count_in_string() to slurm db perl api.
 -- Make lua dlopen() conditional on version found at build.
 -- sched/backfill - Delay backfill scheduler for completing jobs only if
CompleteWait configuration parameter is set (make code match documentation).
 -- Release a job's allocated licenses only after epilog runs on all nodes
    rather than at start of termination process.
-- Cray job NHC delayed until after burst buffer released and epilog completes
    on all allocated nodes.
 -- Fix abort of srun if using PrologFlags=NoHold
 -- Let devices step_extern cgroup inherit attributes of job cgroup.
 -- Add new hook to Task plugin to be able to put adopted processes in the
    step_extern cgroups.
 -- Fix AllowUsers documentation in burst_buffer.conf man page. Usernames are
    comma separated, not colon delimited.
 -- Fix issue with time limit not being set correctly from a QOS when a job
    requests no time limit.
 -- Various CLANG fixes.
 -- In both sched/basic and backfill: If a job can not be started due to some
    account/qos limit, then don't start other jobs which could delay jobs. The
    old logic would skip the job and start other jobs, which could delay the
    higher priority job.
 -- select/cray: Prevent NHC from running more than once per job or step.
 -- Fix fields not properly printed when adding an account through sacctmgr.
 -- Update LBNL Node Health Check (NHC) link on FAQ.
-- Fix multifactor plugin to prevent slurmctld from getting segmentation fault
    should the tres_alloc_cnt be NULL.
 -- sbatch/salloc - Move nodelist logic before the time min_nodes is used
    so we can set it correctly before tasks are set.

Reply via email to