We are pleased to announce the availability of Slurm version 21.08.7.

This release includes a number of minor to moderate severity fixes that have accumulated since the last maintenance release two months ago.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 21.08.7
==========================
 -- openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
 -- Optimize sending down nodes in maintenance mode to the database when
    removing reservations.
 -- Avoid shrinking a reservation when overlapping with downed nodes.
 -- Fix 'planned time' in rollups for jobs that were still pending when the
    rollup happened.
 -- Prevent new elements from a job array from causing rerollups.
 -- Only check TRES limits against current usage for TRES requested by the job.
 -- Do not allocate shared GRES (MPS) in whole-node allocations.
 -- Fix minor memory leak when dealing with configless setups.
 -- Constrain slurmstepd to job/step cgroup like in previous versions of Slurm.
 -- Fix warnings on 32-bit compilers related to printf() formats.
 -- Fix memory leak when freeing kill_job_msg_t.
 -- Fix memory leak when using data_t.
 -- Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
 -- Fix race condition where a cgroup was being deleted while another step
    was creating it.
 -- Set the slurmd port correctly in multi-slurmd configurations.
 -- openapi/v0.0.37 - Fix misspelling of account_gather_frequency in spec.
 -- openapi/v0.0.37 - Fix misspelling of cluster_constraint in spec.
 -- Fix FAIL mail not being sent if a job was cancelled due to preemption.
 -- slurmrestd - move debug logs for HTTP handling to be gated by debugflag
    NETWORK to avoid unnecessary logging of communication contents.
 -- Fix issue with bad memory access when shrinking running steps.
 -- Fix various issues with internal job accounting with GRES when jobs are
    shrunk.
 -- Fix ipmi polling on slurmd reconfig or restart.
 -- Fix srun crash when reserved ports are being used and het step fails
    to launch.
 -- openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
 -- slurmctld - Properly requeue all components of a het job if PrologSlurmctld
    fails.
 -- rlimits - remove the remaining calls that limited nofiles to 4096 and
    instead use the maximum possible nofiles in slurmd and slurmdbd.
 -- Fix slurmctld memory leak after a reconfigure with configless.
 -- Fix slurmd memory leak when fetching configless files.
 -- Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from state.
 -- Fix minor memory leak with cleaning up the extern step.
 -- Fix potential deadlock during slurmctld restart when there is a completing
    job.
 -- slurmstepd - reduce user-requested soft rlimits that exceed the hard
    rlimits, so the request is no longer ignored entirely, which left
    processes running with the default limits.
 -- Fix memory leaks when job/step specifies a container.
 -- Fix Slurm user commands displaying available features as active features
    when no features were active.
 -- Don't power down nodes that are rebooting.
 -- Clear pending node reboot on power down request.
 -- Ignore node registrations while node is powering down.
 -- Don't reboot any node that is powering down or powered down.
 -- Don't allow a node to reboot if it's marked for power down.
 -- Fix issuing reboot and downing when rebooting a powering up node.
 -- Clear DRAIN on node after failing to resume before ResumeTimeout.
 -- Prevent repeating power down if node fails to resume before ResumeTimeout.
 -- Fix federated cloud node communication with srun and cloud_dns.
 -- Fix jobs being scheduled on nodes marked to be powered_down when idle.
