We are pleased to announce the availability of Slurm version 21.08.2.

There is one significant change included in this maintenance release: the removal of support for the long-misunderstood TaskAffinity=yes option in cgroup.conf. Please consider using "TaskPlugin=task/cgroup,task/affinity" in slurm.conf as an alternative.

Unfortunately, a number of issues were identified where the processor affinity settings from this now-unsupported approach would be calculated incorrectly, leading to potential performance issues.

SchedMD had been previously planning to remove this support in the next 22.05 release, but a number of issues reported after the cgroup code refactoring have led us to remove this now, rather than try to correct issues with what has not been a recommended configuration for some time.
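
For sites unsure whether they still rely on the removed option, a quick scan of cgroup.conf before upgrading may help. The following is only a minimal sketch, not part of Slurm: the helper name find_task_affinity_lines is hypothetical, and the sketch assumes that keys and yes/no values should be matched case-insensitively and that "#" starts a comment.

```python
import re

# Matches an uncommented "TaskAffinity=yes" setting. Case-insensitive
# matching of the key and value is an assumption of this sketch.
_TASK_AFFINITY = re.compile(r"^\s*TaskAffinity\s*=\s*yes\b", re.IGNORECASE)


def find_task_affinity_lines(cgroup_conf_text):
    """Return 1-based line numbers in cgroup.conf text that set TaskAffinity=yes."""
    hits = []
    for lineno, line in enumerate(cgroup_conf_text.splitlines(), start=1):
        setting = line.split("#", 1)[0]  # drop trailing comments
        if _TASK_AFFINITY.match(setting):
            hits.append(lineno)
    return hits


# Illustrative cgroup.conf contents (made up for this example).
sample = """\
CgroupAutomount=yes
ConstrainCores=yes
TaskAffinity=yes   # no longer supported in 21.08.2
#TaskAffinity=yes
"""
print(find_task_affinity_lines(sample))
```

In practice the text would be read from the site's own cgroup.conf (often /etc/slurm/cgroup.conf, though the location varies by installation). Any line reported should be removed from cgroup.conf, with task affinity handled through the TaskPlugin setting in slurm.conf instead.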

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 21.08.2
==========================
 -- slurmctld - fix how the max number of cores on a node in a partition is
    calculated when the partition contains multi-socket nodes. This in turn
    corrects the node count estimates for certain jobs displayed client-side.
 -- job_submit/cray_aries - fix "craynetwork" GRES specification after changes
    introduced in 21.08.0rc1 that made TRES always have a type prefix.
 -- Ignore nonsensical check in the slurmd for [Pro|Epi]logSlurmctld.
 -- Fix writing to stderr/syslog when systemd runs slurmctld in the foreground.
 -- Fix locking around log level setting routines.
 -- Fix issue with updating job started with node range.
 -- Fix issue with nodes not clearing state in the database when the slurmctld
    is started with clean-start.
 -- Fix hetjob components > 1 timing out due to InactiveLimit.
 -- Fix sprio printing -nan for normalized association priority if
    PriorityWeightAssoc was not defined.
 -- Disallow FirstJobId=0.
 -- Preserve job start info in the database for a requeued job that hadn't
    registered the first time in the database yet.
 -- Only send one message on prolog failure from the slurmd.
 -- Remove support for TaskAffinity=yes in cgroup.conf.
 -- accounting_storage/mysql - fix issue where querying jobs via sacct
    --whole-hetjob=yes or slurmrestd (which automatically includes this flag)
    could in some cases return more records than expected.
 -- Fix issue for preemption of job array task that makes afterok dependency
    fail. Additionally, send emails when requeueing happens due to preemption.
 -- Fix sending requeue mail type.
 -- Properly resize a job's GRES bitmaps and counts when resizing the job.
 -- Fix node being able to transition to CLOUD state from non-cloud state.
 -- Fix regression introduced in 21.08.0rc1 which broke a step's ability to
    inherit GRES from the job when the step didn't request GRES but the job did.
 -- Fix errors in logic when picking nodes based on bracketed anded constraints.
    This also enforces the requirement to have a count when using such
    constraints.
 -- Handle job resize better in the database.
 -- Exclude currently running, resized jobs from the runaway jobs list.
 -- Make it possible to shrink a job more than once.
