We are pleased to announce the availability of Slurm version 21.08.3.

This release includes a number of fixes since the last release a month ago. One is critical: it prevents a communication issue between slurmctld and slurmdbd at sites that have started using the new AccountingStoreFlags=job_script functionality.

Slurm can be downloaded from https://www.schedmd.com/downloads.php .

- Tim

--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support

* Changes in Slurm 21.08.3
==========================
 -- Return error to sacctmgr when running 'sacctmgr archive load' and the load
    fails due to an invalid or corrupted file.
 -- slurmctld/gres_ctld - fix deallocation of typed GRES without device.
 -- scrontab - fix capturing the cronspec request in the job script.
 -- openapi/dbv0.0.37 - Add missing method POST for /associations/.
 -- If ALTER TABLE was already run, continue with database upgrade.
 -- slurmstepd - Gracefully handle RunTimeQuery returning no output.
 -- srun - automatically handle issues with races to listen() on an ephemeral
    socket, and suppress otherwise needless error messages.
 -- Schedule sooner after Epilog completion with SchedulerParameters=defer.
 -- Improve performance for AccountingStoreFlags=job_env.
 -- Expose missing SLURMD_NODENAME and SLURM_NODEID to TaskEpilog environment.
 -- Bring slurm_completion.sh up to date with changes to commands.
 -- Fix issue where burst buffer stage-in could only start for one job in a job
    array per scheduling cycle instead of bb_array_stage_cnt jobs per scheduling
    cycle.
 -- Fix checking if the dependency is the same job for array jobs.
 -- Fix checking for circular dependencies with job arrays.
 -- Restore dependent job pointers on slurmctld startup to avoid race.
 -- openapi/v0.0.37 - Allow strings for JobIds instead of only numerical JobIds
    for GET, DELETE, and POST job methods.
 -- openapi/dbv0.0.36 - Gracefully handle missing associations.
 -- openapi/dbv0.0.36 - Avoid restricting job association lookups to only
    default associations.
 -- openapi/dbv0.0.37 - Gracefully handle missing associations.
 -- openapi/dbv0.0.37 - Avoid restricting job association lookups to only
    default associations.
 -- Fix error in GPU frequency validation logic.
 -- Fix regression in 21.08.1 that broke federated jobs.
 -- Correctly handle requested GRES when used in job arrays.
 -- Fix error in pmix logic dealing with the incorrect size of buffer.
 -- Fix handling of no_consume GRES, add it to the job's allocated TRES.
 -- Fix issue with typed GRES without Files= (bitmap).
 -- Fix job_submit/lua support for 'gres' which is now stored as a 'tres'
    when requesting jobs so needs a 'gres' prefix.
 -- Fix regression where MPS would not deallocate from the node properly.
 -- Fix --gpu-bind=verbose to work correctly.
 -- Do not deny --constraint with special operators "[]()|*" when no changeable
    features are requested, but continue to deny --constraint with special
    operators when changeable features are requested.
 -- openapi/v0.0.{35,36,37} - prevent merging the slurmrestd environment
    alongside a new job submission.
 -- openapi/dbv0.0.36 - Correct tree position of dbv0.0.36_job_step.
 -- openapi/dbv0.0.37 - Correct tree position of dbv0.0.37_job_step.
 -- openapi/v0.0.37 - enable job priority field for job submissions and updates.
 -- openapi/v0.0.37 - request node states query includes MIXED state instead of
    only allocated.
 -- mpi/pmix - avoid job hanging until the time limit on PMIx agent failures.
 -- Correct inverted logic where reduced version matching applied to non-SPANK
    plugins when it should have applied only to SPANK plugins.
 -- Fix issues where prologs would run in serial without PrologFlags=serial.
 -- Make sure a job coming in is initially considered for magnetic reservations.
 -- PMIx v1.1.4 and below are no longer supported.
 -- Add comment to service files about disabling logging through journald.
 -- Add SLURM_NODE_ALIASES env to RPC Prolog (PrologFlags=alloc) environment.
 -- Limit max_script_size to 512 MB.
 -- Fix shutdown of slurmdbd plugin to correctly notice when the agent thread
    finishes.
 -- slurmdbd - fix issue with larger batch script files being sent to SlurmDBD
    with AccountingStoreFlags=job_script that can lead to accounting data loss
    as the resulting RPC generated can exceed internal limits and won't be
    sent, preventing further communication with SlurmDBD.
    This issue is indicated by "error: Invalid msg_size" in your log files.
 -- Fix compile issue with --without-shared-libslurm.
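
The slurmdbd entry above says the job_script issue shows up as "error: Invalid msg_size" in the log files. A minimal sketch of a check for that message follows. The message text is taken from the entry above. The helper name, the sample log line, and the timestamp format are hypothetical and only illustrate the idea; which daemon log to scan and where it lives depend on the site's configuration.

```python
from typing import Iterable, List

# Message text quoted from the 21.08.3 release notes. The rest of the
# log line (timestamp, trailing details) is not specified there, so only
# this substring is matched.
INVALID_MSG_SIZE = "error: Invalid msg_size"


def find_invalid_msg_size(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain the 'Invalid msg_size' error."""
    return [line.rstrip("\n") for line in lines if INVALID_MSG_SIZE in line]


# Hypothetical sample input; real log lines may look different.
sample_log = [
    "[2021-11-02T10:00:00.000] error: Invalid msg_size\n",
    "[2021-11-02T10:00:01.000] debug: unrelated message\n",
]

for hit in find_invalid_msg_size(sample_log):
    print(hit)
```

To scan a real file, pass an open file object instead of the sample list, for example `find_invalid_msg_size(open(path, errors="replace"))`, where `path` is the site's log location.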
