We are pleased to announce the availability of Slurm versions 15.08.11
and 16.05.0-rc1 (release candidate 1).

15.08.11 contains around 25 relatively minor bug fixes, detailed below.
Please upgrade at your leisure.

The rc release contains all of the features intended for release 16.05.
Development has ended for this release, and we are continuing with our
testing phase, which will most likely result in another rc before we tag
16.05.0 near the end of the month. A description of what this release
contains is in the RELEASE_NOTES file available in the source. Your
help in hardening this version is greatly appreciated. You are invited
to download this version and assist in testing.

Slurm downloads are available from http://schedmd.com/#repos

* Changes in Slurm 15.08.11
===========================
-- Fix for job "--contiguous" option that could cause job allocation/launch
failure or slurmctld crash.
-- Fix to set up logs correctly for single-character program names.
-- Backfill scheduling performance enhancement with a large number of
running jobs.
-- Reset job's prolog_running counter on slurmctld restart or reconfigure.
-- burst_buffer/cray - Update job's prolog_running counter if pre_run
fails.
-- MYSQL - Make the error message more specific when removing a reservation
that doesn't meet basic requirements.
-- burst_buffer/cray - Fix for script creating or deleting a persistent
buffer failing the "paths" operation and causing the job to be held.
-- power/cray - Prevent possible divide by zero.
-- power/cray - Fix bug introduced in 15.08.10 preventing operation in many
cases.
-- Prevent deadlock for flow of data to the slurmdbd when sending a
reservation that wasn't set up correctly.
-- burst_buffer/cray - Don't call Datawarp "paths" function if script
includes only create or destroy of persistent burst buffer. Some versions
of Datawarp software return an error for such scripts, causing the job to
be held.
-- Fix potential issue when adding and removing TRES which could result
in the slurmdbd segfaulting.
-- Add cast to memory limit calculation to prevent integer overflow for
very large memory values.
-- Bluegene - Fix issue with reservations resizing under the covers on a
restart of the slurmctld.
-- Avoid error message of "Requested cpu_bind option requires entire node
to be allocated; disabling affinity" being generated in some cases where
task/affinity and task/cgroup plugins are used together.
-- Fix version issue when packing GRES information between 2 different
versions of Slurm.
-- Fix for srun hanging with OpenMPI and PMIx.
-- Better initialization of node_ptr when dealing with protocol_version.
-- Fix incorrect type when initializing header of a message.
-- MYSQL - Fix incorrect usage of limit and union.
-- MYSQL - Remove 'ignore' from alter ignore when updating a table.
-- Documentation - update prolog_epilog page to reflect current behavior
if the Prolog fails.
-- Documentation - clarify behavior of 'srun --export=NONE' in man page.
-- Fix potential gres underflow on restart of slurmctld.
-- Fix sacctmgr to remove a user who has no associations.

* Changes in Slurm 16.05.0rc1
=============================
-- Remove the SchedulerParameters option of "assoc_limit_continue", making
it the default value. Add option of "assoc_limit_stop". If
"assoc_limit_stop" is set and a job cannot start due to association
limits, then do not attempt to initiate any lower priority jobs in that
partition. Setting this can decrease system throughput and utilization,
but avoids potentially starving larger jobs, which could otherwise be
prevented from launching indefinitely.
-- Update a node's socket and cores per socket counts as needed after a
node boot to reflect configuration changes which can occur on KNL
processors. Note that the node's total core count must not change, only
the distribution of cores across varying socket counts (KNL NUMA nodes
treated as sockets by Slurm).
-- Rename partition configuration from "Shared" to "OverSubscribe". Rename
salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
options will continue to function. Output field names also changed in
scontrol, sinfo, squeue and sview.
-- Add SLURM_UMASK environment variable to user job.
-- knl_conf: Added new configuration parameter of CapmcPollFreq.
-- squeue: remove errant spaces in column formats for "squeue -o %all".
-- Add ARRAY_TASKS mail option to send emails to each task in a job array.
-- Change default compression library for sbcast to lz4.
-- select/cray - Initiate step node health check at start of step
termination rather than after application completely ends so that NHC can
capture information about hung (non-killable) processes.
-- Add --units=[KMGTP] option to sacct to display values in specific
unit type.
-- Modify sacct and sacctmgr to display TRES values in converted units.
-- Modify sacctmgr to accept TRES values with [KMGTP] suffixes.
-- Replace hash function with more modern SipHash functions.
-- Add "--with-cray_dir" build/configure option.
-- BB - Only send stage_out email when stage_out is set in script.
-- Add r/w locking to file_bcast receive functions in slurmd.
-- Add TopologyParam option of "TopoOptional" to optimize network topology
only for jobs requesting it.
-- Fix build on FreeBSD.
-- Configuration parameter "CpuFreqDef" is now used to set the default
governor for job steps not specifying --cpu-freq (previously the parameter
was unused).
-- Fix sshare -o<format> to correctly display new lengths.
-- Update documentation to rename Shared option to OverSubscribe.
-- Update documentation to rename partition Priority option to
PriorityTier.
-- Prevent changing of QOS on running jobs.
-- Update accounting when changing QOS on pending jobs.
-- Add support for ntasks_per_socket in task/affinity.
-- Generate init.d and systemd service scripts in etc/ through Make rather
than at configure time to ensure all variable substitutions happen.
-- Use TaskPluginParam for default task binding if no user specified CPU
binding. User --cpu_bind option takes precedence over default. No longer
any error if user --cpu_bind option does not match TaskPluginParam.
-- Make sacct and sattach work with older slurmd versions.
-- Fix protocol handling between 15.08 and 16.05 for
'scontrol show config'.
-- Enable prefixes (e.g. info, debug) in slurmstepd debugging.
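
For readers unfamiliar with the [KMGTP] suffixes mentioned in the sacct and
sacctmgr entries above, the following is a minimal Python sketch of how such
suffixed values can be parsed and re-expressed in a chosen unit. It is not
Slurm's implementation: the function names parse_with_suffix and
format_in_unit are hypothetical, and the factor of 1024 per suffix step is an
assumption made for illustration.

```python
# Illustrative only; these helpers are hypothetical and not part of Slurm.
# Assumption: each suffix step (K, M, G, T, P) is a factor of 1024.
SUFFIXES = "KMGTP"


def parse_with_suffix(text):
    """Return the plain integer count for a value such as '4G' or '512'."""
    text = text.strip()
    if not text:
        raise ValueError("empty value")
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        exponent = SUFFIXES.index(suffix) + 1
        return int(text[:-1]) * 1024 ** exponent
    # No recognized suffix: treat the whole string as a plain integer.
    return int(text)


def format_in_unit(count, unit):
    """Express a plain count in the requested unit, e.g. unit='M'."""
    exponent = SUFFIXES.index(unit.upper()) + 1
    return f"{count / 1024 ** exponent:.2f}{unit.upper()}"
```

Under these assumptions, format_in_unit(parse_with_suffix("2G"), "M") gives
"2048.00M". Consult the sacct and sacctmgr man pages for the behavior of the
actual options.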