We are pleased to announce the release of Slurm version 17.02.4, which
contains about 40 fixes developed over the past month.
Slurm can be downloaded from https://www.schedmd.com/downloads.php
* Changes in Slurm 17.02.4
==========================
-- Do not attempt to schedule jobs after changing the power cap if
there are
already many active threads.
-- Job expansion example in FAQ enhanced to demonstrate operation in
heterogeneous environments.
-- Prevent scontrol crash when operating on array and no-array jobs at
once.
-- knl_cray plugin: Log incomplete capmc output for a node.
-- knl_cray plugin: Change capmc parsing of mcdram_pct from string to
number.
-- Remove log files from test20.12.
-- When rebooting a node and using the PrologFlags=alloc make sure the
prolog is ran after the reboot.
-- node_features/knl_generic - If a node is rebooted for a pending
job, but
fails to enter the desired NUMA and/or MCDRAM mode then drain the
node and
requeue the job.
-- node_features/knl_generic disable mode change unless RebootProgram
configured.
-- Add new burst_buffer function bb_g_job_revoke_alloc() to be executed
if there was a failure after the initial resource allocation. Does not
release previously allocated resources.
-- Test if the node_bitmap on a job is NULL when testing if the job's
nodes
are ready. This will be NULL is a job was revoked while beginning.
-- Fix incorrect lock levels when testing when job will run or
updating a job.
-- Add missing locks to job_submit/pbs plugin when updating a jobs
dependencies.
-- Add support for lua5.3
-- Add min_memory_per_node|cpu to the job_submit/lua plugin to deal
with lua
not being able to deal with pn_min_memory being a uint64_t.
Scripts are
urged to change to these new variables avoid issue. If not set the
variables will be 'nil'.
-- Calculate priority correctly when 'nice' is given.
-- Fix minor typos in the documentation.
-- node_features/knl_cray: Preserve non-KNL active features if slurmctld
reconfigured while node boot in progress.
-- node_features/knl_generic: Do not repeatedly log errors when trying
to read
KNL modes if not KNL system.
-- Add missing QOS read lock to backfill scheduler.
-- When doing a dlopen on liblua only attempt the version compiled
against.
-- Fix null-dereference in sreport cluster ulitization when configured
with
memory-leak-debug.
-- Fix Partition info in 'scontrol show node'. Previously duplicate
partition
names, or Partitions the node did not belong to could be displayed.
-- Fix it so the backup slurmdbd will take control correctly.
-- Fix unsafe use of MAX() macro, which could result in problems
cleaning up
accounting plugins in slurmd, or repeat job cancellation attempts in
scancel.
-- Fix 'scontrol update reservation duration=unlimited' to set the
duration
to 365-days (as is done elsewhere), rather than 49710 days.
-- Check if variable given to scontrol show job is a valid jobid.
-- Fix WithSubAccounts option to not include WithDeleted unless requested.
-- Prevent a job tested on multiple partitions from being marked
WHOLE_NODE_USER.
-- Prevent a race between completing jobs on a user-exclusive node from
leaving the node owned.
-- When scheduling take the nodes in completing jobs out of the mix to
reduce
fragmentation. SchedulerParameters=reduce_completing_frag
-- For jobs submited to multiple partitions, report the job's earliest
start
time for any partition.
-- Backfill partitions that use QOS Grp limits to "float" better.
-- node_features/knl_cray: don't clear configured GRES from non-KNL node.
-- sacctmgr - prevent segfault in command when a request is denied due
to a insufficient priviledges.
-- Add warning about libcurl-devel not being installed during configure.
-- Streamline job purge by handling file deletion on a separate thread.
-- Always set RLIMIT_CORE to the maximum permitted for slurmd, to ensure
core files are created even on non-developer builds.
-- Fix --ntasks-per-core option/environment variable parsing to set
the requested value, instead of always setting one.
-- If trying to cancel a step that hasn't started yet for some reason
return
a good return code.
-- Fix issue with sacctmgr show where user=''