We are pleased to announce the availability of Slurm versions 16.05.9 and 17.02.0-0rc1 (release candidate 1).

16.05.9 contains around 25 rather minor bug fixes. Please upgrade at your leisure.

The rc release contains all of the features intended for release 17.02. Development has ended for this release, and we are continuing with our testing phase, which will most likely result in another rc before we tag 17.02.0 near the middle of February. A description of what this release contains is in the RELEASE_NOTES file available in the source. Your help in hardening this version is greatly appreciated. You are invited to download this version and assist in testing. As with all rc releases, you should be able to install it without worrying about protocol/state changes going forward with the version.

Slurm downloads are available from https://schedmd.com/downloads.php.

Reading from NEWS for 16.05.9...

* Changes in Slurm 16.05.9
==========================
 -- Fix parsing of SBCAST_COMPRESS environment variable in sbcast.
 -- Change some debug messages to errors in task/cgroup plugin.
 -- backfill scheduler: Stop trying to determine expected start time for a job
    after 2 seconds of wall time. This can happen if there are many running jobs
    and a pending job can not be started soon.
 -- Improve performance of cr_sort_part_rows() in cons_res plugin.
 -- CRAY - Fix deadlock issue when updating accounting in the slurmctld and
    scheduling a Datawarp job.
 -- Correct the job state accounting information for jobs requeued due to burst
    buffer errors.
 -- burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e.
    just creating or deleting a persistent burst buffer).
 -- Fix slurm.spec file support for BlueGene builds.
 -- Fix missing TRES read lock in acct_policy_job_runnable_pre_select() code.
 -- Fix debug2 message printing value using wrong array index in
    _qos_job_runnable_post_select().
 -- Prevent job timeout on node power up.
 -- MYSQL - Fix minor memory leak when querying steps and the sql fails.
 -- Make it so sacctmgr accepts column headers like MaxTRESPU and not MaxTRESP.
 -- Only look at SLURM_STEP_KILLED_MSG_NODE_ID on startup, to avoid race
    condition later when looking at a steps env.
 -- Make backfill scheduler behave like regular scheduler in respect to
    'assoc_limit_stop'.
 -- Allow a lower version client command to talk to a higher version controller
    using the multi-cluster options (e.g. squeue -M<cluster>).
 -- slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld
    daemon is running or node boot in progress.
 -- MYSQL - Fix a few other minor memory leaks when uncommon failures occur.
 -- burst_buffer/cray - Fix race condition that could cause multiple batch job
    launch requests resulting in drained nodes.
 -- Correct logic to purge old reservations.
 -- Fix DBD cache restore from previous versions.
 -- Fix to logic for getting expected start time of existing job ID with
    explicit begin time that is in the past.
 -- Clear a job's reason of "BeginTime" in a more timely fashion and/or prevent
    it from being stuck in a PENDING state.
 -- Make sure acct policy limits imposed on a job are correct after requeue.
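
The backfill entry above caps the expected-start-time search at 2 seconds of wall time. A minimal sketch of that kind of time budget follows. It is not Slurm's implementation (the scheduler is written in C); the function name, its parameters, and the injectable clock are all hypothetical and exist only to illustrate the idea.

```python
import time


def estimate_start_time(candidate_times, can_start_at, budget_seconds=2.0,
                        clock=time.monotonic):
    """Return the earliest candidate time at which the job fits, or None.

    Hypothetical helper, not a Slurm API: the search gives up once
    budget_seconds of wall time have elapsed, leaving the expected
    start time undetermined.
    """
    deadline = clock() + budget_seconds
    for when in candidate_times:
        if clock() >= deadline:
            return None  # budget exhausted; stop searching
        if can_start_at(when):
            return when
    return None  # no candidate fits
```

Passing the clock as a parameter is a design choice for testability only: a fake clock lets the cutoff be exercised without actually waiting.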

Reading from NEWS for 17.02.0-0rc1...

* Changes in Slurm 17.02.0rc1
==============================
 -- Add port info to 'sinfo' and 'scontrol show node'.
 -- Fix errant definition of USE_64BIT_BITSTR which can lead to core dumps.
 -- Move BatchScript to end of each job's information when using
    "scontrol -dd show job" to make it more readable.
 -- Add SchedulerParameters configuration parameter of "default_gbytes", which
    treats numeric only (no suffix) value for memory and tmp disk space as being
    in units of Gigabytes. Mostly for compatibility with LSF.
 -- Fix race condition in srun/sattach logic which would prevent srun from
    terminating.
 -- Bitstring operations are now 64bit instead of 32bit.
 -- Replace hweight() function in bitstring with faster version.
 -- scancel would treat a non-numeric argument as the name of jobs to be
    cancelled (a non-documented feature). Cancelling jobs by name now requires
    the "--jobname=" command line argument.
 -- scancel modified to note that no jobs satisfy the filter options when the
    --verbose option is used along with one or more job filters (e.g. "--qos=").
 -- Change _pack_cred to use pack_bit_str_hex instead of pack_bit_fmt for
    better scalability and performance.
 -- Add BootTime configuration parameter to knl.conf file to optimize resource
    allocations with respect to required node reboots.
 -- Add node_features_p_boot_time() to node_features plugin to optimize
    scheduling with respect to node reboots.
 -- Avoid allocating resources to a job in the event that its run time plus boot
    time (if needed) extend into an advanced reservation.
 -- Burst_buffer/cray - Avoid stage-out operation if job never started.
 -- node_features/knl_cray - Add capability to detect Uncorrectable Memory
    Errors (UME) and if detected then log the event in all job and step stderr
    with a message of the form:
    error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 ***
    Similar logic added to node_features/knl_generic in version 17.02.0pre4.
 -- If job is allocated nodes which are powered down, then reset job start time
    when the nodes are ready and do not charge the job for power up time.
 -- Add the ability to purge transactions from the database.
 -- Add support for requeue'ing of federated jobs (BETA).
 -- Add support for interactive federated jobs (BETA).
 -- Add the ability to purge rolled up usage from the database.
 -- CRAY systems only: TaskPlugins must list task/cgroup before task/cray in
    order for the cgroup files to be created before task/cray runs.
 -- Properly set SLURM_JOB_GPUS environment variable for Prolog.
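
For sites that want to scan job output for the Uncorrectable Memory Error message quoted in the list above, something along these lines might work. This is a sketch only: the pattern is derived from the single example line in the NEWS entry, the "JOB" alternative is an assumption (the example shows only "STEP"), and the function and field names are made up for illustration.

```python
import re
from datetime import datetime

# Pattern built from the one example message in the NEWS entry.
# Accepting "JOB" as well as "STEP" is an assumption, not confirmed by it.
UME_PATTERN = re.compile(
    r"error: \*\*\* (?P<scope>STEP|JOB) (?P<ident>\S+) ON (?P<node>\S+) "
    r"UNCORRECTABLE MEMORY ERROR AT "
    r"(?P<when>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) \*\*\*"
)


def find_ume_events(stderr_text):
    """Return one dict per UME message found in the given stderr text."""
    events = []
    for match in UME_PATTERN.finditer(stderr_text):
        events.append({
            "scope": match.group("scope"),
            "id": match.group("ident"),
            "node": match.group("node"),
            "when": datetime.strptime(match.group("when"),
                                      "%Y-%m-%dT%H:%M:%S"),
        })
    return events
```

Calling find_ume_events() on the contents of a job's stderr file returns an empty list when no such message is present.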
