[slurm-dev] Slurm versions 16.05.9 and 17.02.0-0rc1 are now available

Danny Auble Tue, 31 Jan 2017 12:35:50 -0800

We are pleased to announce the availability of Slurm versions 16.05.9 and 17.02.0-0rc1 (release candidate 1).

16.05.9 contains around 25 rather minor bug fixes. Please upgrade at your leisure.

The rc release contains all of the features intended for release 17.02. Development has ended for this release and we are continuing with our testing phase, which will most likely result in another rc before we tag 17.02.0 near the middle of February. A description of what this release contains is in the RELEASE_NOTES file available in the source. Your help in hardening this version is greatly appreciated. You are invited to download this version and assist in testing. As with all rc releases, you should be able to install it and not worry about protocol/state changes going forward with this version.


Slurm downloads are available from https://schedmd.com/downloads.php.

Reading from NEWS for 16.05.9...

* Changes in Slurm 16.05.9
==========================
 -- Fix parsing of SBCAST_COMPRESS environment variable in sbcast.
 -- Change some debug messages to errors in task/cgroup plugin.
 -- backfill scheduler: Stop trying to determine expected start time for a job
    after 2 seconds of wall time. This can happen if there are many running jobs
    and a pending job can not be started soon.
 -- Improve performance of cr_sort_part_rows() in cons_res plugin.
 -- CRAY - Fix deadlock issue when updating accounting in the slurmctld and
    scheduling a Datawarp job.
 -- Correct the job state accounting information for jobs requeued due to burst
    buffer errors.
 -- burst_buffer/cray - Avoid "pre_run" operation if not using buffer (i.e.
    just creating or deleting a persistent burst buffer).
 -- Fix slurm.spec file support for BlueGene builds.
 -- Fix missing TRES read lock in acct_policy_job_runnable_pre_select() code.
 -- Fix debug2 message printing value using wrong array index in
    _qos_job_runnable_post_select().
 -- Prevent job timeout on node power up.
 -- MYSQL - Fix minor memory leak when querying steps and the sql fails.
 -- Make it so sacctmgr accepts column headers like MaxTRESPU and not
    MaxTRESP.
 -- Only look at SLURM_STEP_KILLED_MSG_NODE_ID on startup, to avoid a race
    condition later when looking at a step's env.
 -- Make backfill scheduler behave like regular scheduler with respect to
    'assoc_limit_stop'.
 -- Allow a lower version client command to talk to a higher version controller
    using the multi-cluster options (e.g. squeue -M<cluster>).
 -- slurmctld/agent race condition fix: Prevent job launch while PrologSlurmctld
    daemon is running or node boot in progress.
 -- MYSQL - Fix a few other minor memory leaks when uncommon failures occur.
 -- burst_buffer/cray - Fix race condition that could cause multiple batch job
    launch requests resulting in drained nodes.
 -- Correct logic to purge old reservations.
 -- Fix DBD cache restore from previous versions.
 -- Fix to logic for getting expected start time of existing job ID with
    explicit begin time that is in the past.
 -- Clear job's reason of "BeginTime" in a more timely fashion and/or prevent
    them from being stuck in a PENDING state.
 -- Make sure acct policy limits imposed on a job are correct after requeue.
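
The backfill entry in the 16.05.9 list caps the time spent estimating one pending job's expected start time at 2 seconds of wall time. What follows is a minimal Python sketch of that idea only, not Slurm's C implementation: the function name estimate_start_time, the simple node-count model of running jobs, and the injectable clock are all hypothetical simplifications.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical budget mirroring the 2-second cap described in the NEWS entry.
TIME_BUDGET_SECS = 2.0


def estimate_start_time(
    now: float,
    free_nodes: int,
    nodes_needed: int,
    running_jobs: Iterable[Tuple[float, int]],
    clock: Callable[[], float] = time.monotonic,
    budget: float = TIME_BUDGET_SECS,
) -> Optional[float]:
    """Return the earliest time enough nodes are free, or None if the
    wall-time budget runs out first or the job can never fit.

    running_jobs holds hypothetical (end_time, node_count) pairs.
    """
    deadline = clock() + budget
    if free_nodes >= nodes_needed:
        return now
    # Walk running jobs in order of completion, accumulating freed nodes.
    for end_time, node_count in sorted(running_jobs):
        if clock() > deadline:
            return None  # give up rather than stall the backfill cycle
        free_nodes += node_count
        if free_nodes >= nodes_needed:
            return end_time
    return None  # not enough nodes even after every running job ends
```

Returning None models the case the entry describes: with many running jobs and a pending job that can not start soon, the search stops early instead of consuming unbounded scheduler time on one job.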


Reading from NEWS for 17.02.0-0rc1...

* Changes in Slurm 17.02.0rc1
==============================
 -- Add port info to 'sinfo' and 'scontrol show node'.
 -- Fix errant definition of USE_64BIT_BITSTR which can lead to core dumps.
 -- Move BatchScript to end of each job's information when using
    "scontrol -dd show job" to make it more readable.
 -- Add SchedulerParameters configuration parameter of "default_gbytes", which
    treats numeric only (no suffix) value for memory and tmp disk space as being
    in units of Gigabytes. Mostly for compatibility with LSF.
 -- Fix race condition in srun/sattach logic which would prevent srun from
    terminating.
 -- Bitstring operations are now 64bit instead of 32bit.
 -- Replace hweight() function in bitstring with faster version.
 -- scancel would treat a non-numeric argument as the name of jobs to be
    cancelled (a non-documented feature). Cancelling jobs by name now requires
    the "--jobname=" command line argument.
 -- scancel modified to note that no jobs satisfy the filter options when the
    --verbose option is used along with one or more job filters (e.g. "--qos=").
 -- Change _pack_cred to use pack_bit_str_hex instead of pack_bit_fmt for
    better scalability and performance.
 -- Add BootTime configuration parameter to knl.conf file to optimize resource
    allocations with respect to required node reboots.
 -- Add node_features_p_boot_time() to node_features plugin to optimize
    scheduling with respect to node reboots.
 -- Avoid allocating resources to a job in the event that its run time plus boot
    time (if needed) extends into an advanced reservation.
 -- Burst_buffer/cray - Avoid stage-out operation if job never started.
 -- node_features/knl_cray - Add capability to detect Uncorrectable Memory
    Errors (UME) and if detected then log the event in all job and step stderr
    with a message of the form:
       error: *** STEP 1.2 ON tux1 UNCORRECTABLE MEMORY ERROR AT 2016-12-14T09:09:37 ***
    Similar logic added to node_features/knl_generic in version 17.02.0pre4.
 -- If job is allocated nodes which are powered down, then reset job start time
    when the nodes are ready and do not charge the job for power up time.
 -- Add the ability to purge transactions from the database.
 -- Add support for requeue'ing of federated jobs (BETA).
 -- Add support for interactive federated jobs (BETA).
 -- Add the ability to purge rolled up usage from the database.
 -- CRAY systems only: TaskPlugins must list task/cgroup before task/cray in
    order for the cgroup files to be created before task/cray runs.
 -- Properly set SLURM_JOB_GPUS environment variable for Prolog.
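
The "default_gbytes" entry in the 17.02.0rc1 list changes how a bare number is read for memory and tmp disk requests: normally an unsuffixed value means megabytes, and with this option it means gigabytes. Below is a minimal Python sketch of that unit rule only. The function to_megabytes is hypothetical (not part of Slurm), only the M, G and T suffixes are handled, and 1024-based units are assumed.

```python
# Multipliers from a size suffix to megabytes, assuming 1024-based units.
# Only M, G and T are handled in this sketch.
_SUFFIX_TO_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_megabytes(spec: str, default_gbytes: bool = False) -> int:
    """Convert a memory/tmp size such as "500", "4G" or "2T" to megabytes.

    A numeric-only value is read as megabytes by default, or as gigabytes
    when the hypothetical default_gbytes flag is True. Suffixed values are
    unaffected by the flag.
    """
    spec = spec.strip().upper()
    if not spec:
        raise ValueError("empty size specification")
    if spec[-1].isdigit():
        value = int(spec)
        return value * 1024 if default_gbytes else value
    suffix, number = spec[-1], spec[:-1]
    if suffix not in _SUFFIX_TO_MB or not number.isdigit():
        raise ValueError(f"unsupported size specification: {spec!r}")
    return int(number) * _SUFFIX_TO_MB[suffix]
```

On a real cluster the option is enabled through slurm.conf (SchedulerParameters=default_gbytes), after which a request such as --mem=16 is read as 16 gigabytes rather than 16 megabytes.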

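The 17.02.0rc1 entry about run time plus boot time amounts to an interval-overlap test: a job placed on nodes that must first reboot holds them from its start until start + boot time + run time, and should not receive those nodes if that window reaches into an advanced reservation. A minimal Python sketch under stated assumptions (times in integer seconds, half-open intervals); fits_before_reservation and its parameters are hypothetical names, not Slurm code.

```python
def fits_before_reservation(
    start: int,
    run_time: int,
    boot_time: int,
    needs_reboot: bool,
    resv_start: int,
    resv_end: int,
) -> bool:
    """Return True if the job's occupancy window avoids the reservation.

    The job window is [start, start + boot + run) and the reservation is
    [resv_start, resv_end); both are half-open intervals in seconds.
    """
    # Boot time only counts when the allocated nodes must be rebooted.
    end = start + run_time + (boot_time if needs_reboot else 0)
    # Two half-open intervals overlap iff each starts before the other ends.
    overlaps = start < resv_end and resv_start < end
    return not overlaps
```

A job with a 100 second run time starting at 0 fits before a reservation beginning at 120, but once 50 seconds of boot time are added it no longer does, which is the situation the entry guards against.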