Source: slurm-wlm-contrib
Version: 22.05.8-4+deb12u1
Severity: critical
Justification: breaks the whole system

Dear Maintainer,

After the latest security update, part of our Slurm cluster (the GPU nodes) became unusable. These nodes were configured using Slurm's NVML autodetect feature. After the deb12u2 update, the NVML plugin packages failed to install because there is no corresponding security update for them:

  The following packages have unmet dependencies:
   slurm-wlm-nvml-plugin : Depends: slurm-wlm-basic-plugins (= 22.05.8-4+deb12u1) but 22.05.8-4+deb12u2 is to be installed
   slurm-wlm-nvml-plugin-dev : Depends: slurm-wlm-basic-plugins-dev (= 22.05.8-4+deb12u1) but 22.05.8-4+deb12u2 is to be installed
  E: Unable to correct problems, you have held broken packages.

Without NVML, the slurmd daemon will not start, so no new jobs could be submitted to any of our GPU nodes.

We discovered that slurm-wlm-contrib was removed from testing in Dec 2023, with no bug report or explanation as to why.

We have worked around the issue for now by switching our GPU nodes to manual configuration and removing the NVML packages. However, this was disruptive and unexpected in our environment.

-- System Information:
Debian Release: 12.4
  APT prefers stable-security
  APT policy: (750, 'stable-security'), (750, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-15-amd64 (SMP w/96 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled