[slurm-users] Re: Redirect jobs submitted to old partition to new

2024-04-16 Thread Williams, Jenny Avis via slurm-users
For jobs already in default_queue:
    squeue -t pd -h --Format=jobID | xargs -L1 -I{} scontrol update jobID={} partition=queue1
What version of slurm are you running? In slurm 23.02.5, man slurm.conf under PARTITION CONFIGURATION Alternate Partition name of alternate
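A cautious variant of the one-liner in this message is to print the `scontrol` commands first and run them only after a review. The sketch below is an illustration, not part of the original reply: the helper name `print_move_commands` is hypothetical, and the `-p default_queue` filter and the `-o %i` output format in the usage comment are assumptions added here to limit the listing to the old partition.

```shell
# Hypothetical helper: read one job ID per line on stdin and print
# (not execute) the scontrol command that would move each job.
print_move_commands() {
    new_partition="$1"
    while read -r jobid; do
        # Skip blank lines so they do not produce an empty jobid=.
        [ -n "$jobid" ] || continue
        printf 'scontrol update jobid=%s partition=%s\n' "$jobid" "$new_partition"
    done
}

# Usage on a cluster where squeue is available (assumed, not run here):
#   squeue -h -t PD -p default_queue -o %i | print_move_commands queue1
# After checking the printed commands, append "| sh" to execute them.
```

Because the helper only prints, it can be tried safely with a hand-written list of job IDs before it is pointed at a live queue.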

[slurm-users] Redirect jobs submitted to old partition to new

2024-04-16 Thread wdennis--- via slurm-users
Hi all, I have a single-partition Slurm cluster (the single partition name being "default_queue") on which I now want to implement multiple queues to subdivide the resources. Say the new default queue is "queue1"; should I set the "default_queue" to `State=INACTIVE` and then use

[slurm-users] Slurm version 23.11.6 is now available

2024-04-16 Thread Marshall Garey via slurm-users
We are pleased to announce the availability of Slurm version 23.11.6. The 23.11.6 release includes fixes for two different problems with the priority/multifactor plugin: a crash and a miscalculation of AssocGrpCPURunMinutes after a slurmctld reconfiguration/restart. The wsrep_on errors that sites running

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Jason Simms via slurm-users
As a related point, for this reason I mount /var/log separately from /. Ask me how I learned that lesson... Jason On Tue, Apr 16, 2024 at 8:43 AM Jeffrey T Frey via slurm-users < slurm-users@lists.schedmd.com> wrote: > AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n" > is

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Jeffrey T Frey via slurm-users
> AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n" > is per user. The ulimit is a frontend to rusage limits, which are per-process restrictions (not per-user). The fs.file-max is the kernel's limit on how many file descriptors can be open in aggregate. You'd have to edit
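The distinction drawn in this message, between a per-process limit and a kernel-wide total, can be checked directly on a Linux node. The following is a minimal sketch, assuming a Linux /proc filesystem; the function name `show_fd_limits` is hypothetical and only serves the illustration.

```shell
# Hypothetical helper: print the per-process open-file limits of the
# current shell next to the kernel-wide file-handle figures.
show_fd_limits() {
    # Soft and hard RLIMIT_NOFILE, inherited by processes started from this shell.
    echo "per-process soft limit: $(ulimit -Sn)"
    echo "per-process hard limit: $(ulimit -Hn)"

    # Kernel-wide ceiling on open file handles (the fs.file-max sysctl).
    if [ -r /proc/sys/fs/file-max ]; then
        echo "system-wide fs.file-max: $(cat /proc/sys/fs/file-max)"
    fi

    # file-nr holds three numbers: allocated handles, allocated but unused, maximum.
    if [ -r /proc/sys/fs/file-nr ]; then
        read -r allocated unused max < /proc/sys/fs/file-nr
        echo "file handles allocated: $allocated (unused $unused, max $max)"
    fi
}

show_fd_limits
```

Comparing the allocated count against the maximum on a busy node gives a rough sense of how close the node is to the aggregate limit, independent of any single process's `ulimit -n`.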

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Bjørn-Helge Mevik via slurm-users
Ole Holm Nielsen writes: > Hi Bjørn-Helge, > > That sounds interesting, but which limit might affect the kernel's > fs.file-max? For example, a user already has a narrow limit: > > ulimit -n > 1024 AFAIK, the fs.file-max limit is a node-wide limit, whereas "ulimit -n" is per user. Now that I

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Ole Holm Nielsen via slurm-users
Hi Bjørn-Helge, On 4/16/24 12:08, Bjørn-Helge Mevik via slurm-users wrote: Ole Holm Nielsen via slurm-users writes: Therefore I believe that the root cause of the present issue is user applications opening a lot of files on our 96-core nodes, and we need to increase fs.file-max. You could

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Bjørn-Helge Mevik via slurm-users
Ole Holm Nielsen via slurm-users writes: > Therefore I believe that the root cause of the present issue is user > applications opening a lot of files on our 96-core nodes, and we need > to increase fs.file-max. You could also set a limit per user, for instance in /etc/security/limits.d/. Then

[slurm-users] Re: Munge log-file fills up the file system to 100%

2024-04-16 Thread Ole Holm Nielsen via slurm-users
Hi Jeffrey, Thanks a lot for the information: On 4/15/24 15:40, Jeffrey T Frey wrote: https://github.com/dun/munge/issues/94 I hadn't seen issue #94 before, and it seems to be relevant to our problem. It's probably a good idea to upgrade munge beyond what's supplied by EL8/EL9. We can