Re: [slurm-users] sacct does always print all jobs regardless filter parameters with accounting_storage/filetxt

2020-01-31 Thread Chris Samuel
On 30/1/20 10:20 am, Dr. Thomas Orgis wrote: Matching for user (-u) and Job ID (-j) works, but not -N/-S/-E. So is this just the current state and it's up to me to provide a patch to enable it if I want that behaviour? You're using a very, very, very old version of Slurm there (15.08), you shou

Re: [slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread Renfro, Michael
Slurm 19.05 now, though all these settings were in effect on 17.02 until quite recently. If I get some detail wrong below, I hope someone will correct me. But this is our current working state. We’ve been able to schedule 10-20k jobs per month since late 2017, and we successfully scheduled 320k

Re: [slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread David Baker
Hello, Thank you for your detailed reply. That’s all very useful. I managed to mistype our cluster size: there are actually 450 standard 40-core compute nodes. What you say is interesting, and so it concerns me that things are so bad at the moment. I wondered if you could please g

Re: [slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread Renfro, Michael
I missed reading what size your cluster was at first, but found it on a second read. Our cluster and typical maximum job size scales about the same way, though (our users’ typical job size is anywhere from a few cores up to 10% of our core count). There are several recommendations to separate y

Re: [slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread David Baker
Hello, Thank you for your reply. In answer to Mike's questions... Our serial partition nodes are partially shared by the high memory partition. That is, the partitions overlap partially -- shared nodes move one way or another depending upon demand. Jobs requesting up to and including 20 cores a

Re: [slurm-users] How do I add a library for the linker in Makefile.in

2020-01-31 Thread Michael Gutteridge
With the caveats that I haven't built these plugins past Slurm 18, that these are job submit plugins, and that the documentation is weak, you could look at these plugins I'd written for our cluster: https://github.com/FredHutch/gizmo-plugins It contains two plugins I build in the source tree. These set a

Re: [slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread Renfro, Michael
Greetings, fellow general university resource administrator. Couple things come to mind from my experience: 1) does your serial partition share nodes with the other non-serial partitions? 2) what’s your maximum job time allowed, for serial (if the previous answer was “yes”) and non-serial parti

Re: [slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread Loris Bennett
Hi David, David Baker writes: > Hello, > > Our SLURM cluster is relatively small. We have 350 standard compute > nodes each with 40 cores. The largest job that users can run on the > partition is one requesting 32 nodes. Our cluster is a general > university research resource and so there are man

[slurm-users] Longer queuing times for larger jobs

2020-01-31 Thread David Baker
Hello, Our SLURM cluster is relatively small. We have 350 standard compute nodes each with 40 cores. The largest job that users can run on the partition is one requesting 32 nodes. Our cluster is a general university research resource and so there are many different sizes of jobs ranging from