[slurm-users] slurm Report

2020-09-24 Thread navin srivastava
Hi team, I have extracted the %utilization report and found that the idle time is on the high side, so I wanted to check: is there any way to find node-based utilization? It would help us figure out which nodes are underutilized. Regards, Navin.
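One way to get a point-in-time view of per-node allocation is to run `sinfo` in node-oriented mode (`-N`) with the `%C` field, which reports CPUs as allocated/idle/other/total. The Python sketch below assumes that output format (for example from `sinfo -N -h -o "%N %C"`). It shows only current allocation, not the historical idle time covered by a utilization report. The function names and the 25% threshold are hypothetical and chosen for illustration.

```python
import subprocess
from typing import Dict, List


def parse_sinfo_cpus(output: str) -> Dict[str, float]:
    """Map node name -> fraction of CPUs currently allocated.

    Assumes lines like 'node01 4/28/0/32' (allocated/idle/other/total),
    as produced by: sinfo -N -h -o "%N %C"
    """
    usage: Dict[str, float] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        node, cpus = parts
        alloc, _idle, _other, total = (int(x) for x in cpus.split("/"))
        # With -N a node appears once per partition; the counts are the same,
        # so overwriting an earlier entry is harmless.
        usage[node] = alloc / total if total else 0.0
    return usage


def underutilized(usage: Dict[str, float], threshold: float = 0.25) -> List[str]:
    """Return node names whose allocated-CPU fraction is below the threshold."""
    return sorted(node for node, frac in usage.items() if frac < threshold)


def snapshot() -> Dict[str, float]:
    """Query sinfo on a live cluster (requires Slurm client tools on PATH)."""
    out = subprocess.run(
        ["sinfo", "-N", "-h", "-o", "%N %C"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sinfo_cpus(out)


if __name__ == "__main__":
    sample = "node01 32/0/0/32\nnode02 4/28/0/32\nnode03 0/16/16/32\n"
    print(underutilized(parse_sinfo_cpus(sample)))  # ['node02', 'node03']
```

Sampling `snapshot()` periodically (for example from cron) and averaging the fractions would approximate per-node utilization over time. That is a design choice of this sketch, not a built-in Slurm report.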

Re: [slurm-users] Compiling Slurm with nvml support

2020-09-24 Thread Kilian Cavalotti
Hi Jason, We're taking the approach proposed in https://bugs.schedmd.com/show_bug.cgi?id=7919: same RPM everywhere, but without the dependencies that you don't want installed globally (like NVML, PMIx...). Of course you need to satisfy those dependencies some other way on the nodes that require

[slurm-users] Slurmd Stops responding with MAX_THREADS message logged

2020-09-24 Thread Grant Campbell
Hey, About once a day one or more Slurmd daemons running in our cluster stop accepting new jobs, and they only recover when Slurmd is restarted. The nodes are marked as "down", with the reason given as "not responding". We are running version 20.02.0. Right at the time this issue occurs the

[slurm-users] Features request

2020-09-24 Thread Relu Patrascu
Hello all, We're mostly a GPU compute shop, and we've been happy with slurm for the last three years, but we think slurm would benefit from the following two features: 1. Allow preemption in the same QOS, all else being equal, based on job priority. 2. Job size calculation to take into

Re: [slurm-users] Compiling Slurm with nvml support

2020-09-24 Thread Paul Edmon
That's what we do here. We build three different RPMs: server (because we run the latest MariaDB on our master), general compute, and GPU compute (because we build against NVML). We name these all the same but keep them in different repos and distribute the repos to each node appropriately.

[slurm-users] Compiling Slurm with nvml support

2020-09-24 Thread Dana, Jason T.
Hello, I hopefully have a quick question. I have compiled Slurm RPMs on a CentOS system with NVIDIA drivers installed so that I can use the AutoDetect=nvml configuration in our GPU nodes’ gres.conf. All seems to be going well on the GPU nodes since I have done that. I was unable to install

[slurm-users] SLURM reservations with MAGNETIC flag

2020-09-24 Thread Bas van der Vlies
We have installed Slurm 20.02.5 and I am trying to use the new reservation flag MAGNETIC (https://slurm.schedmd.com/reservations.html). From this page I understand that a job will land in the reservation even if we do not specify the reservation name. I tested it on our cluster setup but

Re: [slurm-users] How to set association factor in Multifactor Priority

2020-09-24 Thread Marcus Boden
Hi Jianwen, yes, you can give different accounts or users specific extra-priorities. You can set it via sacctmgr: https://slurm.schedmd.com/sacctmgr.html#SECTION_GENERAL-SPECIFICATIONS-FOR-ASSOCIATION-BASED-ENTITIES (scroll down to 'Priority') Priority What priority will be added to a job's