Re: [slurm-users] scontrol for a heterogenous job appears incorrect

2019-04-24 Thread Jeffrey R. Lang
Chris, Upon further testing this morning I see that the job is assigned two different job IDs, something I wasn't expecting. This led me down the road of thinking the output was incorrect. scontrol on a heterogeneous job will show multiple job IDs for the job. So, the output just wasn't what I was
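For context, this is the expected behaviour: each component of a heterogeneous (pack) job gets its own job ID, and scontrol reports one record per component. A minimal sketch of what this can look like, assuming a two-component job whose first component is assigned ID 1234 (the exact field names differ between releases, e.g. PackJob* in Slurm 18.08 versus HetJob* from 19.05):

  $ sbatch --ntasks=1 : --ntasks=4 job.sh
  Submitted batch job 1234

  $ scontrol show job 1234
  JobId=1234 HetJobId=1234 HetJobOffset=0 JobName=job.sh ...
  JobId=1235 HetJobId=1234 HetJobOffset=1 JobName=job.sh ...

Tools such as squeue and scancel can also address the components as 1234+0, 1234+1, and so on.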

Re: [slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Renfro, Michael
We put a ‘gpu’ QOS on all our GPU partitions, and limit jobs per user to 8 (our GPU capacity) via MaxJobsPerUser. Extra jobs get blocked, allowing other users to queue jobs ahead of the extras.
# sacctmgr show qos gpu format=name,maxjobspu
      Name MaxJobsPU
---------- ---------
       gpu         8
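For anyone reproducing this kind of setup, a minimal sketch of the idea (the commands are standard sacctmgr/slurm.conf syntax, but the QOS name, limit value, and partition line here are illustrative, not a copy of the poster's configuration):

  # create the QOS and cap concurrently running jobs per user
  sacctmgr add qos gpu
  sacctmgr modify qos gpu set MaxJobsPerUser=8

  # attach the QOS to the GPU partition in slurm.conf
  # (limits are only enforced if AccountingStorageEnforce includes limits/qos)
  PartitionName=gpu Nodes=gpunode01 QOS=gpu State=UP

MaxJobsPerUser caps concurrently running jobs; MaxSubmitJobsPerUser would additionally cap how many jobs a user may have queued at all.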

[slurm-users] Limit concurrent gpu resources

2019-04-24 Thread Mike Cammilleri
Hi everyone, We have a single node with 8 GPUs. Users often pile up lots of pending jobs and use all 8 at the same time, so a user who just wants to run a short debug job on one of the GPUs has to wait too long for a GPU to free up. Is there a way with
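(For concreteness, the short debug case meant here is something like a brief single-GPU interactive session, e.g.

  srun --gres=gpu:1 --time=00:30:00 --pty bash

where the gres count and time limit are only illustrative.)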

Re: [slurm-users] Effect of PriorityMaxAge on job throughput

2019-04-24 Thread David Baker
Hello Michael, Thank you for your email and apologies for my tardy response. I'm still sorting out my mailbox after an Easter break. I've taken your comments on board and I'll see how I go with your suggestions. Best regards, David From: slurm-users on behalf

Re: [slurm-users] Job dispatching policy

2019-04-24 Thread Mahmood Naderan
Thanks for the info. The thing is that I don't want to mark the whole node as unhealthy. Assume the following scenario: compute-0-0 is running Slurm jobs and its system load is 15 (32 cores); compute-0-1 is running non-Slurm jobs and its system load is 25 (32 cores). Then a new Slurm job should be dispatched to
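A note for anyone reading this later: by default Slurm balances only on the resources it has allocated itself, not on externally generated load. The closest built-in knob is CR_LLN, which sends jobs to the node with the most idle CPUs from Slurm's point of view; a sketch of the relevant slurm.conf lines (illustrative, and note it still will not see the non-Slurm load on compute-0-1):

  SelectType=select/cons_res
  SelectTypeParameters=CR_Core,CR_LLN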