Re: [slurm-users] Requirement of one GPU job should run in GPU nodes in a cluster

2021-12-16 Thread Steffen Grunewald
On Fri, 2021-12-17 at 13:03:32 +0530, Sudeep Narayan Banerjee wrote: > Hello All: Can we please restrict each GPU node to a single GPU job? > > That is, > a) when we submit a GPU job on an empty node (say gpu2) requesting 16 cores, > as that gives the best GPU performance…

[slurm-users] Requirement of one GPU job should run in GPU nodes in a cluster

2021-12-16 Thread Sudeep Narayan Banerjee
Hello All: Can we please restrict each GPU node to a single GPU job? That is, a) we submit a GPU job on an empty node (say gpu2) requesting 16 cores, as that gives the best GPU performance. b) Then another user floods the remaining CPU cores on gpu2, sharing the GPU resources…
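
A minimal sketch of one way to get this behavior, assuming whole-node scheduling is acceptable on the GPU partition: OverSubscribe=EXCLUSIVE in slurm.conf dedicates each allocated node to a single job, so a second user cannot land on gpu2's remaining cores. Node and partition names here are illustrative, not the poster's actual configuration.

    # slurm.conf sketch -- names and sizes are examples only
    NodeName=gpu[1-4] CPUs=32 Gres=gpu:2 RealMemory=192000
    PartitionName=gpu Nodes=gpu[1-4] OverSubscribe=EXCLUSIVE

The trade-off is that small GPU jobs also get a whole node; a job_submit.lua filter (see the thread further down) is the usual alternative when finer control is needed.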

Re: [slurm-users] QOS time limit tighter than partition limit

2021-12-16 Thread Fulcomer, Samuel
...and you shouldn't be able to do this with a QOS (at least not the way you want), as "grptresrunmins" applies to the aggregate of everything using the QOS. On Thu, Dec 16, 2021 at 6:12 PM Fulcomer, Samuel wrote: > I've not parsed your message very far, but... > > for i in `cat limit_users` ; do …

Re: [slurm-users] QOS time limit tighter than partition limit

2021-12-16 Thread Fulcomer, Samuel
I've not parsed your message very far, but... for i in `cat limit_users` ; do sacctmgr modify user where user=$i partition=foo account=bar set grptresrunmins=cpu=Nlimit ; done On Thu, Dec 16, 2021 at 6:01 PM Ross Dickson wrote: > I would like to impose a time limit stricter than the partition limit on > a certain…
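
For readers unfamiliar with the limit: GrpTRESRunMins=cpu=N caps the running total of (allocated CPUs × minutes remaining) across all jobs under the association, so it throttles long-and-wide workloads rather than imposing a hard per-job wall clock. A quick worked example with a hypothetical cap:

    # GrpTRESRunMins=cpu=46080 admits one 16-core job with 48 h left:
    #   16 CPUs * 2880 min = 46080 cpu-minutes (exactly at the cap)
    # a second identical job would exceed the cap and stay pending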

[slurm-users] QOS time limit tighter than partition limit

2021-12-16 Thread Ross Dickson
I would like to impose a time limit stricter than the partition limit on a certain subset of users. I should be able to do this with a QOS, but I can't get it to work. What am I missing? At https://slurm.schedmd.com/resource_limits.html it says, "Slurm's hierarchical limits are enforced in the…
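
A sketch of the QOS-based approach the poster is after, assuming the restricted users can be pinned to a dedicated QOS (the QOS and user names are illustrative): create a QOS with the tighter wall-time limit, then make it those users' only (and default) QOS. QOS limits are only honored when slurm.conf enforces them.

    # create the QOS and give it a wall clock tighter than the partition's
    sacctmgr add qos shortwall
    sacctmgr modify qos shortwall set MaxWall=04:00:00
    # restrict a user to that QOS (repeat per user, e.g. in a shell loop)
    sacctmgr modify user where name=alice set QOS=shortwall DefaultQOS=shortwall
    # slurm.conf must include, e.g.: AccountingStorageEnforce=limits,qos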

Re: [slurm-users] Prevent users from updating their jobs

2021-12-16 Thread Fulcomer, Samuel
There's no clear answer to this. It depends a bit on how you've segregated your resources. In our environment, GPU and bigmem nodes are in their own partitions. There's nothing to prevent a user from specifying a list of potential partitions in the job submission, so there would be no need for the…

Re: [slurm-users] Prevent users from updating their jobs

2021-12-16 Thread Bill Wichser
Indeed. We use this and BELIEVE that it works, lol! Bill

    -- reject any non-root attempt to change a job's QOS via "scontrol update"
    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        if modify_uid == 0 then
            return 0    -- root may modify anything
        end
        if job_desc.qos ~= nil then
            return 1    -- a QOS change was requested: non-SUCCESS rejects it
        end
        return 0        -- all other modifications pass through
    end
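
For context (not part of Bill's message): slurm_job_modify() belongs in job_submit.lua, which Slurm looks for in the same directory as slurm.conf, and it only fires once the Lua job-submit plugin is enabled:

    # slurm.conf
    JobSubmitPlugins=lua

The same file can also define slurm_job_submit() to filter jobs at submission time, which is the mechanism Carlos refers to in the next message.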

Re: [slurm-users] Prevent users from updating their jobs

2021-12-16 Thread Carlos Fenoy
As far as I remember, you can use the job_submit Lua plugin to prevent any changes to jobs. On Thu, 16 Dec 2021 at 21:47, Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) wrote: > Is there a meaningful difference between using "scontrol update" and just > killing the job and resubmitting with those resources already requested?

Re: [slurm-users] Prevent users from updating their jobs

2021-12-16 Thread Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Is there a meaningful difference between using "scontrol update" and just killing the job and resubmitting with those resources already requested?

[slurm-users] Prevent users from updating their jobs

2021-12-16 Thread Jordi Blasco
Hi everyone, I was wondering if there is a way to prevent users from updating their jobs with "scontrol update job". Here is the justification. A hypothetical user submits a job requesting a regular node, but he/she realises that the large memory nodes or the GPU nodes are idle. Using the previous…

Re: [slurm-users] work with sensitive data

2021-12-16 Thread Josef Dvoracek
> One of the open problems is a way to provide the password for mounting the encrypted directory inside a Slurm job. But this should be solvable. I'd be really interested to hear more about the mechanism to distribute credentials across compute nodes in a secure way, especially if we're using f…

[slurm-users] slurm and kerberized NFSv4 - current perspective?

2021-12-16 Thread Josef Dvoracek
@list, is there any experience with recent versions of Slurm and kerberized NFS on compute nodes? I saw older (~201x) tutorials and slide decks describing auks, but after checking its GitHub project I feel it is a non-mainstream solution. Is my understanding correct that using kerberized…