Re: [slurm-users] Using oversubscribe to hammer a node

2023-01-19 Thread Loris Bennett
Hi Rob,
"Groner, Rob" writes:
> I'm trying to set up a specific partition where users can fight with the OS for dominance. The oversubscribe property sounds like what I want, as it says "More than one job can execute simultaneously on the same compute resource." That's exactly what I wa

Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Christopher Samuel
On 1/19/23 5:01 am, Stefan Staeglich wrote:
> Hi,
Hiya,
> I'm wondering where the UnkillableStepProgram is actually executed. According to Mike it has to be available on every one of the compute nodes. This makes sense only if it is executed there.
That's right, it's only executed on compute nodes

[slurm-users] How to get allocated cpu cores for nodes in tres mode

2023-01-19 Thread Lu Weizheng
Hi all. At my site, I have configured the CPU and GPU resources as TRES (https://slurm.schedmd.com/tres.html). Multiple jobs can run concurrently on the same node. Users want to know how many cores remain unallocated when they submit jobs, which helps them choose which partition to use. So is the

Re: [slurm-users] Job cancelled into the future

2023-01-19 Thread Reed Dier
Just to hopefully close this out, I believe I was actually able to resolve this in “user-land” rather than mucking with the database. I was able to requeue the bad job IDs, and they went pending. Then I updated the jobs to a time limit of 60. Then I scancelled the jobs, and they returned to a cance

[slurm-users] Using oversubscribe to hammer a node

2023-01-19 Thread Groner, Rob
I'm trying to set up a specific partition where users can fight with the OS for dominance. The oversubscribe property sounds like what I want, as it says "More than one job can execute simultaneously on the same compute resource." That's exactly what I want. I've set up a node with 48 CPUs and o

Re: [slurm-users] Slurm - UnkillableStepProgram

2023-01-19 Thread Stefan Staeglich
Hi, I'm wondering where the UnkillableStepProgram is actually executed. According to Mike it has to be available on every one of the compute nodes. This makes sense only if it is executed there. But the slurm.conf man page for 21.08.x states: UnkillableStepProgram Must be execut

Re: [slurm-users] srun jobfarming hassle question

2023-01-19 Thread Ohlerich, Martin
Hello Björn-Helge. Thanks for reminding me of /sys/fs for checking OOM issues. I had already lost sight of that again. In this case, there are more steps involved (one for each srun call). I'm not sure whether cgroup handles each separately, or just on a per-node basis. If the latter ... why do I have