Re: [slurm-users] slurm-20.02.1-1 failed rpmbuild with error File not found

2020-04-21 Thread Michael Jennings
They do something even better: They allow the user/customer to make the choice in the spec file! :-) And to be clear, they don't expect users to be experts in building packages; that's why their Quick-Start Guide (https://slurm.schedmd.com/quickstart_admin.html) is as thorough as it is; it even h

[slurm-users] Assigning gpu freq values manually

2020-04-21 Thread Daniel Letai
Is it possible to assign GPU frequency values without using a specialized plugin? Currently GPU frequencies can be assigned with AutoDetect=nvml or AutoDetect=rsmi in gres.conf, but I can't find any reference to assigning frequency values manually

[slurm-users] Getting multiple steps to run out of interactive allocation with MPI processes.

2020-04-21 Thread Snedden, Ali
Hello, I am running SLURM 17.11 and have a user who has a complicated workflow. The user wants 250 cores for 2 weeks to do some work semi-interactively. I'm not going to give the user a reservation to do this work, because the whole point of having a scheduler is to minimize human intervention in

Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Jeffrey T Frey
You could also choose to propagate the signal to the child process of test.slurm yourself:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1
#SBATCH --time=00:03:00
#SBATCH --signal=B:SIGINT@30
# This example works, but I need it to work without "B:" in --signal

Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Bjørn-Helge Mevik
Jean-mathieu CHANTREIN writes:

> But that is not enough, it is also necessary to use srun in
> test.slurm, because the signals are sent to the child processes only
> if they are also children in the JOB sense.

Good to know!

--
Cheers,
Bjørn-Helge Mevik, dr. scient, Department for Research Comp

Re: [slurm-users] pam_slurm_adopt seems not working properly under "configless" slurm mode

2020-04-21 Thread Haoyang Liu
Thanks for the information! I'll take a look at the bug reports.

Best regards,
Haoyang

> -Original Message-
> From: "Ole Holm Nielsen"
> Sent: 2020-04-21 16:44:09 (Tuesday)
> To: slurm-users@lists.schedmd.com
> Cc:
> Subject: Re: [slurm-users] pam_slurm_adopt seems not working properly under
> "configles

Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Jean-mathieu CHANTREIN
- Original message -
> From: "b h mevik"
> To: "slurm-users"
> Sent: Tuesday 21 April 2020 10:29:32
> Subject: Re: [slurm-users] How to trap a SIGINT signal in a child process of a
> batch ?

> Jean-mathieu CHANTREIN writes:
>
>> test.sh:
>>
>> #!/bin/bash
>>
>> function sig_handler()
>> {
>

Re: [slurm-users] pam_slurm_adopt seems not working properly under "configless" slurm mode

2020-04-21 Thread Ole Holm Nielsen
On 21-04-2020 04:58, Haoyang Liu wrote: I am setting up the latest slurm-20.02-1 on my clusters and trying to configure the "configless" slurm on the compute nodes. After following the instructions from https://slurm.schedmd.com/configless_slurm.html, both slurmctld and slurmd work fine. The

Re: [slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Bjørn-Helge Mevik
Jean-mathieu CHANTREIN writes:

> test.sh:
>
> #!/bin/bash
>
> function sig_handler()
> {
> echo "Executable interrupted"
> exit 2
> }
>
> trap 'sig_handler' SIGINT
>
> echo "BEGIN"
> sleep 200
> echo "END"

Note that bash does not interrupt any running command (except "wait") when it re
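
A common way to work with this bash behaviour is to start the long command in the background and block in "wait", which does return as soon as a trapped signal arrives. What follows is only a minimal sketch of that pattern, not the script from this thread: the function name run_interruptible and the flag variable interrupted are hypothetical, and the signal to trap is passed as an argument so the idea can be tried with any signal.

```shell
#!/bin/bash
# Sketch only: run_interruptible and the "interrupted" flag are hypothetical
# names chosen for illustration; they do not come from the thread.

# Usage: run_interruptible SIGNAL command [args...]
run_interruptible() {
    local sig=$1
    shift

    interrupted=0                  # global on purpose: the trap below sets it
    trap 'interrupted=1' "$sig"

    "$@" &                         # start the long-running command in the background
    local child_pid=$!

    wait "$child_pid"              # returns early if the trapped signal arrives
    local status=$?

    if (( interrupted )); then
        echo "Executable interrupted"
        # Stop the child with SIGTERM: background commands started by a
        # non-interactive bash ignore SIGINT, so SIGINT would not stop it.
        kill -TERM "$child_pid" 2>/dev/null
        wait "$child_pid" 2>/dev/null
        return 2
    fi

    return "$status"               # normal completion: pass the exit status through
}
```

In a script like test.sh above, the "sleep 200" line would become something like "run_interruptible SIGINT sleep 200". The handler only sets a flag, and the cleanup happens after "wait" returns, which keeps the trap itself trivial.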

[slurm-users] How to trap a SIGINT signal in a child process of a batch ?

2020-04-21 Thread Jean-mathieu CHANTREIN
Hello, I'm using slurm version 19.05.2 on debian 10. I'm trying to handle a SIGINT signal in a child process of a batch job. The signal is automatically sent 30 s before the time limit is reached. You can see this mechanism in this minimal example: --- test.slurm: #!/bin