Paul Raines <rai...@nmr.mgh.harvard.edu> writes:

> Basically, it appears using --mem-per-gpu instead of just --mem gives
> you unlimited memory for your job.
>
> $ srun --account=sysadm -p rtx8000 -N 1 --time=1-10:00:00
>   --ntasks-per-node=1 --cpus-per-task=1 --gpus=1 --mem-per-gpu=8G
>   --mail-type=FAIL --pty /bin/bash
> rtx-07[0]:~$ find /sys/fs/cgroup/memory/ -name job_$SLURM_JOBID
> /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067
> rtx-07[0]:~$ cat /sys/fs/cgroup/memory/slurm/uid_5829/job_1134067/memory.limit_in_bytes
> 1621419360256
>
> That is a limit of 1.5TB which is all the memory on rtx-07, not
> the 8G I effectively asked for at 1 GPU and 8G per GPU.
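For anyone wanting to check this on their own cluster, a quick test run
from inside the allocated shell would be something along these lines (a
rough sketch, assuming cgroup v1 mounted under /sys/fs/cgroup/memory as
in the paths above; adjust partition/account/GPU count to your site):

  # Run inside a shell obtained with e.g.:
  #   srun --gpus=1 --mem-per-gpu=8G ... --pty /bin/bash
  # Locate this job's memory cgroup and print the enforced limit.
  cgdir=$(find /sys/fs/cgroup/memory/slurm -type d -name "job_${SLURM_JOBID}" | head -n1)
  limit=$(cat "${cgdir}/memory.limit_in_bytes")
  echo "requested: $((8 * 1024 ** 3)) bytes (--mem-per-gpu=8G x 1 GPU)"
  echo "enforced:  ${limit} bytes"
  # If the enforced value equals the node's total RAM instead of ~8G,
  # the per-GPU memory request is not being applied to the cgroup.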
Which version of Slurm is this?

We noticed behaviour similar to this on Slurm 20.11.8, but when we
tested it on 21.08.1, we couldn't reproduce it. (We also noticed an
issue with --gpus-per-task that appears to have been fixed in 21.08.)

-- 
B/H