Bill, you know this already, but permit me an observation from PPBpro. Turn the logging level up to maximum on the nodes. Tail the slurm log and start a job. Look HARD at exactly what the log is telling you; as Richard Feynman said, you are the easiest person to fool. Don't read the log as saying what you think is happening. Remember that log messages take effort to put into the code (well, at least some keystrokes), so they usually mean something!
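Once the log level is up (for slurmd, e.g. via the SlurmdDebug parameter in slurm.conf), a maximum-verbosity log gets long quickly. Below is a minimal Python sketch of the "tail the log and look hard" step: `follow` reads new lines roughly the way `tail -f` does, and `interesting_lines` keeps only lines that mention cgroup, memory or swap, optionally for one job ID. The keyword list, the example log path and job ID "42" are assumptions for illustration. Slurm does not define any of them, and you should still read the unfiltered log for anything the filter hides.

```python
import re
import time
from typing import Iterable, Iterator, Optional, Sequence

# Hypothetical keyword list: words likely to appear in slurmd messages about
# cgroup memory/swap handling. Adjust after seeing what your log really prints.
DEFAULT_KEYWORDS = ("cgroup", "memory", "memsw", "swap")


def follow(path: str, poll_seconds: float = 0.5,
           from_start: bool = False) -> Iterator[str]:
    """Yield lines appended to `path`, roughly like `tail -f`.

    Runs until interrupted. With from_start=True the existing contents are
    yielded first; otherwise reading starts at the current end of the file.
    A line that is still being written may be yielded in two pieces.
    """
    with open(path, "r", errors="replace") as fh:
        if not from_start:
            fh.seek(0, 2)  # whence=2 means "relative to end of file"
        while True:
            line = fh.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)  # nothing new yet; poll again


def interesting_lines(lines: Iterable[str],
                      keywords: Sequence[str] = DEFAULT_KEYWORDS,
                      job_id: Optional[str] = None) -> Iterator[str]:
    """Keep lines mentioning any keyword (case-insensitive).

    If job_id is given, the line must also contain that ID as a whole word,
    so job 42 does not match job 420.
    """
    word = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    job = re.compile(rf"\b{re.escape(job_id)}\b") if job_id else None
    for line in lines:
        line = line.rstrip("\n")
        if job is not None and not job.search(line):
            continue
        if word.search(line):
            yield line

# Usage on a node (hypothetical path and job ID; use your SlurmdLogFile):
#   for hit in interesting_lines(follow("/var/log/slurm/slurmd.log"), job_id="42"):
#       print(hit)
```

Filtering on a stream rather than re-reading the file keeps the view live while the job starts, which is the point of tailing it.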
On Tue, 16 Oct 2018 at 10:04, John Hearns <hear...@googlemail.com> wrote:
> Rather dumb question from me - you have checked those processes are
> running within a cgroup?
> I have no experience in constraining the swap usage using cgroups, so
> sorry if I am adding nothing to the debate here.
>
> On Tue, 16 Oct 2018 at 04:49, Bill Broadley <b...@cse.ucdavis.edu> wrote:
>
>> Greetings,
>>
>> I'm using ubuntu-18.04 and slurm-18.08.1 compiled from source.
>>
>> I followed the directions on:
>> https://slurm.schedmd.com/cgroups.html
>>
>> And:
>> https://slurm.schedmd.com/cgroup.conf.html
>>
>> That resulted in:
>> $ cat slurm.conf | egrep -i "cgroup|CR_"
>> ProctrackType=proctrack/cgroup
>> TaskPlugin=task/cgroup
>> SelectTypeParameters=CR_CPU_MEMORY
>> JobAcctGatherType=jobacct_gather/cgroup
>>
>> $ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX=
>> GRUB_CMDLINE_LINUX='cgroup_enable=memory swapaccount=1 console=tty0
>> transparent_hugepage=madvise console=ttyS0,57600'
>>
>> $ cat cgroup.conf
>> CgroupAutomount=yes
>> ConstrainCores=yes
>> ConstrainDevices=yes
>> ConstrainRAMSpace=yes
>> ConstrainSwapSpace=yes
>> MaxSwapPercent=0
>> AllowedSwapSpace=0
>>
>> So I expect jobs to not use swap. Turns out if I run a 3GB ram process with
>> sbatch --mem=1000 I just get a process that uses 1GB ram and 2GB of swap.
>>
>> So a 3GB process with --mem=1000:
>> $ ps acux
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>> bill 17698 11.1 1.5 2817020 1015392 ? D 20:40 0:13 stream\
>>
>> $ smem
>> User Count Swap USS PSS RSS
>> bill 1 1795552 1017048 1017076 1018492
>>
>> With --mem=3000 zero swap is used and the job consumes 100% of a CPU.
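One way to answer John's question directly on the compute node is to ask the kernel which memory cgroup the process is in and what limits that cgroup carries. The Python sketch below assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory (the usual layout on Ubuntu 18.04). It parses /proc/<pid>/cgroup, then reads memory.limit_in_bytes (RAM) and memory.memsw.limit_in_bytes (RAM plus swap, present only when swap accounting is enabled, as with swapaccount=1 above). If swap is really disallowed, the two limits should be equal, so `swap_headroom()` should be 0. If the memory path is "/" or does not look like a Slurm-created cgroup (something under /slurm/uid_.../job_..., used here only as an example), the process is not confined at all. The sketch reads only the process's own cgroup. A limit set on an ancestor, such as the job-level cgroup, also applies even if the leaf shows a huge "unlimited" value.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional

# Assumption: cgroup v1 memory controller mounted here (usual on Ubuntu 18.04).
CGROUP_V1_MEMORY_ROOT = "/sys/fs/cgroup/memory"


@dataclass
class MemoryCgroup:
    cgroup: str                  # path of the process's memory cgroup
    limit: Optional[int]         # memory.limit_in_bytes (RAM)
    memsw_limit: Optional[int]   # memory.memsw.limit_in_bytes (RAM + swap)

    def swap_headroom(self) -> Optional[int]:
        """Bytes of swap this cgroup may use on top of its RAM limit."""
        if self.limit is None or self.memsw_limit is None:
            return None  # file missing or unreadable, e.g. no swap accounting
        return self.memsw_limit - self.limit


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map controller name -> cgroup path from /proc/<pid>/cgroup contents.

    cgroup v1 lines look like "4:memory:/slurm/uid_1000/job_42/step_batch";
    the cgroup v2 line ("0::/...") has no controller names and is skipped.
    """
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _, controllers, path = line.split(":", 2)
        for name in controllers.split(","):
            if name:
                mapping[name] = path
    return mapping


def _read_int(path: str) -> Optional[int]:
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def memory_limits(pid: int, root: str = CGROUP_V1_MEMORY_ROOT,
                  proc_root: str = "/proc") -> MemoryCgroup:
    """Report the memory cgroup and RAM / RAM+swap limits of one process."""
    with open(os.path.join(proc_root, str(pid), "cgroup")) as fh:
        path = parse_proc_cgroup(fh.read()).get("memory")
    if path is None:
        raise RuntimeError(f"PID {pid} is not in a cgroup v1 memory hierarchy")
    base = os.path.join(root, path.lstrip("/"))
    return MemoryCgroup(
        cgroup=path,
        limit=_read_int(os.path.join(base, "memory.limit_in_bytes")),
        memsw_limit=_read_int(os.path.join(base, "memory.memsw.limit_in_bytes")),
    )

# Usage on the compute node, e.g. for the PID shown by ps above:
#   info = memory_limits(17698)
#   print(info.cgroup, info.limit, info.memsw_limit, info.swap_headroom())
```

The `root` and `proc_root` parameters exist only so the function can be pointed at a different mount point or exercised against a fake directory tree.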