I do it in the epilog.

When the epilog runs for the last job on the node, I drop caches,
clean /dev/shm, etc.

Since the epilog runs as root, there is no need for sudo. You could do
the same in the prolog; just check whether any other jobs are running
on the node.
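
For what it's worth, here is a minimal sketch of such an epilog
(untested; it assumes squeue is on the epilog's PATH and that slurmd
sets SLURMD_NODENAME and SLURM_JOB_ID in the epilog environment, as it
normally does):

#!/bin/bash
# Epilog sketch: clean up only when this is the last job on the node.
# SLURMD_NODENAME and SLURM_JOB_ID are assumed to be set by slurmd.
OTHERS=$(squeue -h -o %i -w "$SLURMD_NODENAME" | grep -cvx "$SLURM_JOB_ID")
if [ "$OTHERS" -eq 0 ]; then
    # The epilog already runs as root, so no sudo is needed here.
    echo 3 > /proc/sys/vm/drop_caches
    rm -rf /dev/shm/*
fi

The grep excludes the job whose epilog is running, since that job can
still appear in squeue while it is completing.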

br
Felip M

On 5 Apr 2017 1:56 a.m., "Michael Jennings" <m...@lanl.gov> wrote:


On Thursday, 30 March 2017, at 09:19:59 (-0700),
Chad Cropper wrote:

> Yes, I have seen these. Thanks. But the issue is I only want this to
> run when the node is empty. Our workload is very serial and most jobs
> only use 1, 2, or 4 cores. So on a large node of 32 cores, we would have
> many jobs active. I only ever want this to run when no jobs exist on
> the node. My current best plan is to write python to check for jobs
> on a node, if empty set to drain, then ssh as root and run the
> command.

First, the problem with "sudo echo 3 > /proc/sys/vm/drop_caches" is
that bash (and other shells) handles redirection of input/output
*before* spawning the command, and the bash process (i.e., the user's
shell) does not have the root access required to write to that file.
You need a shell with root privileges to do that.

The cleanest solution is simply:

sudo /bin/bash -c 'echo 3 > /proc/sys/vm/drop_caches'

if you want something that works from an unprivileged command prompt
(assuming the user has sudo access to run bash as root).
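
Alternatively, a common idiom (not specific to SLURM) that avoids
spawning a root shell altogether is to let a privileged tee do the
write:

echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null

Here only tee runs with root privileges; the echo and the trailing
redirection happen in the user's unprivileged shell, which is fine
since neither needs special access.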

Counting jobs on a node is simply a matter of:

squeue -h -w $HOSTNAME | wc -l

Putting it all together, you can simply do the following (either in a
script run via sudo, via "bash -c" as shown above, or under something
like NHC):

[ `squeue -h -w $HOSTNAME | wc -l` -eq 0 ] && echo 3 > /proc/sys/vm/drop_caches

(Note that you may need to tweak the above if, for example, your full
hostname as given by $HOSTNAME doesn't match your NodeName in SLURM.)
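
For example, if SLURM knows the node by its short name while $HOSTNAME
is fully qualified (just one possible mismatch), something like this
would do:

squeue -h -w "$(hostname -s)" | wc -l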

HTH,
Michael

--
Michael E. Jennings <m...@lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-0200, Rm. 212      W: +1 (505) 606-0605
