Hi everyone,
I'd like to configure Slurm so that users can request an amount of disk
space for TMPDIR, and have that space reserved and enforced by a quota,
using commands like "sbatch --gres tmp:10G jobscript.sh". I'm probably
reinventing someone's wheel, but I'm almost there.
I have:
- created, on each slurmd host, a local XFS filesystem dedicated to per-job
TMPDIR directories, with project quotas enabled.
- created (slurmd) Prolog/Epilog scripts which create/delete a per-job
directory on the xfs filesystem, owned by the job user.
- created SrunProlog/TaskProlog scripts, which set TMPDIR in the user's
job environment to point at the per-job directory.
- added a gres defined as "Name=tmp Flags=CountOnly"
- modified the node definitions to include the amount of storage on each
host, by adding "Gres=tmp:270G".
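The directory handling in the list above might look roughly like the
following Python sketch. The mount point /tmp_jobs and the job_<id> naming
scheme are hypothetical placeholders, and the uid/gid are passed in as
arguments; in a real Prolog (which runs as root) they would come from
SLURM_JOB_UID and the user's primary group. The TaskProlog part relies on
the convention that lines printed on stdout as "export NAME=value" are
added to the task's environment.

```python
import os
import shutil

# Hypothetical mount point of the dedicated XFS filesystem; adjust per site.
JOB_TMP_BASE = "/tmp_jobs"


def job_tmpdir(job_id: str, base: str = JOB_TMP_BASE) -> str:
    """Path of the per-job TMPDIR (the job_<id> naming scheme is an assumption)."""
    return os.path.join(base, f"job_{job_id}")


def prolog_create(job_id: str, uid: int, gid: int, base: str = JOB_TMP_BASE) -> str:
    """Prolog step: create the per-job directory, owned by the job user."""
    path = job_tmpdir(job_id, base)
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chown(path, uid, gid)
    os.chmod(path, 0o700)  # makedirs' mode is filtered by the umask, so set it explicitly
    return path


def epilog_remove(job_id: str, base: str = JOB_TMP_BASE) -> None:
    """Epilog step: delete the per-job directory tree, if it still exists."""
    shutil.rmtree(job_tmpdir(job_id, base), ignore_errors=True)


def task_prolog_lines(job_id: str, base: str = JOB_TMP_BASE) -> list[str]:
    """TaskProlog step: lines to print on stdout so Slurm exports TMPDIR to the task."""
    return [f"export TMPDIR={job_tmpdir(job_id, base)}"]
```

A TaskProlog would then simply print each line from
task_prolog_lines(os.environ["SLURM_JOB_ID"]).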
I still need to:
- extend the Prolog script to look up the "tmp" gres allocation for the
job.
- extend the Prolog script to set the appropriate project quota on the
per-job TMPDIR, limiting the amount of space the directory tree can use.
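Once the allocation is known, the quota step could be sketched as below:
convert the GRES count into bytes (gres.conf documents the K/M/G/T/P
suffixes as multiples of 1024), then build the two xfs_quota invocations
that assign the directory to a project and set a hard block limit. Using
the job ID as the XFS project ID is an assumption made here for simplicity
(it avoids maintaining /etc/projects), and the function names are
hypothetical; the commands are only built, not run.

```python
import re

# Multipliers for the K/M/G/T/P suffixes Slurm accepts on GRES counts (powers of 1024).
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def gres_count_to_bytes(count: str) -> int:
    """Convert a GRES count such as '10G' (or a plain number of bytes) to bytes."""
    m = re.fullmatch(r"(\d+)([KMGTP]?)", count.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised GRES count: {count!r}")
    return int(m.group(1)) * _SUFFIX[m.group(2)]


def xfs_quota_commands(path: str, mountpoint: str, project_id: int,
                       limit_bytes: int) -> list[list[str]]:
    """Build (but do not run) the xfs_quota calls for one per-job directory.

    1. 'project -s -p PATH ID' sets up PATH as project ID without /etc/projects.
    2. 'limit -p bhard=Nk ID' sets the hard block limit in KiB, rounded up.
    Assumes PATH contains no whitespace, since it is embedded in the -c string.
    """
    limit_kib = (limit_bytes + 1023) // 1024
    return [
        ["xfs_quota", "-x", "-c", f"project -s -p {path} {project_id}", mountpoint],
        ["xfs_quota", "-x", "-c", f"limit -p bhard={limit_kib}k {project_id}", mountpoint],
    ]
```

In the Prolog, each returned list could be passed to
subprocess.run(cmd, check=True).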
Unfortunately, I've not found anything in the Prolog environment (or
stored on disk under /var/spool/slurmd) containing the gres allocations
for the job.
I figure I can do a "scontrol show job <jobid> -d" from inside the prolog
to get the job's gres information, but I'll need to hard-code the location
of the scontrol binary... and the Prolog documentation explicitly tells
you not to execute Slurm commands from within the prolog.
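If the scontrol route were used anyway, extracting the request from its
output might look like this sketch. The exact field and separator are an
assumption that should be checked against the local Slurm version:
TresPerNode has shown the request as "gres:tmp:10G" or "gres/tmp:10G"
depending on version, and a TRES field such as AllocTRES may show
"gres/tmp=<bytes>" only if gres/tmp is tracked as a TRES. The function name
is hypothetical.

```python
import re
from typing import Optional

# ASSUMPTION: the tmp request appears as "gres:tmp:10G", "gres/tmp:10G" or
# "gres/tmp=<bytes>" somewhere in the text; verify against your own scontrol output.
_TMP_GRES = re.compile(r"\bgres[:/]tmp[:=](\d+[KMGTP]?)\b", re.IGNORECASE)


def tmp_gres_from_scontrol(text: str) -> Optional[str]:
    """Return the first tmp GRES count found in 'scontrol show job -d' output, or None.

    A bare number (no suffix) is a byte count.
    """
    m = _TMP_GRES.search(text)
    return m.group(1).upper() if m else None
```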
Is there a better way to get the job's gres information from within the
prolog, please?
Thanks!
Mark