Hi everyone,

I'd like to configure Slurm so that users can request an amount of disk space for TMPDIR, and have that request reserved and enforced by quota, via commands like "sbatch --gres tmp:10G jobscript.sh". I'm probably reinventing someone's wheel, but I'm almost there.

I have:

- created a local xfs filesystem, dedicated to per-job TMPDIR directories,
  with project quotas enabled on each slurmd host.

- created (slurmd) Prolog/Epilog scripts which create/delete a per-job
  directory on the xfs filesystem, owned by the job user.

- created SrunProlog/TaskProlog scripts, which set TMPDIR in the user's
  job environment to point at the per-job directory.

- added a gres defined as "Name=tmp Flags=CountOnly"

- modified the node definitions to include the amount of storage on each
  host, by adding "Gres=tmp:270G".
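For what it's worth, the TMPDIR piece can be a very small TaskProlog: Slurm applies lines of the form "export NAME=VALUE" printed on the TaskProlog's stdout to the task environment. A minimal sketch, where the mount point is a placeholder rather than the real path:

```shell
#!/bin/sh
# Hypothetical TaskProlog sketch. slurmd sets SLURM_JOB_ID in the real
# environment; the default here is only so the script runs standalone.
: "${SLURM_JOB_ID:=0}"

TMPBASE=/local/xfs-tmp            # placeholder: mount point of the XFS filesystem
JOBDIR="${TMPBASE}/${SLURM_JOB_ID}"

# Lines of this form on TaskProlog stdout are exported into the task env.
echo "export TMPDIR=${JOBDIR}"
```

The Prolog/Epilog pair would create and remove ${TMPBASE}/${SLURM_JOB_ID} around the job, as described above.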

I still need to:

- extend the Prolog script to look up the "tmp" gres allocation for the
  job.

- extend the Prolog script to set the appropriate project quota on the
  per-job TMPDIR, limiting the amount of space the directory tree can use.
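Once the allocated size is known, the quota step itself should reduce to a pair of xfs_quota calls. A rough sketch, with the mount point as a placeholder and the gres lookup left open (it's the missing piece), plus a small helper to turn a Slurm-style size like "10G" into bytes:

```shell
#!/bin/sh
# Convert a Slurm-style size suffix (K/M/G/T) to bytes; bare numbers
# pass through unchanged.
to_bytes() {
    case "$1" in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 * 1024 )) ;;
        *T) echo $(( ${1%T} * 1024 * 1024 * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# In the Prolog, with JOBDIR the per-job directory and TMP_GRES the
# allocated size (e.g. "10G" -- how to obtain it is the open question),
# the project quota would be applied roughly like this, reusing the
# job id as the XFS project id:
#
#   xfs_quota -x -c "project -s -p $JOBDIR $SLURM_JOB_ID" /local/xfs-tmp
#   xfs_quota -x -c "limit -p bhard=$(to_bytes "$TMP_GRES") $SLURM_JOB_ID" /local/xfs-tmp
```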


Unfortunately, I've not found anything in the Prolog environment (or stored on disk under /var/spool/slurmd) containing the gres allocations for the job.

I figure I could run "scontrol show job <jobid> -d" from inside the Prolog to get the job's gres information, but I'd have to hard-code the location of the scontrol binary... and the Prolog documentation explicitly says not to execute Slurm commands from within the Prolog.
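In case anyone does go the scontrol route despite the warning, scraping the size out of the job record might look like the sketch below. The field name and format assumed here (a TresPerNode=gres:tmp:10G field) are a guess, and the sample line is illustrative rather than real scontrol output, so check the -d output on your Slurm version:

```shell
#!/bin/sh
# Pull the size following "tmp:" out of a job record read from stdin.
# Assumes the record contains something like "TresPerNode=gres:tmp:10G".
extract_tmp_gres() {
    sed -n 's/.*tmp:\([0-9][0-9]*[KMGT]\{0,1\}\).*/\1/p' | head -n 1
}

# Illustrative input only -- not real scontrol output:
sample='JobId=42 TresPerNode=gres:tmp:10G'
echo "$sample" | extract_tmp_gres
```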

Is there a better way to get the job's gres information from within the prolog, please?

Thanks!

Mark
