In addition to the other suggestions, there's this:

https://slurm.schedmd.com/faq.html#tmpfs_jobcontainer
https://slurm.schedmd.com/job_container.conf.html

I would be interested in hearing how well it works. It's buried deep enough in the documentation that unfortunately I didn't see it until after I had rolled a solution similar to Diego's (which can be extended so that TaskProlog sets the TMPDIR environment variable appropriately and limits the disk space used by the job).
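
In case you want to try it, a rough sketch of the two files involved (the BasePath below is only a placeholder; check the man pages for the exact parameters supported by your Slurm version):

  # slurm.conf
  JobContainerType=job_container/tmpfs
  PrologFlags=Contain

  # job_container.conf
  AutoBasePath=true
  BasePath=/scratch

As far as I understand, each job then gets a private /tmp (and /dev/shm) backed by a per-job directory under BasePath, which is cleaned up automatically when the job ends.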

All the best,

Mark

On Mon, 23 May 2022, Diego Zuccato wrote:

Hi Arsene.

I did something like that some weeks ago.

I used the lines
Prolog=/home/conf/Prolog.sh
TaskProlog=/home/conf/TaskProlog.sh
Epilog=/home/conf/Epilog.sh

The scripts for prolog and epilog manage the creation (and permissions
assignment) of a directory in local storage (including the job ID, so
that different jobs don't get messed up).
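
For reference, a minimal sketch of what such Prolog/Epilog scripts could look like (the /scratch path is an assumption; both scripts run as root on the compute node, and Slurm sets SLURM_JOB_ID and SLURM_JOB_USER in their environment):

  #!/bin/bash
  # Prolog.sh - runs as root on the compute node before the job starts:
  # create a per-job scratch directory owned by the job's user.
  mkdir -p "/scratch/job.${SLURM_JOB_ID}"
  chown "${SLURM_JOB_USER}" "/scratch/job.${SLURM_JOB_ID}"

  #!/bin/bash
  # Epilog.sh - runs as root on the compute node after the job ends:
  # remove the per-job scratch directory, whether the job succeeded or not.
  rm -rf "/scratch/job.${SLURM_JOB_ID}"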

The TaskProlog script should export an environment variable, but I
couldn't make it work :(
In your case, TaskProlog should copy the dataset to the local storage,
and then you would add a TaskEpilog script to copy the results back. I
don't know whether TaskEpilog gets run for aborted jobs.
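
From the docs, TaskProlog is supposed to set variables by printing lines of the form "export NAME=value" to its stdout, which slurmstepd then applies to the task environment. So a sketch, assuming the per-job directory created by the Prolog above, would be something like:

  #!/bin/bash
  # TaskProlog.sh - stdout is parsed by slurmstepd; "export NAME=value"
  # lines are added to the task's environment.
  echo "export TMPDIR=/scratch/job.${SLURM_JOB_ID}"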

Moreover, IIRC you shouldn't do slow operations in the task prolog or
epilog, so in your case a state machine implemented as a chain of
dependent jobs could probably be better suited than
TaskProlog/TaskEpilog (you'd need Prolog/Epilog anyway): the first job
copies to scratch, the second does the number crunching and the third
copies the results back (see the sketch below).
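
A rough sketch of such a chain using job dependencies (the script names are placeholders, and "afterany" on the last step is so the results get copied back even if the compute step fails):

  # Stage 1: copy the input data to node-local scratch
  JOB1=$(sbatch --parsable stage_in.sh)
  # Stage 2: run the computation once the copy has succeeded
  JOB2=$(sbatch --parsable --dependency=afterok:${JOB1} compute.sh)
  # Stage 3: copy the results back even if the compute step failed
  sbatch --dependency=afterany:${JOB2} stage_out.sh

Keep in mind that with node-local scratch all three stages have to land on the same node (e.g. by pinning them with -w), which is part of why the job_container/tmpfs approach above is attractive.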

HIH,
Diego

On 23/05/2022 11:30, Arsene Marian Alain wrote:
 Dear SLURM users,

 I am the IT administrator of a small scientific computing center. We
 recently installed SLURM as the job scheduler on our cluster and
 everything seems to be working fine. I just have a question about how to
 create temporary directories with SLURM.

 We use some programs for scientific calculation (such as Gromacs,
 Gaussian, NAMD, etc.). So, the process is the following:

 When we need to launch a calculation, the first step is to copy all the
 necessary files from the local "$SLURM_SUBMIT_DIR" directory to the
 "/scratch" of the remote node; the second step is to change into the
 "/scratch" of the remote node and run the program. Finally, when the
 program finishes, we copy all the output files from the remote node's
 "/scratch" back to the local "$SLURM_SUBMIT_DIR" directory.

 So, is there any way to automatically generate a temporary directory
 inside the "/scratch" of the remote node?

 At the moment I am creating that directory manually as follows:

 "export HOMEDIR=$SLURM_SUBMIT_DIR

 export SCRATCHDIR=/scratch/job.$SLURM_JOB_ID.$USER

 export WORKDIR=$SCRATCHDIR

 mkdir -p $WORKDIR

 cp $HOMEDIR/* $WORKDIR

 cd $WORKDIR

 $NAMD/namd2 +idlepoll +p11 run_eq.namd > run_eq.log

 wait

 cp $WORKDIR/* $HOMEDIR"

 The main problem with creating the "/scratch" directory manually is that
 when the calculation ends (successfully or unsuccessfully), users have
 to check "/scratch" and remove the directory by hand. I know I could
 include a line at the end of my script to delete that directory when the
 calculation is done, but I'm sure there must be a better way to do this.
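
 (For illustration, the in-script variant I mean would be a shell trap
 that removes the directory when the script exits, even on failure, e.g.:

   # Remove the scratch directory when the script exits, whether it
   # finished normally or failed (a scancel/timeout may still kill the
   # script before the trap runs).
   trap 'rm -rf "$WORKDIR"' EXIT

 but this relies on every user remembering to add it to their scripts.)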

 Thanks in advance for the help.

 best regards,

 Alain


--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786

