Hello,

Could someone in the Slurm community please advise me on outputting data (job
stats) at the end of a job? I'm currently using the epilog (slurm.epilog.clean)
to print a report at the end of each Slurm job. This works, but it is far from
ideal: slurm.epilog.clean can be executed multiple times (once on each node
allocated to the job), so I test whether I'm on the "batch host" and only then
write out the job stats. That is...

hostnode=$(hostname)
batchhost=$(/local/software/slurm/default/bin/scontrol show job "${SLURM_JOB_ID}" \
    | grep BatchHost | cut -f2 -d'=')
if [ "$hostnode" = "$batchhost" ]; then
    submittime=$(/local/software/slurm/default/bin/scontrol show job "${SLURM_JOB_ID}" \
        | grep SubmitTime | awk '{print $1}' | cut -f2 -d'=')
    printf "Submit time  : %s\n" "$submittime" >> "$stdout"
    # etc
fi
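As a side note, the repeated scontrol invocations above could be collapsed into a single call whose output is parsed locally. A minimal sketch, assuming the usual space-separated Key=Value format of "scontrol show job" (get_field is a hypothetical helper of mine, not a Slurm command):

```shell
# Capture the scontrol output once; every field is then extracted from the
# captured text instead of re-running scontrol per field.
jobinfo=""
if [ -x /local/software/slurm/default/bin/scontrol ]; then
    jobinfo=$(/local/software/slurm/default/bin/scontrol show job "${SLURM_JOB_ID}")
fi

# Hypothetical helper: pull the value following "Key=" out of the captured
# output (scontrol prints space-separated Key=Value pairs).
get_field () {
    printf '%s\n' "$jobinfo" | tr ' ' '\n' | grep "^$1=" | cut -f2 -d'='
}

batchhost=$(get_field BatchHost)
submittime=$(get_field SubmitTime)
```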

This does work, but it strikes me that using EpilogSlurmctld might be better.
The issue, of course, is that EpilogSlurmctld is executed as the slurm user (on
the controller host), so how can such a script be made to write to a job's
stdout file?
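To frame the question a bit: the path of a job's stdout file can itself be read from scontrol, so an EpilogSlurmctld script could at least locate the target file. A minimal sketch (get_stdout_path is a hypothetical helper; whether the slurm user may write to that path is exactly the open question):

```shell
# Hypothetical helper: extract the StdOut path from "scontrol show job" output.
get_stdout_path () {
    printf '%s\n' "$1" | grep -o 'StdOut=[^ ]*' | cut -f2 -d'='
}

# EpilogSlurmctld runs on the controller with SLURM_JOB_ID in its environment.
if command -v scontrol >/dev/null 2>&1; then
    stdout_path=$(get_stdout_path "$(scontrol show job "${SLURM_JOB_ID}")")
fi
```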

I am, by the way, grepping the output of scontrol (for submit time, etc.) and
sacct (for memory usage, etc.) to generate my report. Does this approach make
sense, or are there better alternatives? Here's an example of the data printed
by my epilog script:

Submit time  : 2016-10-12T09:47:03
Start time   : 2016-10-12T09:47:03
End time     : 2016-10-12T09:47:17
Elapsed time : 00:00:14 (Timelimit=02:00:00)

   JobName     MaxRSS    Elapsed
---------- ---------- ----------
 slurm.mpi              00:00:14
     batch      1244K   00:00:14
       cpi     28224K   00:00:01
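For what it's worth, I could also ask sacct for pipe-delimited output via --parsable2, which is easier to post-process than the fixed-width table above. Something like this (format_report is just an illustrative helper of mine):

```shell
# Hypothetical helper: turn pipe-delimited sacct lines into aligned columns.
format_report () {
    awk -F'|' '{printf "%-12s %10s %10s\n", $1, $2, $3}'
}

# --parsable2 prints fields separated by "|" with no trailing delimiter.
if command -v sacct >/dev/null 2>&1; then
    sacct -j "${SLURM_JOB_ID}" --format=JobName,MaxRSS,Elapsed --parsable2 \
        | format_report
fi
```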

Best regards,
David


