Hi David,
We've implemented a setup very similar to yours, using "scontrol show job" and
looking for the StdOut file. It's not very elegant (SPANK plugins might be the
better option), but it works for us.
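As a rough sketch of that approach (this is not our actual script; the function
names are made up for illustration, and it assumes scontrol is on the PATH and
that the job's output file is writable from wherever the script runs):

```shell
#!/bin/bash
# Sketch: find a job's StdOut file via "scontrol show job" and append a note.

# Read "scontrol show job" output on stdin and print the StdOut= value.
extract_stdout_path() {
    sed -n 's/^[[:space:]]*StdOut=//p' | head -n 1
}

# Append one line to the job's output file (hypothetical helper name).
append_job_note() {
    local job_id="$1" note="$2" out
    out=$(scontrol show job "$job_id" | extract_stdout_path)
    # Skip quietly if the path is unknown or not writable from this host.
    [ -n "$out" ] && [ -w "$out" ] || return 0
    printf '%s\n' "$note" >> "$out"
}

# Only act when Slurm has set SLURM_JOB_ID, i.e. when run as an epilog.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    append_job_note "$SLURM_JOB_ID" "Job ${SLURM_JOB_ID} finished"
fi
```

Matching on the "StdOut=" prefix rather than a case-insensitive grep avoids
picking up any other line that happens to contain "stdout".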
A couple of small points:
- we use EpilogSlurmctld and not Epilog; the Epilog will run on each compute
node, so may not be the best fit for a singleton task like writing job stats
(and might explain why you get two Hello World entries in your output file).
The EpilogSlurmctld runs once (on the headnode only).
- it's probably best to leave the slurm.epilog.clean file unmodified, as it is a
standard file in slurm and could be changed with an upgrade; so you'd have to
manually reproduce your edits to it (we've been bitten by that with rpm
upgrades). Instead we've started to configure a completely new file as the
Epilog (Epilog=.../slurm.epilog.local) for performing custom tasks (post-job
node health checks in our case), but it explicitly calls the
slurm.epilog.clean as well at the end, as that script cleans up user
processes. But anyhow I guess that's moot if you instead use EpilogSlurmctld
for your post-job stats writing. :)
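To illustrate the wrapper idea, here is a minimal sketch of what a
slurm.epilog.local could look like. It is not our production script: the
function names are hypothetical, the health check is just a placeholder, and it
assumes the stock slurm.epilog.clean sits in the same directory as the wrapper
(override EPILOG_CLEAN if yours lives elsewhere).

```shell
#!/bin/bash
# Hypothetical slurm.epilog.local: site tasks first, then the stock cleanup.

# Assumption: the stock script lives next to this wrapper.
EPILOG_CLEAN="${EPILOG_CLEAN:-$(dirname -- "$0")/slurm.epilog.clean}"

# Placeholder for site-specific work such as a post-job node health check.
run_local_tasks() {
    return 0
}

run_epilog() {
    # A failing local task should not stop the stock cleanup from running.
    run_local_tasks || echo "local epilog task failed" >&2
    # Call the unmodified stock script last; it cleans up user processes.
    if [ -x "$EPILOG_CLEAN" ]; then
        "$EPILOG_CLEAN"
    fi
}

# Only run when invoked by Slurm, which sets SLURM_JOB_ID for the epilog.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_epilog
fi
```

With this in place, the Epilog entry in slurm.conf points at the wrapper
instead of at slurm.epilog.clean, and the stock file never needs editing.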
Hope that helps.
Kind regards,
Paddy
On Mon, Jun 27, 2016 at 01:03:47AM -0700, Baker D.J. wrote:
> Hi Lyn,
>
> Thank you for your reply to my question. I've been exploring and
> experimenting with the prolog and epilog today. So for example, I've put
> the following piece of code in my "final" epilog, epilog.clean...
>
> stdout=`/local/software/slurm/default/bin/scontrol show job ${SLURM_JOB_ID} |
> grep -i stdout | cut -f2 -d '='`
> #stdout=/local/software/slurm/default/etc/output
> echo 'Hello World from Epilog.clean' >>$stdout
>
> slurm.conf entry...
>
> Epilog=/local/software/slurm/default/etc/slurm.epilog.clean
>
> This does the job. There may be a more elegant way to do things, however this
> does work. I do, however, get the "Hello World" statement written twice
> in my job output. I assume that this final epilog will be executed once at
> the end of the job. Do you or anyone else on the forum understand why the
> statement is echoed twice?
>
> ...
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.883930
> Hello World from Epilog.clean
> Hello World from Epilog.clean
> ...
>
> Best regards,
> David
>
> From: Lyn Gerner [mailto:[email protected]]
> Sent: Thursday, June 23, 2016 6:56 PM
> To: slurm-dev <[email protected]>
> Subject: [slurm-dev] Re: Writing to job output files from prolog and epilog
> scripts
>
> Hi David,
>
> Be sure to note the special methods for setting env variables and writing to
> stdout from the task prolog, in the Prolog and Epilog Guide web page. In
> order to write job summary info (during one of your epilogs), you can acquire
> the stdout location from scontrol show job ${SLURM_JOB_ID} and just echo to
> it.
>
> Regards,
> Lyn
>
> On Thu, Jun 23, 2016 at 2:19 AM, Baker D.J.
> <[email protected]> wrote:
> Hello,
>
> I'm sure that this question has been asked before, however I don't recall
> finding a satisfactory answer to this question. We are investigating moving
> from torque/moab to slurm on our HPC clusters. In our torque prologue and
> epilogue scripts we write information in to users' job output files. For
> example where the job executed on the cluster (on which compute nodes) and
> how many (much) resources the job used.
>
> I've set up some prototype slurm prolog and epilog scripts and included
> some write (echo) statements, however I don't see any of the information in
> job output files. Is writing information in to output files much more tricky
> in slurm or have I missed something fundamental? Alternatively, are there
> other ways and means of doing this? Could someone please advise me.
>
> Best regards,
> David
>
--
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/