Davide DelVento <davide.quan...@gmail.com> writes:

>> I'm curious: What kind of disruption did it cause for your production
>> jobs?
>
> All jobs failed and went in pending/held with "launch failed requeued
> held" status, all nodes where the jobs were scheduled went draining.
>
> The logs only said "error: validate_node_specs: Prolog or job env
> setup failure on node xxxx, draining the node". I guess if they said
> "-bash: /path/to/prolog: Permission denied" I would have caught the
> problem myself.

But that is not a problem caused by having things like

exec &> /root/prolog_slurmd.$$

in the script, as you indicated.  It is a problem caused by the prolog
script file not being executable.
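
A minimal sketch of the difference, for illustration only: the temp file
below is a hypothetical stand-in for the prolog (not a real Slurm path),
and it assumes the temp directory is not mounted noexec.

```shell
#!/bin/sh
# Hypothetical stand-in for a prolog script, created as a temp file.
prolog=$(mktemp)
printf '#!/bin/sh\necho prolog ran\n' > "$prolog"
chmod 644 "$prolog"    # readable, but no execute bit

# Passing the file to an interpreter only needs read access, so this works.
via_sh=$(sh "$prolog")

# Executing the file directly fails with "Permission denied";
# POSIX shells report that as exit status 126.
direct_status=0
"$prolog" 2>/dev/null || direct_status=$?

# With the execute bit set, direct execution works as well.
chmod 755 "$prolog"
direct_out=$("$prolog")

echo "via sh: $via_sh"
echo "direct, before chmod: exit status $direct_status"
echo "direct, after chmod: $direct_out"
rm -f "$prolog"
```

The middle case is the one that matches the "Permission denied" message
quoted above.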

> In hindsight it is obvious, but I don't think even the documentation
> mentions that, does it? After all, you can run a non-executable file
> with "sh filename", so I made the incorrect assumption that Slurm
> would have invoked the prolog that way.

Slurm prologs can be written in any language - we used to have perl
prolog scripts. :)
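
Since the prolog is run as a program in its own right rather than handed
to a shell, a quick check before deploying one could look roughly like
the sketch below. check_prolog is a hypothetical helper, not part of
Slurm, and the interpreter-line test only makes sense for scripts, not
for compiled binaries.

```shell
#!/bin/sh
# check_prolog: hypothetical pre-deployment check, not a Slurm tool.
# Returns 0 if the file looks runnable as a script, non-zero otherwise.
check_prolog() {
    f=$1
    # The file must exist as a regular file.
    [ -f "$f" ] || { echo "missing: $f"; return 1; }
    # The execute bit must be set for direct execution to work.
    [ -x "$f" ] || { echo "not executable: $f"; return 2; }
    # A script in any language needs an interpreter line (#!) at the top.
    if [ "$(head -c 2 "$f")" != '#!' ]; then
        echo "no interpreter line: $f"
        return 3
    fi
    echo "ok: $f"
}
```

It would be called as "check_prolog /path/to/prolog" on each node, or
once on a shared file system.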

-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
