Davide,

Quick things to check:

 * Permissions on the file itself (and the directories in the path to it)
 * Existence of the script on the nodes (prologue is run on the nodes,
   not the head node)
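A sketch of both checks as one shell function. It is meant to be run on every compute node, ideally as root, since slurmd normally runs the Prolog as SlurmdUser (usually root). The function name check_prolog_path is hypothetical, and it only looks at existence and permission bits.

```shell
#!/bin/bash
# Hypothetical helper: check that a prolog script exists, is an
# executable regular file, and that every directory leading to it is
# searchable by the user running this check. Run it on each node.
check_prolog_path() {
    local target=$1
    local status=0
    local dir

    if [[ ! -e $target ]]; then
        echo "missing: $target"
        return 1
    fi
    if [[ ! -f $target || ! -x $target ]]; then
        echo "not an executable file: $target"
        status=1
    fi

    # Walk up the path; a directory without the x bit blocks access.
    dir=$(dirname -- "$target")
    while :; do
        if [[ ! -x $dir ]]; then
            echo "directory not searchable: $dir"
            status=1
        fi
        [[ $dir == / || $dir == . ]] && break
        dir=$(dirname -- "$dir")
    done
    return "$status"
}
```

For example, check_prolog_path /etc/slurm/prolog.sh (a hypothetical path; scontrol show config reports the configured Prolog). It prints nothing and returns 0 when everything looks fine.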

I'm not sure your error comes from the prologue script itself. Does everything run fine with no prologue configured?

Brian Andrus

On 9/15/2022 2:49 PM, Davide DelVento wrote:
I have a super simple prolog script, as follows (very similar to the
example one):

#!/bin/bash

if [[ $VAR == 1 ]]; then
    echo "True"
fi

exit 0

This fails (and obviously causes great disruption to my production
jobs). So I have three questions:

1. Why does it fail? It fails regardless of the value of the
variable, and when VAR is not 1 the echo never runs, so the problem
cannot be echo missing from the PATH (note that [[ is a shell keyword,
so it does not need the PATH either). I understand that the output of
echo goes into a black hole and that I should use "print ..." (not sure
about its syntax, and the documentation is very cryptic, but I digress)
or perhaps logger (as the example does); I tried some of these with no
luck.
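For reference, a sketch of the script above rewritten to report through syslog with logger instead of echo. The tag slurm-prolog and the report/main function names are arbitrary, hypothetical choices; the structure (return 0 from main, whose status becomes the script's exit status) keeps the same "always succeed" behaviour as the original exit 0.

```shell
#!/bin/bash
# Sketch: the prolog above, sending its message to syslog via logger
# rather than echo, whose output a Prolog does not show anywhere.
# The tag "slurm-prolog" is an arbitrary, hypothetical choice.

report() {
    # Never let a logging problem (no logger, no syslog) fail the prolog.
    logger -t slurm-prolog -- "$*" 2>/dev/null || true
}

main() {
    if [[ ${VAR:-} == 1 ]]; then
        report "job ${SLURM_JOB_ID:-unknown}: VAR is 1"
    fi
    # A non-zero Prolog exit status drains the node, so succeed explicitly.
    return 0
}

main "$@"
```

The messages would then show up in the node's syslog (journal or /var/log/messages, depending on the distribution) under the chosen tag.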

2. How do I debug the issue? Even with an increased debug level,
slurmctld.log contains only an "error: validate_node_specs: Prolog or
job env setup failure on node xxx, draining the node" message, without
even a line number or anything else. Google does not return anything
useful about this message.
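One way to get more than that single line is to point Prolog temporarily at a small wrapper that runs the real script under bash -x and appends the trace, any output, and the exit code to a file on the node. A minimal sketch; run_traced, TRACE_LOG and its default path are hypothetical names. The slurmd log on the affected node may also hold more detail than slurmctld.log.

```shell
#!/bin/bash
# Debugging sketch: run a prolog under "bash -x" and append a header,
# the command trace, any output, and the exit code to a log file.
# TRACE_LOG and its default location are hypothetical choices.
TRACE_LOG=${TRACE_LOG:-/var/tmp/prolog-trace.log}

run_traced() {
    local script=$1
    local rc=1   # reported if the log file itself cannot be opened
    {
        echo "=== $(date '+%F %T') job=${SLURM_JOB_ID:-unknown} script=$script"
        bash -x -- "$script"
        rc=$?
        echo "=== exit code $rc"
    } >>"$TRACE_LOG" 2>&1
    # Propagate the real exit code so slurmd still sees failures.
    return "$rc"
}
```

A wrapper configured as the Prolog would end with a line such as run_traced /etc/slurm/prolog.real.sh (hypothetical path). Propagating the exit code keeps the wrapper transparent, so it changes what gets logged but not whether nodes drain.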

3. More generally, how can I debug a prolog (and epilog) script
without disrupting all production jobs? Unfortunately we can't have
another Slurm install for testing. Is there an sbatch option to force
the use of a prolog script that would not be executed for all the
other jobs? Or perhaps a dedicated queue?
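The dedicated-queue idea can be approximated inside a single Prolog: the script checks which partition the job runs in and only executes the new logic for a test partition. A sketch, assuming SLURM_JOB_PARTITION is set in the Prolog environment (the Slurm prolog/epilog documentation lists the available variables for each version); the partition name prolog-test and the function names are hypothetical. Ideally that partition would contain a node production jobs do not use, because a failing prolog drains the node, not the partition.

```shell
#!/bin/bash
# Sketch: run experimental prolog logic only for jobs in a dedicated
# test partition; every other job takes the known-good path.
# "prolog-test" and the function names are hypothetical.
TEST_PARTITION=prolog-test

experimental_prolog() {
    # Logic under test goes here; report via logger rather than echo.
    logger -t slurm-prolog-test -- "job ${SLURM_JOB_ID:-unknown}: test prolog ran" 2>/dev/null || true
    return 0
}

main() {
    if [[ ${SLURM_JOB_PARTITION:-} == "$TEST_PARTITION" ]]; then
        if ! experimental_prolog; then
            # While testing, record the failure instead of draining the node.
            logger -t slurm-prolog-test -- "job ${SLURM_JOB_ID:-unknown}: test prolog failed" 2>/dev/null || true
        fi
    fi
    return 0
}

main "$@"
```

Test jobs would then be submitted with sbatch --partition=prolog-test. Swallowing failures is deliberate while experimenting; once the new logic is trusted, the failure branch can return non-zero again so broken nodes are drained as usual.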
