Brian, your prompt about the user not being present on the node was what I
needed. To close the loop on this, the error was due to an expired vendor
SSL cert for LDAP. This was causing sssd on the nodes to balk. Once
patched, all is well again.
Thanks,
Jason
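[Editor's note: since the root cause was an expired LDAP SSL certificate, a small script for catching this earlier may be useful. The sketch below is a hypothetical check, not part of the original thread: it parses the textual "notAfter" timestamp that `openssl x509 -enddate -noout` prints, using only the Python standard library. The date shown is an illustrative placeholder, not the actual vendor cert.]

```python
import ssl
import time

def cert_expired(not_after, now=None):
    """Return True if a cert 'notAfter' timestamp (OpenSSL text form,
    e.g. 'Jun  9 23:59:59 2021 GMT') is already in the past."""
    expiry = ssl.cert_time_to_seconds(not_after)  # epoch seconds, UTC
    return expiry < (now if now is not None else time.time())

# Illustrative values only -- feed in the real output of
# `openssl x509 -enddate -noout -in <cert>` stripped of the "notAfter=" prefix.
print(cert_expired("Jun  9 23:59:59 2021 GMT"))  # a past date -> True
```

Run periodically (e.g. from cron), this kind of check would flag the cert before sssd starts failing user lookups.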
On Tue, Apr 11, 2023 at 1:28 PM Jason wrote:
Hi —,
I have a question about a silent feature removal: the
--dependency:expand feature, which was present in Slurm for 10 years until its
removal in version 21.08.03.
Until Slurm 21.08.02, the expand option was extensively documented alongside the
dynamic job elasticity features, wi
Thanks, Brian, helpful as always. Yes, /opt/slurm/prolog.sh is mounted
across IB on all nodes, so it's reachable from everywhere. And the slurmd
user can execute it.
I'll keep mucking around with it...
Warmest regards,
Jason
On Tue, Apr 11, 2023 at 12:57 PM Brian Andrus wrote:
From the documentation:
Parameter:     Prolog (from slurm.conf)
Location:      Compute or front end node
Invoked by:    slurmd daemon
User:          SlurmdUser (normally user root)
When executed: First job or job step initiation
Hello all,
I'm regularly seeing array jobs fail, and the only log info from the
compute node is this:
[2023-04-11T11:41:12.336] error: /opt/slurm/prolog.sh: exited with status
0x0100
[2023-04-11T11:41:12.336] error: [job 26090] prolog failed status=1:0
[2023-04-11T11:41:12.336] Job 26090 already
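[Editor's note: the "0x0100" in the log is a raw wait(2)-style status word, in which the high byte carries the exit code. So 0x0100 means the prolog script exited with code 1, consistent with the "prolog failed status=1:0" line. A minimal decode using the standard wait-status macros exposed by Python's os module:]

```python
import os

raw = 0x0100  # the raw status from the slurmd log line

if os.WIFEXITED(raw):
    # Normal exit: the high byte of the status word is the exit code.
    # 0x0100 >> 8 == 1, matching "prolog failed status=1:0".
    print("prolog exited with code", os.WEXITSTATUS(raw))
elif os.WIFSIGNALED(raw):
    # Low 7 bits nonzero would instead mean death by signal.
    print("prolog killed by signal", os.WTERMSIG(raw))
```

This prints "prolog exited with code 1", so the next step is finding what inside prolog.sh returned 1 (here, the sssd/LDAP user lookup).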