Re: [slurm-users] Odd prolog Error?

2023-04-11 Thread Jason Simms
Brian, your prompt about the user not being present on the node was what I needed. To close the loop on this, the error was due to an expired vendor SSL cert for LDAP. This was causing sssd on the nodes to balk. Once patched, all is well again. Thanks, Jason On Tue, Apr 11, 2023 at 1:28 PM Jason
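The root cause was that the job owner could not be resolved on the node, because sssd was failing against LDAP. A quick way to confirm that kind of failure is to ask NSS directly. The sketch below is minimal and assumes Python 3 is available on the compute node. `user_resolves` is a hypothetical helper name. From the shell, `getent passwd <user>` gives the same answer.

```python
import pwd
import sys


def user_resolves(username: str) -> bool:
    """Return True if NSS (e.g. sssd backed by LDAP) can resolve `username` on this host."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        # getpwnam raises KeyError when no passwd entry is found for the name.
        return False
    return True


if __name__ == "__main__":
    # Usage: python3 check_user.py alice bob
    for name in sys.argv[1:]:
        status = "resolves" if user_resolves(name) else "NOT FOUND"
        print(f"{name}: {status}")
```

Run it on a suspect node with the owner of a failed job. If the name comes back NOT FOUND there but resolves on a healthy node, the sssd/LDAP side is the place to look.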

[slurm-users] Salloc expand feature

2023-04-11 Thread abel pinto
Hi, I have a question about a silent feature removal: the --dependency:expand feature, which was present in Slurm for 10 years until its removal in version 21.08.03. Until Slurm 21.08.02, the expand option had extensive documentation with the dynamic job elasticity features, wi

Re: [slurm-users] Odd prolog Error?

2023-04-11 Thread Jason Simms
Thanks, Brian, helpful as always. Yes, /opt/slurm/prolog.sh is mounted across IB on all nodes, so it's reachable from everywhere. And the slurmd user can execute it. I'll keep mucking around with it... Warmest regards, Jason On Tue, Apr 11, 2023 at 12:57 PM Brian Andrus wrote: > From the docum
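Two preconditions must hold before slurmd can run the prolog: the shared path must be reachable, and the file must be executable by the user slurmd runs it as. A minimal sketch can verify both on a node. It assumes the check is run as that same user (normally root, i.e. SlurmdUser). `check_prolog` is a hypothetical helper, and the path is whatever `Prolog=` points to in slurm.conf.

```python
import os


def check_prolog(path: str) -> list[str]:
    """Return a list of problems that would stop the current user from executing `path`."""
    if not os.path.exists(path):
        # A missing file often means the shared (e.g. IB-backed) mount is not up.
        return [f"{path} does not exist (is the shared mount up?)"]
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path} is not a regular file")
    if not os.access(path, os.X_OK):
        problems.append(f"{path} is not executable by uid {os.getuid()}")
    return problems


if __name__ == "__main__":
    for problem in check_prolog("/opt/slurm/prolog.sh") or ["prolog looks runnable"]:
        print(problem)
```

An empty list only shows that the script can be started. It says nothing about whether the script then exits 0, which is where this thread's failure actually was.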

Re: [slurm-users] Odd prolog Error?

2023-04-11 Thread Brian Andrus
From the documentation:
Parameter | Location | Invoked by | User | When executed
Prolog (from slurm.conf) | Compute or front end node | slurmd daemon | SlurmdUser (normally user root) | First job or job step initi

[slurm-users] Odd prolog Error?

2023-04-11 Thread Jason Simms
Hello all,
Regularly I'm seeing array jobs fail, and the only log info from the compute node is this:
[2023-04-11T11:41:12.336] error: /opt/slurm/prolog.sh: exited with status 0x0100
[2023-04-11T11:41:12.336] error: [job 26090] prolog failed status=1:0
[2023-04-11T11:41:12.336] Job 26090 already
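The two error lines describe the same event. 0x0100 appears to be a raw wait(2) status word, whose high byte carries the exit code. So the prolog script itself returned 1, which is consistent with slurmd's `status=1:0` (exit code, then signal). A small sketch decodes such a value with Python's standard `os.W*` helpers. These are POSIX only, and `describe_wait_status` is a hypothetical helper name.

```python
import os


def describe_wait_status(status: int) -> str:
    """Decode a raw wait(2)-style status word, such as the 0x0100 in the slurmd log."""
    if os.WIFEXITED(status):
        # Normal exit: the exit code lives in bits 8-15.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Killed by a signal: the signal number lives in the low 7 bits.
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"other status {status:#06x}"


print(describe_wait_status(0x0100))  # exited with code 1
```

So the next step is to find which command inside prolog.sh exits non-zero. In this thread, that turned out to be the user lookup on the node.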