Davide DelVento <davide.quan...@gmail.com> writes:

> Does it need the execution permission? Is it sufficient for root alone?
slurmd runs as root, so it only needs exec perms for root.

>> > 2. How to debug the issue?
>>
>> I'd try capturing all stdout and stderr from the script into a file
>> on the compute node, for instance like this:
>>
>>     exec &> /root/prolog_slurmd.$$
>>     set -x  # To print out all commands
>
> Do you mean INSIDE the prologue script itself?

Yes, inside the prolog script itself.

> Yes, this is what I'd have done, if it weren't so disruptive of all my
> production jobs, hence I had to turn it off before wreaking havoc too
> much.

I'm curious: what kind of disruption did it cause for your production
jobs? We use this in our slurmd prologs (and similar in epilogs) on all
our production clusters, and have not seen any disruption due to it.
(We do have things like

    ## Remove log file if we got this far:
    rm -f /root/prolog_slurmd.$$

at the bottom of the scripts, though, so as to remove the log file when
the prolog succeeded.)

> Sure, but even "just executing" there is stdout and stderr which could
> be captured and logged rather than thrown away and force one to do the
> above.

True. But slurmd doesn't, so...

> How do you "install the prolog scripts there"? Isn't the prolog
> setting in slurm.conf global?

I just overwrite the prolog script file itself on the node. We don't
have them on a shared file system, though. If you have the prologs on a
shared file system, you'd have to override the slurm config on the
compute node itself. This can be done in several ways, for instance by
starting slurmd with the "-f <modified slurm.conf file>" option.

>> (Otherwise one could always set up a small cluster of VMs and use
>> that for simpler testing.)
>
> Yes, but I need to request that cluster of VMs from IT, have the same
> OS installed and configured (and for it to be 100% identical, it needs
> to be RHEL, so licenses paid), and everything synced with the actual
> cluster.... I know it'd be very useful, but sadly we don't have the
> resources to do that, so unfortunately this is not an option for me.

I totally agree that VMs are never going to be 100% the same as a
physical test cluster, but some things can be tested even though the
setups are not exactly the same (for instance, in my experience, CentOS
and Rocky are close enough to RHEL for most Slurm-related things). One
takes what one has. :)
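Putting the pieces above together, a minimal self-logging slurmd prolog
could look roughly like this (the log path is the one from the example;
the "set -e" is an assumption about how a script decides it "got this
far", not necessarily what our production scripts do):

    #!/bin/bash
    set -e                          ## Abort on the first failing command.
    exec &> /root/prolog_slurmd.$$  ## Capture all stdout and stderr in a
                                    ## log named after the prolog's PID.
    set -x                          ## Print each command as it runs.

    ## ... the actual prolog work goes here ...

    ## Remove log file if we got this far:
    rm -f /root/prolog_slurmd.$$

With "set -e", any failing command makes the script exit nonzero before
the "rm", so a log is left behind exactly for the runs that failed (and
slurmd drains the node on a nonzero prolog exit, which points you at
that node's log).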
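For the config-override route, a sketch of one way to do it (the file
and script names here are hypothetical; "-f" is slurmd's standard
option for reading an alternate configuration file):

    ## On the test node only: copy the shared config, point Prolog= at
    ## a local test script, and run slurmd against the modified copy.
    ## (Stop the running slurmd first, e.g. "systemctl stop slurmd".)
    cp /etc/slurm/slurm.conf /etc/slurm/slurm.conf.test
    sed -i 's|^Prolog=.*|Prolog=/root/prolog_test.sh|' \
        /etc/slurm/slurm.conf.test
    slurmd -f /etc/slurm/slurm.conf.test

That keeps the shared slurm.conf untouched for the rest of the cluster
while the one test node runs the modified prolog.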
-- 
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo