Hello all,

I'm regularly seeing array jobs fail, and the only log info from the compute node is this:
[2023-04-11T11:41:12.336] error: /opt/slurm/prolog.sh: exited with status 0x0100
[2023-04-11T11:41:12.336] error: [job 26090] prolog failed status=1:0
[2023-04-11T11:41:12.336] Job 26090 already killed, do not launch batch job

The contents of prolog.sh are incredibly simple:

#!/bin/bash
loginctl enable-linger $SLURM_JOB_USER

I can't sort out what may be going on here. An example script from a job that can result in this error is here:

#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -p compute
#SBATCH --array=1-100
#SBATCH -o tempOut/MSO-%j-%a.log

module load python3/python3
python3 runVoltage.py $SLURM_ARRAY_TASK_ID

Any insight would be welcome! This is really frustrating because it's constantly causing nodes to drain.

Warmest regards,
Jason

--
Jason L. Simms, Ph.D., M.P.H.
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms
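P.S. If I'm reading the status right, 0x0100 (and the "status=1:0" on the next line) means the prolog exited with code 1 and no signal, so loginctl itself is presumably failing intermittently. One debugging variant of the prolog I'm considering is below; it's an untested sketch, and the log path /var/log/slurm/prolog-debug.log is just a placeholder:

#!/bin/bash
# Sketch of a debugging prolog (untested). Sends all stderr to a
# node-local log so slurmd's "exited with status 0x0100" has context.
exec 2>>/var/log/slurm/prolog-debug.log   # placeholder path
echo "$(date -Is) job=${SLURM_JOB_ID:-?} user=${SLURM_JOB_USER:-unset}" >&2

# Guard against an empty SLURM_JOB_USER rather than passing "" to loginctl.
if [ -z "$SLURM_JOB_USER" ]; then
    echo "SLURM_JOB_USER is empty; skipping enable-linger" >&2
    exit 0
fi

# Run loginctl, record its exit code, and pass it through so Slurm's
# behavior is unchanged; the log now shows why it failed.
loginctl enable-linger "$SLURM_JOB_USER" >&2
rc=$?
if [ $rc -ne 0 ]; then
    echo "loginctl enable-linger failed with rc=$rc" >&2
fi
exit $rc

(Exiting 0 unconditionally would stop the node drains, but then a failed enable-linger would go unnoticed, so this sketch passes the exit code through.)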