Hello all,

I'm regularly seeing array jobs fail, and the only log info on the
compute node is this:

[2023-04-11T11:41:12.336] error: /opt/slurm/prolog.sh: exited with status
0x0100
[2023-04-11T11:41:12.336] error: [job 26090] prolog failed status=1:0
[2023-04-11T11:41:12.336] Job 26090 already killed, do not launch batch job

The contents of prolog.sh are incredibly simple:

#!/bin/bash
loginctl enable-linger "$SLURM_JOB_USER"
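
In case it helps, here is a lightly instrumented variant I could swap in
to capture what loginctl is actually complaining about (the log path is
just an example):

#!/bin/bash
# Instrumented prolog: record loginctl's output and exit code,
# then propagate that exit code so Slurm still sees the failure.
LOG=/var/log/slurm/prolog-debug.log   # example path
{
    echo "$(date -Is) job=${SLURM_JOB_ID} user=${SLURM_JOB_USER}"
    loginctl enable-linger "$SLURM_JOB_USER"
    rc=$?
    echo "loginctl exit code: $rc"
} >> "$LOG" 2>&1
exit $rc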

If I'm reading it right, 0x0100 is a raw wait status with the exit code
in the high byte, so the script itself is exiting 1, but I can't sort
out why. An example script from a job that can trigger this error is
below:

#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -p compute
#SBATCH --array=1-100
#SBATCH -o tempOut/MSO-%j-%a.log

module load python3/python3
python3 runVoltage.py $SLURM_ARRAY_TASK_ID

Any insight would be welcome! This is really frustrating because every
prolog failure drains the node it lands on.
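
For anyone hitting the same thing, the obvious checks on a drained node
seem to be these (all standard Slurm/systemd tooling; node and user
names are placeholders):

sinfo -R                             # drain reason Slurm recorded
scontrol show node <nodename>        # full node state and reason
journalctl -u systemd-logind -n 50   # was logind/dbus wedged at the time?
loginctl enable-linger <username>    # try to reproduce by hand as root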

Warmest regards,
Jason

-- 
*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms
