Using Slurm 20.02 on CentOS 7.7 with Bright Cluster. We changed the
following options to enable MPS:
SelectType=select/cons_tres
GresTypes=gpu,mic,mps

I restarted slurmctld and ran scontrol reconfigure, but all jobs get
the error below:
[2020-04-07T15:29:00.741] debug:  backfill: no jobs to backfill
[2020-04-07T15:29:03.051] Resending TERMINATE_JOB request JobId=3056 Nodelist=node[001-002]
[2020-04-07T15:29:03.051] Resending TERMINATE_JOB request JobId=3061 Nodelist=node003
[2020-04-07T15:29:03.051] debug:  sched: Running job scheduler
[2020-04-07T15:29:03.063] agent/is_node_resp: node:node003 RPC:REQUEST_TERMINATE_JOB : Header lengths are longer than data received
[2020-04-07T15:29:03.071] agent/is_node_resp: node:node002 RPC:REQUEST_TERMINATE_JOB : Header lengths are longer than data received
[2020-04-07T15:29:03.071] agent/is_node_resp: node:node001 RPC:REQUEST_TERMINATE_JOB : Header lengths are longer than data received

Do any other options need changing? What causes these header length errors?
