Has anyone seen this error - slurmdbd will not start:
[2016-12-01T10:07:54.946] debug: slurmdbd: slurm_open_msg_conn to uahpc:6819: Connection refused
[2016-12-01T10:07:54.946] error: slurmdbd: DBD_SEND_MULT_JOB_START failure: Connection refused
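"Connection refused" typically means nothing is accepting TCP connections on that host and port. As a rough sketch only, the check below tests whether a port accepts a connection. The helper name port_is_open is hypothetical (not part of Slurm), and the host and port in the usage comment are simply copied from the log lines above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, else False.

    Hypothetical helper for illustration; it is not part of Slurm.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Usage, with the host and port from the log lines above:
#   port_is_open("uahpc", 6819)
```

If this returns False when run from the controller host, that would be consistent with the daemon not listening on that port, though a firewall or name-resolution problem would look the same from this check.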
This was a running system and we just pushed out
that
has enough and pre-empts whatever is running there.
Deborah Crocker, PhD
Systems Engineer III
Office of Information Technology
The University of Alabama
Box 870346
Tuscaloosa, AL 36587
Office 205-348-3758 | Fax 205-348-9393
deborah.croc...@ua.edu
First off I’d like to apologize to the list. I sent this by grabbing an older
message and forgot to change the subject line. Here is my question with a
better subject.
I should add that I’ve experimented some with using LLN to get jobs to go to
other nodes, of which many are free, instead of cr
We have our basic slurm setup running with a couple of queues set up through
QOS (a main and long “queue”). What we would like to do now is set up for
pre-emption for our stakeholders. When we have preemption set on the partition
(suspend), though, it looks like any job with --exclusive set will
I have a question about the terminology for "groups" and "qos". It seems like
"groups" are supposed to refer to OS values (/etc/group) based on reading of
the slurm.conf attributes. However, there is an example online (PDF named
Basic_Configuration_Usage. by R. Schultz) discussing preemption wh