Hi Lyn,

Unfortunately, rebooting the node makes no difference: the job just gets
requeued and the node goes back to 'mix~'. What baffles me is that there
is obviously some sort of communication problem between the slurmctld on
the admin node and the slurmd on the compute node, yet I can't find
anything in the log files to indicate what's going wrong.
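In case it is useful to anyone following along, downing/resuming the node
and turning up the logging look roughly like this (node001 is our affected
node; the debug level is just an example):

    # Down the node to requeue/cancel the stuck job, then bring it back:
    scontrol update NodeName=node001 State=DOWN Reason="job stuck in CONFIGURING"
    scontrol update NodeName=node001 State=RESUME

    # Raise slurmctld's log verbosity on the admin node:
    scontrol setdebug debug3

    # On node001, run slurmd in the foreground with extra verbosity:
    slurmd -D -vvvv

Note that 'scontrol setdebug' changes the slurmctld log level on the fly,
so the controller does not need to be restarted.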
Cheers,

Loris

Lyn Gerner <schedulerqu...@gmail.com> writes:

> Re: [slurm-dev] Job stuck in CONFIGURING, node is 'mix~'
>
> Hi Loris,
>
> At least with earlier releases, I've not found a way to act directly upon
> the job. However, if it's possible to down the node, that should requeue
> (or cancel) the job.
>
> Best,
> Lyn
>
> On Tue, Sep 12, 2017 at 3:40 AM, Loris Bennett <loris.benn...@fu-berlin.de>
> wrote:
>
> Hi,
>
> I have a node which is powered on and to which I have sent a job. The
> output of sinfo is
>
> PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
> test         up 7-00:00:00      1   mix~ node001
>
> The output of squeue is
>
>   JOBID PARTITION     NAME  USER ST  TIME NODES NODELIST(REASON)
> 1795993      test 7_single loris CF 24:29     1 node001
>
> I don't understand the node state 'mix~'. If it occurs at all, I would
> expect it to exist only very briefly between 'idle~' and 'mix#'. The '~'
> is certainly incorrect, as the node is not in a power-saving state, which
> in our case means powered off.
>
> This problem may have existed in 16.05.10-2, but we are currently using
> 17.02.7. All nodes in the cluster apart from this one are functioning
> normally.
>
> Does anyone have any idea what we might be doing wrong?
>
> Cheers,
>
> Loris
>
> --
> Dr. Loris Bennett (Mr.)
> ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de

--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de