Now that I know to use sinfo rather than scontrol, I found some more flags
that do what I want:


$ sinfo --long --list-reasons
REASON               USER         TIMESTAMP           STATE  NODELIST
Not responding       root(0)      2013-07-17T10:23:19 down*  gpu-1-13
/dev/nvidia4 missing hocks(503)   2013-07-17T12:47:09 down*  gpu-1-16
Could not unpack gre root(0)      2013-07-15T11:44:33 down*  gpu-2-7
/dev/nvidia4 and fol hocks(503)   2013-07-17T12:49:20 down*  gpu-2-10



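Note that sinfo -dR only shows the nodes marked "down*", not the plain "down"
nodes: the asterisk means the node is not responding, and -d (--dead) only
reports non-responding nodes.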
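For a list closer to pbsnodes -ln (node, state, reason), the sketch below may help. It assumes that your sinfo's -o/--format accepts the fields %N (node list), %t (compact state), %H (time the reason was set) and %E (reason), as described in man sinfo; check this against your 2.6.0 man page. The reasons above appear to be cut off at 20 characters, so %E is given no width and placed last to print each reason in full. The function name down_reasons is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list nodes in a down, drained, fail or failing state
# together with the reason they were taken out of service.
#   -R      list reasons (covers plain "down" nodes too, unlike -d)
#   -o FMT  custom output format:
#             %30N  node list, in a 30-character column
#             %8t   compact node state (e.g. down, down*, drain)
#             %19H  time the reason was set
#             %E    reason, no width given so it is not truncated
# Any extra arguments (e.g. -p <partition>) are passed through to sinfo.
down_reasons() {
    sinfo -R -o "%30N %8t %19H %E" "$@"
}
```

Run it as down_reasons, or down_reasons -p <partition> to restrict the list to one partition.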


Thanks for your fast help!!!
Eva


On Wed, 17 Jul 2013, Michael Gutteridge wrote:

> Hi
>
> Does sinfo give you what you need? -d lists only down nodes and -R lists
> the reason:
>
> $ sinfo -dR
> REASON               USER      TIMESTAMP           NODELIST
> broken               root      2013-05-06T01:09:37 puck2
> broken               root      2013-05-06T01:09:31 puck3
>
> HTH
>
> Michael
>
>
>
> On Wed, Jul 17, 2013 at 2:14 PM, Eva Hocks <ho...@sdsc.edu> wrote:
>
> > How can I list information on only "down" nodes that were taken offline
> > with a Reason? I am looking for a command similar to Torque's
> > pbsnodes -ln, a short list without all the node details. On a system
> > with 1000 nodes that list gets rather long.
> >
> > pbsnodes -ln
> > com-2-74             offline                    /oasis not responding
> > com-2-75             offline                    check IB performance
> > com-3-11             offline                    vsmp version test
> > com-4-42             offline,job-exclusive      /scratch I/O error
> >
> > Unfortunately, Slurm's pbsnodes only supports the "-a" flag. I am
> > running Slurm 2.6.0.
> >
> > Thanks
> >
> > Eva
