Now that I know to use sinfo rather than scontrol, I found some more flags
that do what I want:
$ sinfo --long --list-reasons
REASON               USER       TIMESTAMP           STATE  NODELIST
Not responding       root(0)    2013-07-17T10:23:19 down*  gpu-1-13
/dev/nvidia4 missing hocks(503) 2013-07-17T12:47:09 down*  gpu-1-16
Could not unpack gre root(0)    2013-07-15T11:44:33 down*  gpu-2-7
/dev/nvidia4 and fol hocks(503) 2013-07-17T12:49:20 down*  gpu-2-10
Plain "sinfo -dR" only shows the nodes in state "down*" (down and not
responding), not the nodes in state "down".
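To get something closer to torque's pbsnodes -ln (one node per line, with its state and reason), a small wrapper around sinfo might do. This is only a sketch under assumptions: it assumes your sinfo accepts --Node together with --list-reasons and understands the %N, %T and %E format fields (node name, extended state, reason); the function names list_down_reasons and format_reasons are made up for illustration. Requesting %E with no field width should also avoid the 20-character cut-off visible in the REASON column above, but that is worth checking on 2.6.0.

```shell
#!/bin/sh
# Sketch: list nodes that have a reason set, one node per line, roughly
# like torque's `pbsnodes -ln`. Function names are hypothetical.

# Ask sinfo for node name, extended state and reason, separated by '|'.
# Assumes this sinfo supports --Node with --list-reasons and the
# %N/%T/%E format fields. Extra arguments (e.g. --states=down) are
# passed straight through to sinfo.
list_down_reasons() {
    sinfo --noheader --Node --list-reasons --format='%N|%T|%E' "$@"
}

# Turn "node|state|reason" lines from stdin into aligned columns.
# The reason is the last field, so any '|' inside it is kept intact.
# The `|| [ -n "$node" ]` also handles a final line without a newline.
format_reasons() {
    while IFS='|' read -r node state reason || [ -n "$node" ]; do
        printf '%-12s %-10s %s\n' "$node" "$state" "$reason"
    done
}
```

For example, "list_down_reasons --states=down | format_reasons" should restrict the list to nodes in the down state; leaving out --states keeps drained and failing nodes as well.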
Thanks for your fast help!!!
Eva
On Wed, 17 Jul 2013, Michael Gutteridge wrote:
> Hi
>
> Does sinfo give you what you need? -d lists only down nodes and -R lists
> the reason:
>
> $ sinfo -dR
> REASON     USER  TIMESTAMP           NODELIST
> broken     root  2013-05-06T01:09:37 puck2
> broken     root  2013-05-06T01:09:31 puck3
>
> HTH
>
> Michael
>
>
>
> On Wed, Jul 17, 2013 at 2:14 PM, Eva Hocks <ho...@sdsc.edu> wrote:
>
> > How can I list information on only "down" nodes that were taken offline
> > with a Reason? I am looking for a command similar to torque's
> > pbsnodes -ln, which gives a short list without all the node details. On
> > a system with 1000 nodes that list gets rather long.
> >
> > pbsnodes -ln
> > com-2-74  offline                /oasis not responding
> > com-2-75  offline                check IB performance
> > com-3-11  offline                vsmp version test
> > com-4-42  offline,job-exclusive  /scratch I/O error
> >
> > Unfortunately, the pbsnodes in Slurm only supports the "-a" flag. I am
> > running Slurm 2.6.0.
> >
> > Thanks
> >
> > Eva