This is my scontrol show node output:

NodeName=pez015 Arch=x86_64 CoresPerSocket=6
CPUAlloc=12 CPUErr=0 CPUTot=12 Features=(null) Gres=(null)
NodeAddr=pez015 NodeHostName=pez015 OS=Linux RealMemory=48128
Sockets=2 State=ALLOCATED ThreadsPerCore=1 TmpDisk=61440 Weight=1
BootTime=2013-03-01T16:40:23 SlurmdStartTime=2013-03-01T16:41:04

I can't find this information in sview either. Could something be missing from my slurm.conf? This is Slurm 2.4.3.

2013/4/26 Danny Auble <d...@schedmd.com>

> sview should also work. Just right click on the nodes tab to display other
> columns. Or double click on the node in question.
>
> Moe Jette <je...@schedmd.com> wrote:
>>
>> $ scontrol show node
>> NodeName=xxx Arch=i686 CoresPerSocket=1
>> CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=1.49 Features=(null)
>> Gres=(null)
>> NodeAddr=jette-netbook NodeHostName=jette-netbook
>> OS=Linux RealMemory=990 AllocMem=100 Sockets=1 Boards=1
>> Right here ^^^^^^^^^^^^
>>
>>> 6) Current job memory allocation for nodes
>>>
>>> I am currently looking for options in sstat, sinfo, scontrol... but I
>>> can't find how to see the total reserved memory for one particular node.
>>>
>>> In sview's "nodes" tab you can see how many CPUs are used/free on each
>>> node, but not how much memory.
>>>
>>> Thanks!
>>>
>>> 2013/4/25 Mario Kadastik <mario.kadas...@cern.ch>
>>>
>>>> Hi,
>>>>
>>>> I'm trying to get an overview of the state of the cluster. What I'd
>>>> really like to know is, for example:
>>>>
>>>> 1) compute nodes online
>>>> 2) compute cores online
>>>> 3) compute cores allocated
>>>> 4) distribution of job sizes currently running (and possibly queued)
>>>> 5) list of nodes that are down/draining, with reasons
>>>>
>>>> Of those, #1 and #5 can be obtained from sinfo with "sinfo -Nle -p
>>>> main", which shows nodes and their states with reasons.
>>>>
>>>> However, I can't quickly find out how many cores are online in total
>>>> (in theory it's nodes up * CPUs per node, summed over each node type),
>>>> and, even more crucially, how many cores are actually used and by jobs
>>>> of what size. Today I was really tearing my hair out: 99% of the time
>>>> we run single-core jobs, and on my ca. 4300 cores I saw only ca. 1800
>>>> jobs running with 6000 queued. It turned out a user had submitted 5
>>>> jobs with subtasks: four had 100 subtasks and one had 2000, which
>>>> nicely accounts for the missing jobs. I would really appreciate some
>>>> summary view of the cluster. Is it already available in the sinfo,
>>>> sstat, or scontrol commands? If not, does anyone have a good script
>>>> that gathers the information together efficiently and lists it?
>>>>
>>>> It would have to be text-only, as all nodes are headless, and I'd
>>>> prefer to get the overview as a nice summary in a shell.
>>>>
>>>> Thanks,
>>>>
>>>> Mario Kadastik, PhD
>>>> Researcher
>>>>
>>>> ---
>>>> "Physics is like sex: sure, it may have practical reasons, but that's
>>>> not why we do it"
>>>> -- Richard P. Feynman
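Moe's point in the thread is that the per-node reserved memory shows up as the AllocMem field of "scontrol show node". A minimal sketch of scripting that query; the node name pez015 and the sample output line are taken from the thread, while the alloc_mem helper name is hypothetical:

```shell
# Sketch: extract the allocated memory (MB) for a node from scontrol output.
# Assumes a Slurm version that reports AllocMem (Moe's output has it; the
# 2.4.3 output at the top of the thread does not).
alloc_mem() {
    # Split the space-separated key=value pairs onto lines, keep AllocMem.
    tr ' ' '\n' | awk -F= '$1 == "AllocMem" { print $2 }'
}

# On a live cluster you would pipe scontrol into the helper:
#   scontrol show node pez015 | alloc_mem
# Against Moe's sample line from the thread it prints 100:
printf 'OS=Linux RealMemory=990 AllocMem=100 Sockets=1 Boards=1\n' | alloc_mem
```

The same tr-then-awk pattern works for any other key=value field scontrol prints, such as RealMemory or CPUAlloc.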
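Mario's cores-online, cores-allocated, and job-size questions can be answered with stock format strings: sinfo's %C field prints CPU counts as allocated/idle/other/total, and squeue's %C prints the CPU count of each job, so the job-size distribution is a sort | uniq -c away. A sketch, with the histogram step demonstrated on canned numbers since no live cluster is assumed (job_size_histogram is a hypothetical helper name):

```shell
# Sketch of a text-only cluster summary along the lines Mario asks for.
job_size_histogram() {
    # Count how many jobs request each CPU count.
    sort -n | uniq -c
}

# On a live cluster:
#   sinfo -h -p main -o "%C"                  # allocated/idle/other/total CPUs
#   squeue -h -t R -o "%C" | job_size_histogram
# Demonstrated on fake per-job CPU counts (three 1-core jobs, two 8-core,
# one 16-core):
printf '1\n1\n1\n8\n8\n16\n' | job_size_histogram
```

Adding -t PD to the squeue invocation gives the same histogram for the queued jobs, which would have flagged the 2000-subtask submission Mario describes.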