This is my scontrol show node output:

NodeName=pez015 Arch=x86_64 CoresPerSocket=6
CPUAlloc=12 CPUErr=0 CPUTot=12 Features=(null) Gres=(null)
NodeAddr=pez015 NodeHostName=pez015 OS=Linux RealMemory=48128
Sockets=2 State=ALLOCATED ThreadsPerCore=1 TmpDisk=61440 Weight=1
BootTime=2013-03-01T16:40:23 SlurmdStartTime=2013-03-01T16:41:04

I can't find this information in sview either. Could something be missing from my slurm.conf? This is Slurm 2.4.3.

2013/4/26 Danny Auble <d...@schedmd.com>

> sview should also work. Just right click on the nodes tab to display other
> columns. Or double click on the node in question.
>
> Moe Jette <je...@schedmd.com> wrote:
>>
>> $ scontrol show node
>> NodeName=xxx Arch=i686 CoresPerSocket=1
>> CPUAlloc=2 CPUErr=0 CPUTot=2 CPULoad=1.49 Features=(null)
>> Gres=(null)
>> NodeAddr=jette-netbook NodeHostName=jette-netbook
>> OS=Linux RealMemory=990 AllocMem=100 Sockets=1 Boards=1
>> Right here ^^^^^^^^^^^^
>>
>>> 6) Current job memory allocation for nodes
>>>
>>> I am currently looking for options in sstat, sinfo, scontrol... but I
>>> can't find how to see the total reserved memory for one particular node.
>>>
>>> In sview's "nodes" tab you can see how many CPUs are used/free on each
>>> node, but not how much memory.
>>>
>>> Thanks!
>>>
>>> 2013/4/25 Mario Kadastik <mario.kadas...@cern.ch>
>>>
>>>> Hi,
>>>>
>>>> I'm trying to get an overview of the state of the cluster. What I'd
>>>> really like to know is, for example:
>>>>
>>>> 1) compute nodes online
>>>> 2) compute cores online
>>>> 3) compute cores allocated
>>>> 4) distribution of job sizes currently running (and possibly queued)
>>>> 5) list of nodes that are down/draining, with reasons
>>>>
>>>> Of those, #1 and #5 can be obtained from sinfo with "sinfo -Nle -p
>>>> main", which shows nodes and their states with reasons.
>>>>
>>>> However, I can't quickly find out how many cores are online in total
>>>> (in theory it's nodes up * CPUs per node, summed over each node type),
>>>> and, even more crucially, how many cores are actually used and by jobs
>>>> of what size. Today I was really tearing my hair out: 99% of the time
>>>> we run single-core jobs, and on my ca. 4300 cores I saw only ca. 1800
>>>> jobs running with 6000 queued. It turned out a user had submitted 5
>>>> jobs with subtasks: four had 100 subtasks and one had 2000, which
>>>> nicely accounts for the missing jobs. I would really appreciate some
>>>> summary view of the cluster. Is it already available in the sinfo,
>>>> sstat, or scontrol commands? If not, does anyone have a good script
>>>> that gathers the information together efficiently and lists it?
>>>>
>>>> It would have to be text-only, as all nodes are headless, and I'd
>>>> prefer to get the overview as a nice summary in a shell.
>>>>
>>>> Thanks,
>>>>
>>>> Mario Kadastik, PhD
>>>> Researcher
>>>>
>>>> ---
>>>> "Physics is like sex: sure, it may have practical reasons, but that's
>>>> not why we do it"
>>>> -- Richard P. Feynman
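Moe's point in the thread is that the per-node reserved memory shows up as the AllocMem field of "scontrol show node". A minimal sketch of scripting that query; the node name pez015 and the sample output line are taken from the thread, while the alloc_mem helper name is hypothetical:

```shell
# Sketch: extract the allocated memory (MB) for a node from scontrol output.
# Assumes a Slurm version that reports AllocMem (Moe's output has it; the
# 2.4.3 output at the top of the thread does not).
alloc_mem() {
    # Split the space-separated key=value pairs onto lines, keep AllocMem.
    tr ' ' '\n' | awk -F= '$1 == "AllocMem" { print $2 }'
}

# On a live cluster you would pipe scontrol into the helper:
#   scontrol show node pez015 | alloc_mem
# Against Moe's sample line from the thread it prints 100:
printf 'OS=Linux RealMemory=990 AllocMem=100 Sockets=1 Boards=1\n' | alloc_mem
```

The same tr-then-awk pattern works for any other key=value field scontrol prints, such as RealMemory or CPUAlloc.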
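Mario's cores-online, cores-allocated, and job-size questions can be answered with stock format strings: sinfo's %C field prints CPU counts as allocated/idle/other/total, and squeue's %C prints the CPU count of each job, so the job-size distribution is a sort | uniq -c away. A sketch, with the histogram step demonstrated on canned numbers since no live cluster is assumed (job_size_histogram is a hypothetical helper name):

```shell
# Sketch of a text-only cluster summary along the lines Mario asks for.
job_size_histogram() {
    # Count how many jobs request each CPU count.
    sort -n | uniq -c
}

# On a live cluster:
#   sinfo -h -p main -o "%C"                  # allocated/idle/other/total CPUs
#   squeue -h -t R -o "%C" | job_size_histogram
# Demonstrated on fake per-job CPU counts (three 1-core jobs, two 8-core,
# one 16-core):
printf '1\n1\n1\n8\n8\n16\n' | job_size_histogram
```

Adding -t PD to the squeue invocation gives the same histogram for the queued jobs, which would have flagged the 2000-subtask submission Mario describes.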