To be a bit more clear, here's what I'm seeing on one of the nodes using 'top':

top - 15:07:08 up  4:24,  2 users,  load average: 41.77, 44.10, 47.44
Tasks: 575 total,   8 running, 567 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.2 us,  9.7 sy,  0.0 ni, 80.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  13191924+total, 19036872 used, 11288237+free,    55692 buffers
KiB Swap:  7998460 total,        0 used,  7998460 free.  3075184 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND  nTH
18614 berg      20   0 2918144 833304   7368 R 149.6  0.6 118:45.23 R         48
18669 berg      20   0 2924556 844084   7368 R 149.6  0.6 119:34.34 R         48
18485 berg      20   0 2954316 869548   7364 R 143.5  0.7 104:00.66 R         48
18497 berg      20   0 2926064 845620   7368 R 139.7  0.6 119:53.11 R         48
18417 berg      20   0 2955552 877024   7368 R 138.0  0.7 107:39.07 R         48
18401 berg      20   0 2952152 870508   7368 R 132.5  0.7 107:07.17 R         48
18860 xinyus    20   0 10.042g 8.121g   4356 R 100.1  6.5  31:28.82 R         48

The load average actually seems to be under control, since it's below 48.0. 
However, why does each process report 48 threads? Maybe what I'm seeing isn't 
actually a problem, but it does not seem desirable to have this many threads 
assigned per process - and yet how can the CPU load still stay under 48? Again, 
this is a 24-core server with 2 threads per core.
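
For what it's worth, a quick way to confirm what top's nTH column shows, 
outside of interactive mode, is to ask ps for the per-process thread count 
directly. My guess is that most of those 48 threads are idle OpenMP/BLAS 
worker threads sitting in sleep, which would explain why the load average 
stays below 48 even though each process owns 48 threads - only runnable 
threads count toward load. A minimal check, reusing the user name 'berg' 
from the output above:

# lightweight-process (thread) count per process for one user
ps -o pid,nlwp,pcpu,comm -u berg

# or list every thread of one process, e.g. PID 18614, and see which
# are actually running (R) versus sleeping (S)
ps -L -o pid,tid,stat,pcpu,comm -p 18614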

Looking at the user's submit script, he did not specify any resource options 
at all, so it appears that slurm is using mostly default allocations. He is 
using job arrays; however, I've witnessed the same high thread counts on 
single submissions without arrays. Below is his submit script.

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=***@****.***
#SBATCH --array=1-6

R CMD BATCH readData.R output.$SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID
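
For reference, the shape I'd expect a better-behaved version of that script to 
take is roughly the following - the --cpus-per-task value, the --mem-per-cpu 
placeholder, and the OPENBLAS_NUM_THREADS variable are my own guesses (it 
depends on which BLAS his R is linked against), not something he has tested:

#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=***@****.***
#SBATCH --array=1-6
#SBATCH --cpus-per-task=1            # one logical CPU per array task
#SBATCH --mem-per-cpu=4G             # placeholder value, for illustration only

# keep the math libraries from spawning one worker per logical CPU
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK

R CMD BATCH readData.R output.$SLURM_ARRAY_TASK_ID $SLURM_ARRAY_TASK_ID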

Thanks,
Mike

-----Original Message-----
From: Mike Cammilleri 
Sent: Monday, April 24, 2017 3:04 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: RE: [slurm-dev] Re: CPU config question

Thanks for your help on this. I've enabled the cgroup plugins with the same 
settings:

CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/cgroup"
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes

And put cgroup.conf in /etc for our installs.

I can see in the slurm logging that it's reading cgroup.conf. I've loaded the 
new slurm.conf, restarted all slurmd processes, and run 'scontrol reconfigure' 
on the submit node.
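
For completeness, the rollout was essentially the following; the slurmd service 
name is a placeholder, since ours is a local build and yours may be packaged 
differently:

# on every compute node, after pushing out the new slurm.conf and cgroup.conf
service slurmd restart

# on the controller/submit node, have slurmctld re-read its config
scontrol reconfigure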

Memory no longer seems to be swapping; however, I'm still seeing far too many 
threads get scheduled. I've tried many combinations of --cpus-per-task, 
--ntasks, --cpu_bind=threads, and so on - and nothing seems to prevent each 
process from having 48 threads according to 'top'.

The most interesting thing I've found is that even a single R job reports 48 
threads in 'top' (by pressing F in interactive mode and selecting the nTH 
column to display). The only thing that seems to limit thread usage is setting 
the OMP_NUM_THREADS environment variable - that it will obey. But what we 
really need is a hard limit, so that a user who thinks they're running a simple 
R job and requests --ntasks 6 isn't actually getting 6*48 threads going at once 
and overloading the node. 48 is the total number of "cpus" as the machine sees 
it logically: it's a 24-core machine with 2 threads on each core.
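
One idea I'm toying with for a harder limit (not tested yet, so purely a 
sketch) is to push the cap in centrally via a TaskProlog rather than trusting 
each user's script, since slurm injects any "export NAME=value" lines printed 
by the task prolog into the task's environment. The script path and the 
non-OpenMP variable names below are my own choices:

#!/bin/bash
# e.g. /etc/slurm/taskprolog.sh, referenced from slurm.conf as TaskProlog=...
# Lines printed as "export NAME=value" are added to the task's environment.

NTHREADS=${SLURM_CPUS_PER_TASK:-1}    # fall back to 1 if --cpus-per-task unset

echo "export OMP_NUM_THREADS=$NTHREADS"
echo "export OPENBLAS_NUM_THREADS=$NTHREADS"   # if R is linked against OpenBLAS
echo "export MKL_NUM_THREADS=$NTHREADS"        # if MKL is in the picture instead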

Any ideas? Could this be a non-Slurm issue, something specific to our servers 
(running Ubuntu 14.04 LTS)? I don't want to resort to turning off 
hyperthreading.

Thanks,
mike

-----Original Message-----
From: Markus Koeberl [mailto:markus.koeb...@tugraz.at] 
Sent: Thursday, April 20, 2017 3:40 AM
To: slurm-dev <slurm-dev@schedmd.com>
Cc: Mike Cammilleri <mi...@stat.wisc.edu>
Subject: [slurm-dev] Re: CPU config question


On Wednesday 19 April 2017 17:51:03 Mike Cammilleri wrote:
> 
> Hi Slurm community,
> 
> I have hopefully an easy question regarding cpu/partition configuration in 
> slurm.conf.
> 
> BACKGROUND:
> 
> We are running slurm 16.05.6 built on Ubuntu 14.04 LTS (because 14.04 works 
> with our current bcfg2 xml configuration management servers).
> Each node has two 12-core Intel(R) Xeon(R) E5-2680 v3 CPUs @ 2.50GHz.
> When you run 'cat /proc/cpuinfo' it reports 48 processors, because each core 
> consists of two threads.
> 
> I want to make sure that we are defining our CPUs and available cores to slurm 
> appropriately. What slurm considers a CPU and what a process considers a 
> thread can easily get mixed up in the semantics.
> 
> 
> PROBLEM: 
> 
> Most users run R. R is single threaded, so when someone submits a job it will 
> take 1 thread and leave the other thread on the core empty. So although a 
> user thinks there are 48 cores available, in actuality only the 24 physical 
> cores are available to them. If, however, they are running an app that can 
> use multiple threads (Julia?) then things are different. We had been getting 
> by up to this point, until a user ran a numpy-based workload in his python3.5 
> app, which resulted in all kinds of CPU overload and memory swapping. He's 
> using job arrays of size 32, running one array task per job, and on one node, 
> for example, 12 of his python apps are running but all 48 CPUs are utilized. 
> Load average is 300.0+. Sometimes memory is swapping and sometimes not.

Using cgroups will help ensure that jobs cannot use more resources than they 
asked for. See:
https://slurm.schedmd.com/cgroups.html

I have:
$ cat /etc/slurm-llnl/slurm.conf | grep -i cgroup
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup

$ cat /etc/slurm-llnl/cgroup.conf
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
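
If you want to double check that the confinement actually takes effect on a 
node, one way (assuming the usual cgroup v1 layout under /sys/fs/cgroup, and 
filling in a real uid and job id) is to look at the cpuset slurm creates for 
the job:

# should list only the logical CPUs allocated to the job, not 0-47
cat /sys/fs/cgroup/cpuset/slurm/uid_1000/job_12345/cpuset.cpus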




regards
Markus Köberl
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeb...@tugraz.at
