Hi Travis,

To use SelectType=select/cons_res with SelectTypeParameters=CR_Core_Memory, I have found that RealMemory needs to be specified for each worker node, as in:

NodeName=worker2-[1-19] Procs=48 State=UNKNOWN RealMemory=250000

This allows me to move away from --mem=1, so scontrol shows:

[pvh@master2-st ~]$ scontrol show node worker2-1
NodeName=worker2-1 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=5 CPUErr=0 CPUTot=48 CPULoad=4.98
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=worker2-1 NodeHostName=worker2-1 Version=16.05
   OS=Linux RealMemory=250000 AllocMem=250000 FreeMem=198073 Sockets=48 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2017-02-06T14:11:53 SlurmdStartTime=2017-02-07T11:41:45
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Then, to avoid jobs having a whole node's RAM allocated to them by default, I set DefMemPerNode, as in:

DefMemPerNode=5000

Not sure if you already have settings similar to this, but these seem to be the key ones in my experience.
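Putting those together, the relevant slurm.conf lines end up looking roughly like the sketch below; the node line, core count, and memory figures are just the illustrative values from above, so adjust them to the actual hardware:

# Consumable-resource scheduling, tracking cores and memory
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Default memory (MB) for jobs that do not request any, so a job no longer
# gets a whole node's RAM by default
DefMemPerNode=5000

# Each node must advertise its usable RAM (MB) for memory scheduling to work
NodeName=worker2-[1-19] Procs=48 RealMemory=250000 State=UNKNOWN

After a change like this, the conf has to be copied to every node, slurmctld and the slurmd daemons restarted, and scontrol reconfigure run, which is the same sequence Lachlan asks about further down the thread.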
Peter

On Mon, 13 Feb 2017 at 00:24 Travis DePrato <trav...@umich.edu> wrote:

> I can't specify memory at all (if I try to give the --mem=2 flag, I get an
> error message that the requested node configuration isn't available), and
> the default is 1000MB (well below the amount of RAM on the machines).
>
> On Sun, Feb 12, 2017 at 5:06 PM Carlos Fenoy <mini...@gmail.com> wrote:
>
> Are you specifying a memory limit for your jobs? You haven't set a default
> limit per CPU, and Slurm will allocate all the memory of a node if nothing
> else is specified.
>
> Regards,
> Carlos Fenoy
>
> On Sun, 12 Feb 2017, 22:54 Travis DePrato <trav...@umich.edu> wrote:
>
> Yep! I'm doing everything I can think of, running scontrol reconfigure and
> restarting all the relevant daemons, but I can't seem to get it to work.
>
> On Sun, Feb 12, 2017 at 4:38 PM Lachlan Musicman <data...@gmail.com> wrote:
>
> On 12 February 2017 at 16:06, Travis DePrato <trav...@umich.edu> wrote:
>
> I've tried multiple variations of the SelectTypeParameters option (before
> sending this mail) without success.
>
> Currently it's http://pastebin.com/ATcsvvtQ with
> SelectTypeParameters=CR_CPU_Memory
>
> I'm running 10 jobs, each single threaded/processed/etc., just sitting on
> "sleep 1000", but I can never get more than 8 to run at a time, and I still
> can't specify memory other than 1.
>
> I always ask the stupid questions: you are changing the conf, distributing
> that change to all nodes, restarting slurmctld, and then running scontrol
> reconfigure?
>
> cheers
> L.
>
> ------
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On Sat, Feb 11, 2017 at 3:26 AM Lachlan Musicman <data...@gmail.com> wrote:
>
> 1. As E V noted, to get memory as a consumable resource, you will need to
> add it to the line that says CR_CPU - change it to CR_CPU_Memory
> https://slurm.schedmd.com/slurm.conf.html
>
> 2. That's because of CR_CPU combined with cons_res. Change to CR_Core for
> per-core allocation or CR_Socket for per-socket. For definitions of each,
> there's a hardware page:
>
> https://slurm.schedmd.com/cons_res.html
>
> but for the CPU/core/socket definitions, I found the image at the top of
> this page very helpful:
>
> https://slurm.schedmd.com/mc_support.html
>
> L.
>
> ------
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On 11 February 2017 at 07:31, E V <eliven...@gmail.com> wrote:
>
> man slurm.conf and search for cons_res; you need to make a change from
> the defaults.
> Don't remember the details ATM, but that should get you started.
>
> On Fri, Feb 10, 2017 at 2:42 PM, Travis DePrato <trav...@umich.edu> wrote:
>
> For reference, slurm.conf: http://pastebin.com/XT6TvQhh
>
> I've been tasked with setting up a small cluster for a research group where
> I work, despite knowing relatively little about HPC or clusters in general.
> I've installed Slurm on the eight compute nodes and the login node, but I'm
> having two issues currently:
>
> 1. I cannot specify a memory requirement other than --mem=1
>    Sample submission output with --mem=2: http://pastebin.com/5PY9N6n4
>
> 2. I cannot get nodes to execute more than one job at a time. The 9th job is
>    always queued with reason Resources. I think this is related to the lines
>
>    scontrol: Consumable Resources (CR) Node Selection plugin loaded with argument 17
>    scontrol: Serial Job Resource Selection plugin loaded with argument 17
>    scontrol: Linear node selection plugin loaded with argument 17
>
>    because it seems like Slurm is only allocating whole nodes at a time.
>
> Sorry if this is basic setup, but I've tried googling to no end.
>
> --
> Travis DePrato
> Computer Science & Engineering
> Math and Music Minors
> Student at University of Michigan
> Computer Consultant at EECS DCO
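With cons_res and a CR_*_Memory parameter in place and the daemons restarted, a quick way to confirm that both of the original issues are gone is to resubmit a batch of single-CPU jobs with an explicit memory request and check that more than eight of them run at once. A minimal sketch; the node name is the one from Peter's example, so substitute a real node:

# submit ten single-CPU jobs, each asking for 2000 MB rather than --mem=1
for i in $(seq 1 10); do
    sbatch --ntasks=1 --mem=2000 --wrap="sleep 1000"
done

# all ten should reach state RUNNING instead of the 9th queueing on Resources
squeue -o "%.10i %.9P %.8T %.6D %R"

# an individual node should show several CPUs and only part of its RAM allocated
scontrol show node worker2-1 | grep -E "CPUAlloc|AllocMem"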