Hi Travis

To use SelectType select/cons_res with SelectTypeParameters CR_Core_Memory,
I have found that RealMemory needs to be specified for each worker node, as in:

NodeName=worker2-[1-19] Procs=48 State=UNKNOWN RealMemory=250000
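
(If you are not sure what value to use, running slurmd -C on the worker
itself prints the hardware configuration slurmd detects, including a
RealMemory figure you can copy into slurm.conf:

[pvh@worker2-1 ~]$ slurmd -C

The exact fields printed will vary with Slurm version and hardware.)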

Setting RealMemory this way allows me to move away from --mem=1, so scontrol shows:

[pvh@master2-st ~]$ scontrol show node worker2-1
NodeName=worker2-1 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=5 CPUErr=0 CPUTot=48 CPULoad=4.98
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=worker2-1 NodeHostName=worker2-1 Version=16.05
   OS=Linux RealMemory=250000 AllocMem=250000 FreeMem=198073 Sockets=48 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   BootTime=2017-02-06T14:11:53 SlurmdStartTime=2017-02-07T11:41:45
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

Then, to avoid jobs having a whole node's RAM allocated to them by default,
I set DefMemPerNode, as in:

DefMemPerNode=5000
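
With that in place, a job gets 5000MB by default and can request more
explicitly, e.g. (a sketch; the size is just a placeholder):

sbatch --mem=20000 job.sh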

Not sure if you already have settings similar to this, but these seem to be
the key ones in my experience.

Peter

On Mon, 13 Feb 2017 at 00:24 Travis DePrato <trav...@umich.edu> wrote:

> I can't specify memory at all (if I try to give the --mem=2 flag, I get an
> error message that the requested node configuration isn't available), and
> the default is 1000MB (well below the amount of RAM on the machines).
>
> On Sun, Feb 12, 2017 at 5:06 PM Carlos Fenoy <mini...@gmail.com> wrote:
>
> Are you specifying a memory limit for your jobs? You haven't set a default
> limit per CPU, so Slurm will allocate all the memory of a node if nothing
> else is specified.
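>
> A sketch of the slurm.conf line that sets such a default (the option name
> is real, the value is just a placeholder):
>
> DefMemPerCPU=2000
>
> Alternatively, users can pass --mem or --mem-per-cpu at submission time.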
>
> Regards,
> Carlos Fenoy
>
> On Sun, 12 Feb 2017, 22:54 Travis DePrato, <trav...@umich.edu> wrote:
>
> Yep! I've done everything I can think of, including running scontrol
> reconfigure and restarting all the relevant daemons, but I can't seem to
> get it to work.
>
> On Sun, Feb 12, 2017 at 4:38 PM Lachlan Musicman <data...@gmail.com>
> wrote:
>
> On 12 February 2017 at 16:06, Travis DePrato <trav...@umich.edu> wrote:
>
> I've tried multiple variations of the SelectTypeParameters option (before
> sending this mail) without success.
>
> Currently it's http://pastebin.com/ATcsvvtQ with
> SelectTypeParameters=CR_CPU_Memory
>
> I'm running 10 jobs, each single-threaded and just sitting on "sleep 1000",
> but I can never get more than 8 to run at a time, and I still can't specify
> memory other than 1.
>
>
>
> I always ask the stupid questions: you are changing the conf, distributing
> that change to all nodes, restarting slurmctld, then running scontrol
> reconfigure?
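>
> i.e. something along these lines (a sketch; the hostnames and the use of
> systemd are assumptions):
>
> for n in node{1..8}; do scp /etc/slurm/slurm.conf $n:/etc/slurm/; done
> systemctl restart slurmctld   # on the head node
> scontrol reconfigure          # tell the daemons to re-read slurm.conf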
>
>
> cheers
> L.
>
>
> ------
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On Sat, Feb 11, 2017 at 3:26 AM Lachlan Musicman <data...@gmail.com>
> wrote:
>
> 1. As EV noted, to get memory as a consumable resource, you will need to
> add it to the SelectTypeParameters line: change CR_CPU to CR_CPU_Memory.
> https://slurm.schedmd.com/slurm.conf.html
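>
> i.e. roughly this change in slurm.conf (a sketch; the rest of your conf
> stays as it is):
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_CPU_Memory   # was CR_CPU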
>
> 2. That's because of CR_CPU combined with cons_res. Change to CR_Core for
> per-core allocation or CR_Socket for per-socket. For definitions of each,
> see the consumable resources page:
>
> https://slurm.schedmd.com/cons_res.html
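>
> Each granularity also has a memory-tracking variant, e.g.:
>
> SelectTypeParameters=CR_Core_Memory    # allocate cores, track memory
> SelectTypeParameters=CR_Socket_Memory  # allocate sockets, track memory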
>
> but for the CPU/core/socket definitions, I found the image at the top of
> this page very helpful:
>
> https://slurm.schedmd.com/mc_support.html
>
> L.
>
>
> On 11 February 2017 at 07:31, E V <eliven...@gmail.com> wrote:
>
>
> See man slurm.conf and search for cons_res; you need to make a change from
> the defaults. I don't remember the details ATM, but that should get you
> started.
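>
> (For the record, the out-of-the-box default is SelectType=select/linear,
> i.e. whole-node allocation, so the change is along the lines of:
>
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
>
> with the CR_* value chosen to suit; a sketch, not a complete conf.)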
>
> On Fri, Feb 10, 2017 at 2:42 PM, Travis DePrato <trav...@umich.edu> wrote:
> > For reference, slurm.conf: http://pastebin.com/XT6TvQhh
> >
> > I've been tasked with setting up a small cluster for a research group
> > where I work, despite knowing relatively little about HPC or clusters in
> > general. I've installed Slurm on the eight compute nodes and the login
> > node, but I'm having two issues currently:
> >
> > 1. I cannot specify a memory requirement other than --mem=1
> > Sample submission output with --mem=2: http://pastebin.com/5PY9N6n4
> >
> > 2. I cannot get nodes to execute more than one job at a time. The 9th
> > job is always queued with reason Resources. I think this is related to
> > the lines
> >
> > scontrol: Consumable Resources (CR) Node Selection plugin loaded with
> > argument 17
> > scontrol: Serial Job Resource Selection plugin loaded with argument 17
> > scontrol: Linear node selection plugin loaded with argument 17
> >
> > because it seems like Slurm is only allocating whole nodes at a time.
> >
> > Sorry if this is basic setup, but I've tried googling to no avail.
> > --
> > Travis DePrato
> > Computer Science & Engineering
> > Math and Music Minors
> > Student at University of Michigan
> > Computer Consultant at EECS DCO
>
