[slurm-users] Slurm User Group Meeting 2018 Agenda is online

2018-09-10 Thread Tim Wickberg
Just a quick note to mention that the SLUG'18 agenda has been posted online: https://slurm.schedmd.com/slurm_ug_agenda.html

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Brian Haymore
I re-read the docs and I was wrong about the default behavior. The default is "no", which just means individual resources are not oversubscribed; I had thought it defaulted to 'exclusive'. So I think I've been taking us down a dead end in terms of what I thought might help. :\ I have a

Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
On Tuesday, 11 September 2018 2:05:51 AM AEST Mike Cammilleri wrote: > Just an update: the cgroup.conf file could not be parsed when I added > ConstrainKmemSpace=no. I guess this option is not compatible with our > kernel/slurm versions on Ubuntu? Not sure. I think that'll just be your version

Re: [slurm-users] Slurm on POWER9

2018-09-10 Thread Chris Samuel
Hi Keith, On Tuesday, 11 September 2018 7:46:14 AM AEST Keith Ball wrote: > 1.) Slurm seems to be incapable of recognizing sockets/cores/threads on > these systems. [...] > Anyone know if there is a way to get Slurm to recognize the true topology > for POWER nodes? IIRC Slurm uses hwloc for

[slurm-users] Slurm on POWER9

2018-09-10 Thread Keith Ball
Hi All, We have installed Slurm 17.11.8 on IBM AC922 nodes (POWER9) that have 4 GPUs each, and are running RHEL 7.5-ALT. Physically, these are 2-socket nodes, with each socket having 20 cores. Depending on the SMT setting (SMT1, SMT2, SMT4), there can be 40, 80, or 160 virtual "processors/CPUs".
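The logical CPU counts quoted above follow from sockets × cores per socket × hardware threads per core. A minimal Python sketch of that arithmetic, which also renders each case as a slurm.conf NodeName line using the standard Sockets, CoresPerSocket and ThreadsPerCore keywords. The helper names and the node name `ac922-01` are hypothetical, and whether Slurm 17.11 accepts such a line on POWER9 when hwloc reports something different is not settled by this thread.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Logical CPUs the OS exposes: every hardware thread counts as one CPU."""
    return sockets * cores_per_socket * threads_per_core


def node_line(name: str, sockets: int, cores_per_socket: int, threads_per_core: int) -> str:
    """Render a hypothetical slurm.conf NodeName entry for the given topology."""
    cpus = logical_cpus(sockets, cores_per_socket, threads_per_core)
    return (
        f"NodeName={name} CPUs={cpus} Sockets={sockets} "
        f"CoresPerSocket={cores_per_socket} ThreadsPerCore={threads_per_core}"
    )


if __name__ == "__main__":
    # AC922 as described above: 2 sockets x 20 cores, run at SMT1, SMT2 or SMT4.
    for smt in (1, 2, 4):
        print(f"SMT{smt}:", node_line("ac922-01", 2, 20, smt))
```

At SMT4 this yields CPUs=160 with ThreadsPerCore=4, matching the 160 "processors/CPUs" in the message.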

Re: [slurm-users] Any information about the Slurm User Group Meeting 2018?

2018-09-10 Thread Ole Holm Nielsen
Hi Jacob, Thanks for the info. Is someone going to compile a travel and hotel information sheet? CIEMAT seems to have an agreement with some hotels. All hotels seem to be located 2-3 km from CIEMAT, so perhaps there's a local bus line to take into consideration when booking a hotel?

Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Mike Cammilleri
Just an update: the cgroup.conf file could not be parsed when I added ConstrainKmemSpace=no. I guess this option is not compatible with our kernel/slurm versions on Ubuntu? Not sure. For now we took the lazy way out and rebooted nodes. Will try the kernel options or a full slurm update as time

Re: [slurm-users] Any information about the Slurm User Group Meeting 2018?

2018-09-10 Thread Jacob Jenson
Ole, You can find hotels close to CIEMAT here https://drive.google.com/open?id=1eEKgnlBXeYNO426QS7nPuDS4nm8aUpnH=sharing Jacob On Mon, Sep 10, 2018 at 1:23 AM, Ole Holm Nielsen < ole.h.niel...@fysik.dtu.dk> wrote: > Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain in >

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Brian Haymore
I believe the default value of this would prevent jobs from sharing a node. You may want to look at this and change it from the default. -- Brian D. Haymore University of Utah Center for High Performance Computing

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Eli V
I think you probably want CR_LLN set in your SelectTypeParameters in slurm.conf. This makes it fill up a node before moving on to the next instead of "striping" the jobs across the nodes. On Mon, Sep 10, 2018 at 8:29 AM Felix Wolfheimer wrote: > > No this happens without the "Oversubscribe"

Re: [slurm-users] Elastic Compute

2018-09-10 Thread Felix Wolfheimer
No, this happens without the "Oversubscribe" parameter being set. I'm using custom resources, though:

GresTypes=some_resource
NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD

Submission uses:

sbatch --nodes=1 --ntasks-per-node=1 --gres=some_resource:1

But I just tried it
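With the configuration above, each node offers 10 CPUs and 10 units of `some_resource`, and each job asks for one task and one unit of the GRES. A rough Python sketch of how many such jobs fit on one node, and how many CLOUD nodes a placement that fills each node before moving to the next would need. It assumes one CPU per task (no `--cpus-per-task`), ignores memory and any other limits, and is plain arithmetic rather than a model of Slurm's scheduler; the function names are hypothetical.

```python
import math


def jobs_per_node(node_cpus: int, node_gres: int, cpus_per_job: int, gres_per_job: int) -> int:
    """Jobs that fit on one node: whichever resource runs out first is the limit."""
    return min(node_cpus // cpus_per_job, node_gres // gres_per_job)


def nodes_needed_packed(num_jobs: int, per_node: int) -> int:
    """Nodes required if every node is filled completely before the next is used."""
    return math.ceil(num_jobs / per_node)


if __name__ == "__main__":
    # Values from the thread: CPUs=10, Gres=some_resource:10,
    # sbatch --ntasks-per-node=1 --gres=some_resource:1 (assumed 1 CPU per task).
    per_node = jobs_per_node(node_cpus=10, node_gres=10, cpus_per_job=1, gres_per_job=1)
    for jobs in (1, 10, 25):
        print(f"{jobs} jobs -> {nodes_needed_packed(jobs, per_node)} node(s) when packed")
```

By contrast, placing one job per node would start one cloud node per job (25 nodes for 25 jobs), which appears to be the "striping" behaviour the thread is trying to avoid.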

Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Chris Samuel
On Monday, 10 September 2018 9:39:28 PM AEST Patrick Goetz wrote: > On 9/8/18 5:11 AM, John Hearns wrote: > > > Not an answer to your question - a good diagnostic for cgroups is the > > utility 'lscgroup' > > Where does one find this utility? It's in the libcgroup-tools package in RHEL/CentOS

[slurm-users] Any information about the Slurm User Group Meeting 2018?

2018-09-10 Thread Ole Holm Nielsen
Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain in two weeks from now: Has anyone heard information about hotels and the schedule? The official page https://slurm.schedmd.com/slurm_ug_agenda.html was last updated on May 30... /Ole

Re: [slurm-users] can't create memory group (cgroup)

2018-09-10 Thread Janne Blomqvist
On 2018-09-07 18:53, Mike Cammilleri wrote: Hi everyone, I'm getting this error lately for everyone's jobs, which results in memory not being constrained via the cgroups plugin. slurmstepd: error: task/cgroup: unable to add task[pid=21681] to memory cg '(null)' slurmstepd: error:
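The `memory cg '(null)'` error indicates that slurmstepd did not get a usable memory cgroup for the task. One quick check, not proposed in the thread itself, is whether the kernel reports the memory controller as enabled at all; Linux lists controllers in `/proc/cgroups` with the columns `subsys_name hierarchy num_cgroups enabled`. A minimal Python sketch, assuming that cgroup v1 style file is present; the function name is hypothetical.

```python
from pathlib import Path


def enabled_controllers(proc_cgroups_text: str) -> dict:
    """Map each cgroup controller name to whether the kernel reports it enabled.

    /proc/cgroups has a header line starting with '#', then one row per
    controller: subsys_name, hierarchy, num_cgroups, enabled (1 or 0).
    """
    result = {}
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip the header and any blank lines
        fields = line.split()
        result[fields[0]] = fields[3] == "1"
    return result


if __name__ == "__main__":
    path = Path("/proc/cgroups")
    if path.exists():
        controllers = enabled_controllers(path.read_text())
        print("memory controller enabled:", controllers.get("memory", False))
    else:
        print("/proc/cgroups not found (not a Linux host?)")
```

If this reports the memory controller as disabled, kernel boot options are the likely place to look, in line with the "kernel options" mentioned earlier in the thread.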