Just a quick note to mention that the SLUG'18 agenda has been posted online:
https://slurm.schedmd.com/slurm_ug_agenda.html
I re-read the docs and I was wrong on the default behavior. The default is
"no", which just means don't oversubscribe the individual resources, whereas I
thought it defaulted to 'exclusive'. So I think I've been taking us down a
dead end in terms of what I thought might help. :\
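For what it's worth, here's how I now read it; a sketch of partition lines (names made up, not from a real config):

# OverSubscribe=NO (the default): individual CPUs/memory aren't oversubscribed,
# but different jobs can still land on the same node.
# OverSubscribe=EXCLUSIVE: whole nodes go to a single job.
PartitionName=shared Nodes=node[01-04] OverSubscribe=NO
PartitionName=whole  Nodes=node[05-08] OverSubscribe=EXCLUSIVE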
I have a
On Tuesday, 11 September 2018 2:05:51 AM AEST Mike Cammilleri wrote:
> Just an update: the cgroup.conf file could not be parsed when I added
> ConstrainKmemSpace=no. I guess this option is not compatible with our
> kernel/slurm versions on Ubuntu? Not sure.
I think that'll just be your version
Hi Keith,
On Tuesday, 11 September 2018 7:46:14 AM AEST Keith Ball wrote:
> 1.) Slurm seems to be incapable of recognizing sockets/cores/threads on
> these systems.
[...]
> Anyone know if there is a way to get Slurm to recognize the true topology
> for POWER nodes?
IIRC Slurm uses hwloc for
Hi All,
We have installed slurm 17.11.8 on IBM AC922 nodes (POWER9) that have 4
GPUs each, and are running RHEL 7.5-ALT. Physically, these are 2-socket
nodes, with each socket having 20 cores. Depending on SMT setting (SMT1,
SMT2, SMT4) there can be 40, 80, or 160 "processors/CPUs" virtually.
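If it helps, declaring the topology explicitly in slurm.conf usually overrides what gets auto-detected; a sketch assuming SMT4 (hostnames are placeholders, GPU count taken from the AC922 description above):

# compare what Slurm detects on the node:  slurmd -C
# and what hwloc itself sees:  lstopo-no-graphics
NodeName=ac922-[01-04] Sockets=2 CoresPerSocket=20 ThreadsPerCore=4 Gres=gpu:4 State=UNKNOWN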
Hi Jacob,
Thanks for the info. Is someone going to compile a travel and hotel
information sheet?
CIEMAT seems to have an agreement with some hotels. All hotels seem to
be located 2-3 km from CIEMAT, so perhaps there's a local bus line to
take into consideration when booking a hotel?
Just an update: the cgroup.conf file could not be parsed when I added
ConstrainKmemSpace=no. I guess this option is not compatible with our
kernel/slurm versions on Ubuntu? Not sure. For now we took the lazy way out and
rebooted nodes. Will try the kernel options or a full slurm update as time permits.
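For anyone following the thread, the two workarounds look roughly like this; a sketch, assuming a Slurm version whose cgroup.conf knows ConstrainKmemSpace and a kernel that accepts the boot parameter:

# cgroup.conf (only on versions that support the option):
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainKmemSpace=no

# or on the kernel command line, disabling kmem accounting entirely:
cgroup.memory=nokmem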
Ole,
You can find hotels close to CIEMAT here
https://drive.google.com/open?id=1eEKgnlBXeYNO426QS7nPuDS4nm8aUpnH=sharing
Jacob
On Mon, Sep 10, 2018 at 1:23 AM, Ole Holm Nielsen <
ole.h.niel...@fysik.dtu.dk> wrote:
> Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain in
> two weeks from now: Has anyone heard information about hotels and the schedule?
I believe the default value of this would prevent jobs from sharing a node.
You may want to look at this and change it from the default.
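A quick way to see what a partition currently has set (partition name is a placeholder):

scontrol show partition shared | grep -i oversubscribe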
--
Brian D. Haymore
University of Utah
Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, Ut 84112
Phone: 801-558-1150, Fax:
I think you probably want CR_LLN set in your SelectTypeParameters in
slurm.conf. This makes it fill up a node before moving on to the next
instead of "striping" the jobs across the nodes.
On Mon, Sep 10, 2018 at 8:29 AM Felix Wolfheimer
wrote:
>
> No this happens without the "Oversubscribe"
No, this happens without the "Oversubscribe" parameter being set. I'm using
custom resources though:
GresTypes=some_resource
NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD
Submission uses:
sbatch --nodes=1 --ntasks-per-node=1 --gres=some_resource:1
But I just tried it
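As an aside, a count-only GRES like this usually also needs a matching gres.conf on the nodes; a sketch, assuming the resource has no device files:

# gres.conf on compute-[1-100]
Name=some_resource Count=10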
On Monday, 10 September 2018 9:39:28 PM AEST Patrick Goetz wrote:
> On 9/8/18 5:11 AM, John Hearns wrote:
>
> > Not an answer to your question - a good diagnostic for cgroups is the
> > utility 'lscgroups'
>
> Where does one find this utility?
It's in the libcgroup-tools package in RHEL/CentOS
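For example (the package and cgroup names may differ on other distributions):

yum install libcgroup-tools
# list the cgroups Slurm has created, e.g.
lscgroups | grep slurm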
Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain
in two weeks from now: Has anyone heard information about hotels and
the schedule? The official page
https://slurm.schedmd.com/slurm_ug_agenda.html was last updated on May 30...
/Ole
On 2018-09-07 18:53, Mike Cammilleri wrote:
Hi everyone,
I'm getting this error lately for everyone's jobs, which results in memory not
being constrained via the cgroups plugin.
slurmstepd: error: task/cgroup: unable to add task[pid=21681] to memory cg
'(null)'
slurmstepd: error:
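A few hedged things to check when the memory cgroup shows up as '(null)' (paths assume the default cgroup mountpoint and a stock /etc/slurm layout):

# is the memory controller mounted?
lssubsys -am | grep memory
# does Slurm's memory hierarchy exist on the node?
ls /sys/fs/cgroup/memory/slurm
# confirm cgroup confinement is actually configured:
grep TaskPlugin /etc/slurm/slurm.conf
grep -i Constrain /etc/slurm/cgroup.conf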