Merlin, thanks for the insight.
I set:
#SBATCH --mem=1G
That is all I needed to get it to share.
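For anyone landing on this thread later, the whole job script only needs that one extra directive; a minimal sketch (the job name and command are placeholders, not from the thread):

```shell
#!/bin/bash
#SBATCH --job-name=share-test   # placeholder name
#SBATCH --ntasks=1
#SBATCH --mem=1G                # explicit memory request so a second job can fit on the node
srun hostname
```

Without an explicit `--mem`, the job inherits the partition default, which is what was filling the node.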
How can I make the default allocation use only the memory attached to
the socket a job runs on, with the default memory request set to that
amount (64 GB in my case)?
I think I've done it (sort of) with:
PartitionName=normal Nodes=d0[1,2] Default=YES OverSubscribe=FORCE:2 SelectTypeParameters=CR_Socket_Memory QoS=part_shared MaxCPUsPerNode=28 DefMemPerCPU=4590 MaxMemPerCPU=4590 MaxTime=48:00:00 State=UP
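As a quick sanity check on the 4590 MB figure (my assumption: the nodes report a RealMemory slightly under 128 GiB, so the per-CPU default is simply that total divided by the 28 cores):

```shell
# With 28 usable cores per node, DefMemPerCPU=4590 caps a
# full-node allocation at 4590 MB * 28 cores.
total_mb=$((4590 * 28))
echo "${total_mb} MB"    # 128520 MB, just under 128 GiB (131072 MiB)
```

Half of that (one 14-core socket) comes to 64260 MB, i.e. roughly the 64 GB per socket described above.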
On 03/22/2017 12:04 PM, Merlin Hartley wrote:
Hi Cyrus
I think you should specify the memory requirement in your sbatch
script - the default is to allocate all of a node's memory, thus
‘filling’ it even with a single-CPU job.
#SBATCH --mem 1G
Hope this helps!
Merlin
--
Merlin Hartley
Computer Officer
MRC Mitochondrial Biology Unit
Cambridge, CB2 0XY
United Kingdom
On 22 Mar 2017, at 16:20, Cyrus Proctor <cproc...@tacc.utexas.edu> wrote:
Hi all,
Any thoughts at all on this would be most helpful. I'm not sure where
to go from here to get overcommitted nodes working properly.
Thank you,
Cyrus
On 03/17/2017 11:39 AM, Cyrus Proctor wrote:
Hello,
I currently have a small cluster for testing. Each compute node
contains two sockets with 14 cores per socket and 128 GB of RAM in
total. I would like to set up Slurm so that two jobs can share one
compute node simultaneously, effectively giving each job one socket
(with binding) and half the total memory.
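On the job side, the request that matches this layout would look something like the sketch below (the binding flag and the 64G figure are my assumptions based on the hardware described above, not taken from the attached files):

```shell
#!/bin/bash
# Sketch: ask for one 14-core socket's worth of CPUs and half the
# node's memory, so two such jobs can share a node.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=14   # one socket on a 2 x 14-core node
#SBATCH --mem=64G            # half of 128 GB
srun --cpu-bind=sockets ./my_app   # my_app is a placeholder binary
```

For this to pack two jobs per node, the select plugin also has to account at socket granularity (e.g. the CR_Socket_Memory setting discussed later in the thread).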
I've tried several iterations of settings, to no avail. Whatever I
try, only one job is allowed to run per node (the second is held with
a "Resources" pending reason). I am running Slurm 17.02.1-2, and I am
attaching my slurm.conf and cgroup.conf files. System information:
# uname -r
3.10.0-514.10.2.el7.x86_64
# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
I am also attaching logs for slurmd (slurmd.d01.log) and slurmctld
(slurmctld.log) as I submit three jobs (batch.slurm) in rapid
succession. With two compute nodes available, I would hope that all
three start together. Instead, two begin and one waits until a node
becomes idle to start.
There is likely extra "crud" in the config files left over from prior
failed attempts. I'm happy to take things out or reconfigure as
necessary, but I'm not sure what the right combination of settings is
to get this to work. I'm hoping that's where you all can help.
Thanks,
Cyrus