Hello,

From the man page:
> MaxMemPerCPU
>     ...
>     NOTE: If a job specifies a memory per CPU limit that exceeds this
>     system limit, that job's count of CPUs per task will automatically
>     be increased. This may result in the job failing due to CPU count
>     limits.

Here is what I get with version *15.08*:

# srun --version
slurm 15.08.13

# srun --mem 600 sleep 5 && scontrol show job
JobId=15 JobName=sleep
   UserId=root(0) GroupId=root(0)
   Priority=4294901757 Nice=0 Account=(null) QOS=(null)
   JobState=COMPLETED Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2017-02-14T14:39:54 EligibleTime=2017-02-14T14:39:54
   StartTime=2017-02-14T14:39:54 EndTime=2017-02-14T14:39:59
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=short AllocNode:Sid=dhcpvm4-174:5130
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=dhcpvm4-191
   BatchHost=dhcpvm4-191
   NumNodes=1 *NumCPUs=3* CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=3,*mem=600*,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=3 MinMemoryNode=600M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=sleep
   WorkDir=/root
   Power= SICP=0

So on 15.08.13 the 600 MB request is bumped to 600/200 = 3 CPUs against the
partition's MaxMemPerCPU=200, exactly as the NOTE describes (see the P.S. at
the very end of this mail for the same arithmetic applied to both cases in
this thread).

Julien

2017-02-14 16:14 GMT+01:00 E V <eliven...@gmail.com>:

> You sure it worked as you expected? I always think of CPUs & RAM as
> independent things that need to be manually requested independently.
>
> On Tue, Feb 14, 2017 at 10:07 AM, Julien Collas <jul.col...@gmail.com>
> wrote:
> > Hello,
> >
> > I made some tests on a simple environment and it seems that this
> > functionality works fine up to and including 15.08.13.
> > With versions 16.05.6, 16.05.9, and 17.02.0-0rc1 I'm not able to see
> > what I would expect to see.
> >
> > # scontrol show part
> > PartitionName=short
> >    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
> >    AllocNodes=ALL Default=YES QoS=N/A
> >    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
> >    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
> >    Nodes=dhcpvm4-191
> >    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
> >    OverTimeLimit=NONE PreemptMode=OFF
> >    State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
> >    DefMemPerCPU=200 MaxMemPerCPU=200
> >
> > # srun --mem 600 sleep 5 && scontrol show job
> > JobId=25 JobName=sleep
> >    UserId=root(0) GroupId=root(0) MCS_label=N/A
> >    Priority=4294901754 Nice=0 Account=(null) QOS=(null)
> >    JobState=COMPLETED Reason=None Dependency=(null)
> >    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
> >    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
> >    SubmitTime=2017-02-14T15:38:41 EligibleTime=2017-02-14T15:38:41
> >    StartTime=2017-02-14T15:38:41 EndTime=2017-02-14T15:38:46 Deadline=N/A
> >    PreemptTime=None SuspendTime=None SecsPreSuspend=0
> >    Partition=short AllocNode:Sid=dhcpvm4-174:5130
> >    ReqNodeList=(null) ExcNodeList=(null)
> >    NodeList=dhcpvm4-191
> >    BatchHost=dhcpvm4-191
> >    NumNodes=1 NumCPUs=1 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> >    TRES=cpu=1,mem=600M,node=1
> >    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> >    MinCPUsNode=1 MinMemoryNode=600M MinTmpDiskNode=0
> >    Features=(null) DelayBoot=00:00:00
> >    Gres=(null) Reservation=(null)
> >    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> >    Command=sleep
> >    WorkDir=/root
> >    Power=
> >
> > It doesn't help me a lot with my problem, but ...
> > Best regards,
> >
> > Julien
> >
> > 2017-02-02 8:53 GMT+01:00 Julien Collas <jul.col...@gmail.com>:
> >>
> >> Hi,
> >>
> >> It seems that my MaxMemPerCPU is not working as I would have expected
> >> (increase the CPU count if --mem or --mem-per-cpu exceeds that limit).
> >>
> >> Here is my partition definition:
> >>
> >> $ scontrol show part short
> >> PartitionName=short
> >>    AllowGroups=ALL DenyAccounts=data AllowQos=ALL
> >>    AllocNodes=ALL Default=YES QoS=N/A
> >>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
> >>    MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
> >>    Nodes=srv0029[73-80,87-95,98-99]
> >>    PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO
> >>    OverSubscribe=NO PreemptMode=OFF
> >>    State=UP TotalCPUs=560 TotalNodes=28 SelectTypeParameters=NONE
> >>    DefMemPerCPU=19000 MaxMemPerCPU=19000
> >>
> >> $ srun --partition=short --mem=40000 sleep 10
> >> $ srun --partition=short --mem-per-cpu=40000 sleep 10
> >> $ sacct
> >>        JobID      User    Account    JobName   Priority   NTasks  AllocCPUS  ReqCPUS     MaxRSS  MaxVMSize     ReqMem      State
> >> ------------ --------- ---------- ---------- ---------- -------- ---------- -------- ---------- ---------- ---------- ----------
> >>     19522383   jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mn  COMPLETED
> >>     19522384   jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mc  COMPLETED
> >>
> >> For these 2 jobs, I would have expected AllocCPUS to be 3.
> >>
> >> $ scontrol show conf
> >> ...
> >> DefMemPerNode           = UNLIMITED
> >> MaxMemPerNode           = UNLIMITED
> >> MemLimitEnforce         = Yes
> >> SelectTypeParameters    = CR_CPU_MEMORY
> >> ...
> >> AccountingStorageBackupHost = (null)
> >> AccountingStorageEnforce = associations,limits
> >> AccountingStorageHost   = stor089
> >> AccountingStorageLoc    = N/A
> >> AccountingStoragePort   = 6819
> >> AccountingStorageTRES   = cpu,mem,energy,node
> >> AccountingStorageType   = accounting_storage/slurmdbd
> >> AccountingStorageUser   = N/A
> >> AccountingStoreJobComment = Yes
> >> AcctGatherEnergyType    = acct_gather_energy/none
> >> AcctGatherFilesystemType = acct_gather_filesystem/none
> >> AcctGatherInfinibandType = acct_gather_infiniband/none
> >> AcctGatherNodeFreq      = 0 sec
> >> AcctGatherProfileType   = acct_gather_profile/none
> >> JobAcctGatherFrequency  = 10
> >> JobAcctGatherType       = jobacct_gather/cgroup
> >> JobAcctGatherParams     = (null)
> >> ...
> >>
> >> We are currently running version 16.05.6.
> >>
> >> Is there something I am missing?
> >>
> >> Regards,
> >>
> >> Julien
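P.S. A minimal sketch of the arithmetic behind the expectation above, assuming the
partition settings quoted in this thread (MaxMemPerCPU=200 on the small test box,
MaxMemPerCPU=19000 on the production "short" partition). Per the man page NOTE, the
CPU count should become ceil(requested memory / MaxMemPerCPU):

    600 MB   with MaxMemPerCPU=200   -> ceil(600/200)     = 3 CPUs  (what 15.08.13 allocates)
    40000 MB with MaxMemPerCPU=19000 -> ceil(40000/19000) = 3 CPUs  (expected; 16.05.x reports 1)

One quick way to check what was actually allocated (these are standard sacct format
fields, only the field selection is mine):

$ srun --partition=short --mem=40000 sleep 10
$ sacct --format=JobID,AllocCPUS,ReqCPUS,ReqMem,State

And, purely as a hypothetical stand-in while the automatic increase is not applied,
the same request with the 3 CPUs asked for explicitly:

$ srun --partition=short --cpus-per-task=3 --mem=40000 sleep 10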