Indeed, this paragraph is not present in the 14.03 man page, but the feature works as if it were documented :-)
In version 16.05.6 it's in the man page, but it's not working :-(

Julien

2017-02-14 16:48 GMT+01:00 E V <eliven...@gmail.com>:
>
> Interesting, that bit isn't in the man page for 14.03. I'll be
> deploying 15.08 soon, so that's nice to know.
>
> On Tue, Feb 14, 2017 at 10:28 AM, Julien Collas <jul.col...@gmail.com> wrote:
> > Hello,
> >
> > From the man page:
> >
> >> MaxMemPerCPU
> >> ...
> >> NOTE: If a job specifies a memory per CPU limit that exceeds this
> >> system limit, that job's count of CPUs per task will automatically
> >> be increased. This may result in the job failing due to CPU count
> >> limits.
> >
> > Here is what I get with version 15.08:
> >
> > # srun --version
> > slurm 15.08.13
> > # srun --mem 600 sleep 5 && scontrol show job
> > JobId=15 JobName=sleep
> >    UserId=root(0) GroupId=root(0)
> >    Priority=4294901757 Nice=0 Account=(null) QOS=(null)
> >    JobState=COMPLETED Reason=None Dependency=(null)
> >    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
> >    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
> >    SubmitTime=2017-02-14T14:39:54 EligibleTime=2017-02-14T14:39:54
> >    StartTime=2017-02-14T14:39:54 EndTime=2017-02-14T14:39:59
> >    PreemptTime=None SuspendTime=None SecsPreSuspend=0
> >    Partition=short AllocNode:Sid=dhcpvm4-174:5130
> >    ReqNodeList=(null) ExcNodeList=(null)
> >    NodeList=dhcpvm4-191
> >    BatchHost=dhcpvm4-191
> >    NumNodes=1 NumCPUs=3 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> >    TRES=cpu=3,mem=600,node=1
> >    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> >    MinCPUsNode=3 MinMemoryNode=600M MinTmpDiskNode=0
> >    Features=(null) Gres=(null) Reservation=(null)
> >    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
> >    Command=sleep
> >    WorkDir=/root
> >    Power= SICP=0
> >
> > Julien
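For reference, the 15.08 output above matches the man-page note quoted earlier: with MaxMemPerCPU=200 (the partition definition appears further down in the thread), a 600 MB request makes slurmctld raise the job's CPU count to ceil(600/200) = 3, hence NumCPUs=3 and TRES=cpu=3. The same ceiling arithmetic in plain shell, using the values from this thread:

    $ mem=600 maxpercpu=200
    $ echo $(( (mem + maxpercpu - 1) / maxpercpu ))
    3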
> >
> > 2017-02-14 16:14 GMT+01:00 E V <eliven...@gmail.com>:
> >>
> >> You sure it worked as you expected? I always think of CPUs & RAM as
> >> independent things that each need to be requested manually.
> >>
> >> On Tue, Feb 14, 2017 at 10:07 AM, Julien Collas <jul.col...@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I ran some tests in a simple environment, and it seems that this
> >> > functionality works fine up to and including 15.08.13.
> >> > With versions 16.05.6, 16.05.9, and 17.02.0-0rc1 I do not see
> >> > what I would expect to see.
> >> >
> >> > # scontrol show part
> >> > PartitionName=short
> >> >    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
> >> >    AllocNodes=ALL Default=YES QoS=N/A
> >> >    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
> >> >    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
> >> >    Nodes=dhcpvm4-191
> >> >    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
> >> >    OverTimeLimit=NONE PreemptMode=OFF
> >> >    State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
> >> >    DefMemPerCPU=200 MaxMemPerCPU=200
> >> >
> >> > # srun --mem 600 sleep 5 && scontrol show job
> >> > JobId=25 JobName=sleep
> >> >    UserId=root(0) GroupId=root(0) MCS_label=N/A
> >> >    Priority=4294901754 Nice=0 Account=(null) QOS=(null)
> >> >    JobState=COMPLETED Reason=None Dependency=(null)
> >> >    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
> >> >    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
> >> >    SubmitTime=2017-02-14T15:38:41 EligibleTime=2017-02-14T15:38:41
> >> >    StartTime=2017-02-14T15:38:41 EndTime=2017-02-14T15:38:46 Deadline=N/A
> >> >    PreemptTime=None SuspendTime=None SecsPreSuspend=0
> >> >    Partition=short AllocNode:Sid=dhcpvm4-174:5130
> >> >    ReqNodeList=(null) ExcNodeList=(null)
> >> >    NodeList=dhcpvm4-191
> >> >    BatchHost=dhcpvm4-191
> >> >    NumNodes=1 NumCPUs=1 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> >> >    TRES=cpu=1,mem=600M,node=1
> >> >    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> >> >    MinCPUsNode=1 MinMemoryNode=600M MinTmpDiskNode=0
> >> >    Features=(null) DelayBoot=00:00:00
> >> >    Gres=(null) Reservation=(null)
> >> >    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> >> >    Command=sleep
> >> >    WorkDir=/root
> >> >    Power=
> >> >
> >> > It doesn't help me much with my problem, but ...
> >> >
> >> > Best regards,
> >> >
> >> > Julien
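The regression is visible in the two transcripts above: the partition and the request are identical (MaxMemPerCPU=200, srun --mem 600), yet 15.08.13 reports NumCPUs=3 / TRES=cpu=3 while 16.05.6 reports NumCPUs=1 / TRES=cpu=1. A minimal reproduction sketch, assuming a single-node test cluster; the node name below is a placeholder, not from the thread:

    # slurm.conf fragment (hypothetical test box):
    #   PartitionName=short Nodes=testnode Default=YES DefMemPerCPU=200 MaxMemPerCPU=200
    $ srun --mem 600 sleep 5 && scontrol show job | grep -E 'NumCPUs|TRES'
    # 15.08.13:                NumCPUs=3 ...  TRES=cpu=3,mem=600,node=1
    # 16.05.6 to 17.02.0-0rc1: NumCPUs=1 ...  TRES=cpu=1,mem=600M,node=1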
> >> >
> >> > 2017-02-02 8:53 GMT+01:00 Julien Collas <jul.col...@gmail.com>:
> >> >>
> >> >> Hi,
> >> >>
> >> >> It seems that my MaxMemPerCPU is not working as I would have
> >> >> expected (i.e., increasing the CPU count when --mem or
> >> >> --mem-per-cpu exceeds that limit).
> >> >>
> >> >> Here is my partition definition:
> >> >>
> >> >> $ scontrol show part short
> >> >> PartitionName=short
> >> >>    AllowGroups=ALL DenyAccounts=data AllowQos=ALL
> >> >>    AllocNodes=ALL Default=YES QoS=N/A
> >> >>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
> >> >>    MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
> >> >>    Nodes=srv0029[73-80,87-95,98-99]
> >> >>    PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO
> >> >>    OverSubscribe=NO PreemptMode=OFF
> >> >>    State=UP TotalCPUs=560 TotalNodes=28 SelectTypeParameters=NONE
> >> >>    DefMemPerCPU=19000 MaxMemPerCPU=19000
> >> >>
> >> >> $ srun --partition=short --mem=40000 sleep 10
> >> >> $ srun --partition=short --mem-per-cpu=40000 sleep 10
> >> >> $ sacct
> >> >>        JobID      User    Account    JobName   Priority   NTasks  AllocCPUS  ReqCPUS     MaxRSS  MaxVMSize     ReqMem      State
> >> >> ------------ --------- ---------- ---------- ---------- -------- ---------- -------- ---------- ---------- ---------- ----------
> >> >> 19522383       jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mn  COMPLETED
> >> >> 19522384       jcollas      admin      sleep        994        1          1        1        92K    203980K    40000Mc  COMPLETED
> >> >>
> >> >> For these 2 jobs, I would have expected AllocCPUS to be 3.
> >> >>
> >> >> $ scontrol show conf
> >> >> ...
> >> >> DefMemPerNode        = UNLIMITED
> >> >> MaxMemPerNode        = UNLIMITED
> >> >> MemLimitEnforce      = Yes
> >> >> SelectTypeParameters = CR_CPU_MEMORY
> >> >> ...
> >> >> AccountingStorageBackupHost = (null)
> >> >> AccountingStorageEnforce    = associations,limits
> >> >> AccountingStorageHost       = stor089
> >> >> AccountingStorageLoc        = N/A
> >> >> AccountingStoragePort       = 6819
> >> >> AccountingStorageTRES       = cpu,mem,energy,node
> >> >> AccountingStorageType       = accounting_storage/slurmdbd
> >> >> AccountingStorageUser       = N/A
> >> >> AccountingStoreJobComment   = Yes
> >> >> AcctGatherEnergyType        = acct_gather_energy/none
> >> >> AcctGatherFilesystemType    = acct_gather_filesystem/none
> >> >> AcctGatherInfinibandType    = acct_gather_infiniband/none
> >> >> AcctGatherNodeFreq          = 0 sec
> >> >> AcctGatherProfileType       = acct_gather_profile/none
> >> >> JobAcctGatherFrequency      = 10
> >> >> JobAcctGatherType           = jobacct_gather/cgroup
> >> >> JobAcctGatherParams         = (null)
> >> >> ...
> >> >>
> >> >> We are currently running version 16.05.6.
> >> >>
> >> >> Is there something I am missing?
> >> >>
> >> >> Regards,
> >> >>
> >> >> Julien
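A possible interim workaround, sketched here rather than taken from the thread: until the automatic CPU increase works again, compute the CPU count yourself and pass it explicitly with srun -c/--cpus-per-task, so the per-CPU memory stays within MaxMemPerCPU. Using the limits from Julien's partition (MaxMemPerCPU=19000):

    mem=40000
    max_per_cpu=19000                                   # the partition's MaxMemPerCPU
    cpus=$(( (mem + max_per_cpu - 1) / max_per_cpu ))   # ceil(40000/19000) = 3
    srun --partition=short --mem=${mem} -c ${cpus} sleep 10

With -c 3 the job's memory per CPU is 40000/3, about 13333 MB, under the 19000 MB cap, which is the same allocation the automatic increase would have produced.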