Indeed, this block is not present in the man page for 14.03, but it works as
if it were documented :-)
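
To make "working" concrete: the test partition quoted below has MaxMemPerCPU=200, so a
request of --mem 600 exceeds the per-CPU limit and the job's CPU count should be raised to
ceil(600 / 200) = 3. That is what the 15.08 output quoted below shows (NumCPUs=3,
TRES=cpu=3,mem=600,node=1).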

In version 16.05.6, it's in the man page but it's not working :-(
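
If anyone wants to reproduce this, here is a minimal check, as a sketch only: the partition
and node names are the ones from my test setup quoted below, and the expected/observed lines
come from the 15.08.13 and 16.05.6 outputs in this thread.

    # partition 'short': DefMemPerCPU=200 MaxMemPerCPU=200 (see scontrol show part below)
    srun --partition=short --mem 600 sleep 5 && scontrol show job | grep -E 'NumCPUs|TRES='
    # 15.08.13:  NumCPUs=3 ... TRES=cpu=3,mem=600,node=1   (CPU count raised, as documented)
    # 16.05.6+:  NumCPUs=1 ... TRES=cpu=1,mem=600M,node=1  (limit apparently not applied)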

Julien

2017-02-14 16:48 GMT+01:00 E V <eliven...@gmail.com>:

>
> Interesting, that bit isn't in the man page for 14.03. I'll be
> deploying 15.08 soon, so that's nice to know.
>
> On Tue, Feb 14, 2017 at 10:28 AM, Julien Collas <jul.col...@gmail.com>
> wrote:
> > Hello,
> >
> > From the man page:
> >
> >>        MaxMemPerCPU
> >>         ...
> >>               NOTE: If a job specifies a memory per CPU limit that
> exceeds
> >> this system limit, that job’s count of CPUs per task will automatically
> be
> >> increased. This may result in the job failing due to CPU count limits.
> >
> >
> > Here is what I get with version 15.08:
> >
> > # srun --version
> > slurm 15.08.13
> > # srun --mem 600 sleep 5 && scontrol show job
> > JobId=15 JobName=sleep
> >    UserId=root(0) GroupId=root(0)
> >    Priority=4294901757 Nice=0 Account=(null) QOS=(null)
> >    JobState=COMPLETED Reason=None Dependency=(null)
> >    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
> >    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
> >    SubmitTime=2017-02-14T14:39:54 EligibleTime=2017-02-14T14:39:54
> >    StartTime=2017-02-14T14:39:54 EndTime=2017-02-14T14:39:59
> >    PreemptTime=None SuspendTime=None SecsPreSuspend=0
> >    Partition=short AllocNode:Sid=dhcpvm4-174:5130
> >    ReqNodeList=(null) ExcNodeList=(null)
> >    NodeList=dhcpvm4-191
> >    BatchHost=dhcpvm4-191
> >    NumNodes=1 NumCPUs=3 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> >    TRES=cpu=3,mem=600,node=1
> >    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> >    MinCPUsNode=3 MinMemoryNode=600M MinTmpDiskNode=0
> >    Features=(null) Gres=(null) Reservation=(null)
> >    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
> >    Command=sleep
> >    WorkDir=/root
> >    Power= SICP=0
> >
> >
> > Julien
> >
> > 2017-02-14 16:14 GMT+01:00 E V <eliven...@gmail.com>:
> >>
> >>
> >> Are you sure it worked as you expected? I always think of CPUs & RAM as
> >> independent things that need to be requested independently.
> >>
> >> On Tue, Feb 14, 2017 at 10:07 AM, Julien Collas <jul.col...@gmail.com>
> >> wrote:
> >> > Hello,
> >> >
> >> > I ran some tests in a simple environment, and it seems that this
> >> > functionality works fine up to and including 15.08.13.
> >> > With versions 16.05.6, 16.05.9, and 17.02.0-0rc1 I do not see the
> >> > behavior I would expect.
> >> >
> >> > # scontrol show part
> >> > PartitionName=short
> >> >    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
> >> >    AllocNodes=ALL Default=YES QoS=N/A
> >> >    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
> >> > Hidden=NO
> >> >    MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO
> >> > MaxCPUsPerNode=UNLIMITED
> >> >    Nodes=dhcpvm4-191
> >> >    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO
> >> > OverSubscribe=NO
> >> >    OverTimeLimit=NONE PreemptMode=OFF
> >> >    State=UP TotalCPUs=8 TotalNodes=1 SelectTypeParameters=NONE
> >> >    DefMemPerCPU=200 MaxMemPerCPU=200
> >> >
> >> > # srun --mem 600 sleep 5 && scontrol show job
> >> > JobId=25 JobName=sleep
> >> >    UserId=root(0) GroupId=root(0) MCS_label=N/A
> >> >    Priority=4294901754 Nice=0 Account=(null) QOS=(null)
> >> >    JobState=COMPLETED Reason=None Dependency=(null)
> >> >    Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
> >> >    RunTime=00:00:05 TimeLimit=UNLIMITED TimeMin=N/A
> >> >    SubmitTime=2017-02-14T15:38:41 EligibleTime=2017-02-14T15:38:41
> >> >    StartTime=2017-02-14T15:38:41 EndTime=2017-02-14T15:38:46
> >> > Deadline=N/A
> >> >    PreemptTime=None SuspendTime=None SecsPreSuspend=0
> >> >    Partition=short AllocNode:Sid=dhcpvm4-174:5130
> >> >    ReqNodeList=(null) ExcNodeList=(null)
> >> >    NodeList=dhcpvm4-191
> >> >    BatchHost=dhcpvm4-191
> >> >    NumNodes=1 NumCPUs=1 NumTasks=0 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
> >> >    TRES=cpu=1,mem=600M,node=1
> >> >    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
> >> >    MinCPUsNode=1 MinMemoryNode=600M MinTmpDiskNode=0
> >> >    Features=(null) DelayBoot=00:00:00
> >> >    Gres=(null) Reservation=(null)
> >> >    OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
> >> >    Command=sleep
> >> >    WorkDir=/root
> >> >    Power=
> >> >
> >> > It doesn't help much with my problem, but ...
> >> >
> >> > Best regards,
> >> >
> >> > Julien
> >> >
> >> > 2017-02-02 8:53 GMT+01:00 Julien Collas <jul.col...@gmail.com>:
> >> >>
> >> >> Hi,
> >> >>
> >> >> It seems that MaxMemPerCPU is not working as I would have expected
> >> >> (the CPU count should increase if --mem or --mem-per-cpu exceeds that limit).
> >> >>
> >> >> Here is my partition definition
> >> >>
> >> >> $ scontrol show part short
> >> >> PartitionName=short
> >> >>    AllowGroups=ALL DenyAccounts=data AllowQos=ALL
> >> >>    AllocNodes=ALL Default=YES QoS=N/A
> >> >>    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0
> >> >> Hidden=NO
> >> >>    MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1 LLN=NO
> >> >> MaxCPUsPerNode=UNLIMITED
> >> >>    Nodes=srv0029[73-80,87-95,98-99]
> >> >>    PriorityJobFactor=1000 PriorityTier=1000 RootOnly=NO ReqResv=NO
> >> >> OverSubscribe=NO PreemptMode=OFF
> >> >>    State=UP TotalCPUs=560 TotalNodes=28 SelectTypeParameters=NONE
> >> >>    DefMemPerCPU=19000 MaxMemPerCPU=19000
> >> >>
> >> >>
> >> >> $ srun --partition=short --mem=40000 sleep 10
> >> >> $ srun --partition=short --mem-per-cpu=40000 sleep 10
> >> >> $ sacct
> >> >>        JobID      User    Account    JobName  Priority  NTasks  AllocCPUS  ReqCPUS   MaxRSS  MaxVMSize   ReqMem      State
> >> >> ------------ --------- ---------- ---------- --------- ------- ---------- -------- -------- ---------- -------- ----------
> >> >> 19522383       jcollas      admin      sleep       994       1          1        1      92K    203980K  40000Mn  COMPLETED
> >> >> 19522384       jcollas      admin      sleep       994       1          1        1      92K    203980K  40000Mc  COMPLETED
> >> >>
> >> >> For these 2 jobs, I would have expected AllocCPUS to be 3.
> >> >>
> >> >> $ scontrol show conf
> >> >> ...
> >> >> DefMemPerNode               = UNLIMITED
> >> >> MaxMemPerNode               = UNLIMITED
> >> >> MemLimitEnforce             = Yes
> >> >> SelectTypeParameters        = CR_CPU_MEMORY
> >> >> ...
> >> >> AccountingStorageBackupHost = (null)
> >> >> AccountingStorageEnforce    = associations,limits
> >> >> AccountingStorageHost       = stor089
> >> >> AccountingStorageLoc        = N/A
> >> >> AccountingStoragePort       = 6819
> >> >> AccountingStorageTRES       = cpu,mem,energy,node
> >> >> AccountingStorageType       = accounting_storage/slurmdbd
> >> >> AccountingStorageUser       = N/A
> >> >> AccountingStoreJobComment   = Yes
> >> >> AcctGatherEnergyType        = acct_gather_energy/none
> >> >> AcctGatherFilesystemType    = acct_gather_filesystem/none
> >> >> AcctGatherInfinibandType    = acct_gather_infiniband/none
> >> >> AcctGatherNodeFreq          = 0 sec
> >> >> AcctGatherProfileType       = acct_gather_profile/none
> >> >> JobAcctGatherFrequency      = 10
> >> >> JobAcctGatherType           = jobacct_gather/cgroup
> >> >> JobAcctGatherParams         = (null)
> >> >> ...
> >> >>
> >> >> We are currently running version 16.05.6.
> >> >>
> >> >> Is there something I am missing?
> >> >>
> >> >>
> >> >> Regards,
> >> >>
> >> >> Julien
> >> >
> >> >
> >
> >
>
