You are trying to run specifically on node cn110, so you may want to check that node with sinfo.

A quick "sinfo -R" can list any down machines and the reasons.

Brian Andrus

On 11/10/2019 11:23 PM, Sukman wrote:
Hi Brian,

I see. Thank you for your suggestion.
I definitely will try it.

Anyway, I am now facing a new problem.
The job cannot start because of a "Resources" reason.

Would anyone be able to help with this issue?


I previously enabled these options in slurm.conf:

SelectType=select/cons_res
SelectTypeParameters=CR_Core

However, since jobs did not run well with those options enabled, those options
are now disabled again.
I then restarted Slurm on both the head node and the compute node.
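
For reference, this is roughly what I did (assuming systemd-managed Slurm
services; the exact service names may differ on your installation):

# in slurm.conf, commented out again:
#SelectType=select/cons_res
#SelectTypeParameters=CR_Core

# on the head node:
systemctl restart slurmctld

# on the compute node:
systemctl restart slurmd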


But now, when I submit a job with the script below, the job stays pending.

The script:
#!/bin/bash
#SBATCH --job-name=hostname
##sbatch --time=00:50
##sbatch --mem=10M
##SBATCH --nodes=1
##SBATCH --ntasks=1
##SBATCH --ntasks-per-node=1
##SBATCH --cpus-per-task=1
##SBATCH --nodelist=cn110

srun hostname


scontrol show job 79
JobId=79 JobName=hostname
    UserId=sukman(1000) GroupId=nobody(1000) MCS_label=N/A
    Priority=4294901753 Nice=0 Account=user QOS=normal_compute
    JobState=PENDING Reason=Resources Dependency=(null)
    Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
    RunTime=00:00:00 TimeLimit=00:01:00 TimeMin=N/A
    SubmitTime=2019-11-11T14:10:41 EligibleTime=2019-11-11T14:10:41
    StartTime=Unknown EndTime=Unknown Deadline=N/A
    PreemptTime=None SuspendTime=None SecsPreSuspend=0
    LastSchedEval=2019-11-11T14:18:41
    Partition=defq AllocNode:Sid=itbhn02:11211
    ReqNodeList=(null) ExcNodeList=(null)
    NodeList=(null)
    NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
    TRES=cpu=1,node=1
    Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
    MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
    Features=(null) DelayBoot=00:00:00
    Gres=(null) Reservation=(null)
    OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
    Command=/home/sukman/script/test_hostname.sh
    WorkDir=/home/sukman/script
    StdErr=/home/sukman/script/slurm-79.out
    StdIn=/dev/null
    StdOut=/home/sukman/script/slurm-79.out
    Power=



------------------------------------------

Suksmandhira H
ITB Indonesia



----- Original Message -----
From: "Brian W. Johanson" <bjoha...@psc.edu>
To: "Slurm User Community List" <slurm-users@lists.schedmd.com>
Sent: Friday, November 8, 2019 8:58:40 PM
Subject: Re: [slurm-users] Limiting the number of CPU

Suksmandhira,
That qos specifies a walltime, cpu, and memory limit.  From the job script, it
appears you are within the cpu limit.  But the job script does not specify
walltime or memory, and your squeue output is not showing those values (or cpu)
for the job.
'scontrol show job=JOBID' will show all the values.  Adding flags=DenyOnLimit to
the qos will reject jobs that exceed a QOS limit, so hopefully there are no jobs
sitting in the queue that will never run.
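
For example, something along these lines could be used (the qos name is taken
from your scontrol output above; this is a sketch, adjust to your setup):

scontrol show job=79                                        # show all values for the pending job
sacctmgr modify qos normal_compute set flags=DenyOnLimit    # reject jobs that exceed the QOS limits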

-b

